Premier League taps Oracle Cloud to consolidate UK football match data

Oracle Autonomous Data Warehouse helps leagues, teams, media, and other users get more creative with their in-game and post-match analytics.

Jeff Erickson | July 5, 2024

When a football (soccer, to you fans in America) team goes on offense, players don’t wait for permission to move. With the ball in their possession, creativity takes over and possibilities abound, but the moments are fleeting.

Likewise, a cross-section of off-pitch football interests in the UK—including media outlets, betting companies, individual team and league organizations, and their business partners—want the freedom to move fast in exploring possibilities hidden in matchday and other forms of data. Until recently, that data, generated by the Premier League and other leagues and competitions, was often stored in the systems of various sports data vendors, adding intermediaries and additional steps that made creative data explorations difficult.

Now, backed by Oracle data experts and cloud infrastructure, a company called Football DataCo (FDC), which is jointly owned by the Premier League and English Football League, is consolidating all that match data—32 years of it, covering 27 different UK leagues and knockout competitions, plus ongoing matchday feeds—into one Oracle Autonomous Data Warehouse. The arrangement gives the Premier League a new level of control over the impressive stock of data it owns.

As the world’s most popular football league, the Premier League alone generates terabytes of game-day data captured 25 times per second by cameras tracking every player, run, pass, shot, save, tackle, and other “events” that happen on the pitch. That data is made available upon request to the media outlets, league analysts, and other users cited above.

“Now we can keep that data up to date as games are played, autonomously, and allow the user to go in and query it through the Premier League’s own front end,” says Mark Bowden, FDC’s product and relationship manager. Bowden foresees analysts tied to the Premier League, other UK competitions, and their partners getting creative with the accumulated data using nearly any analytics tool they choose. The possibilities will only grow as Oracle Autonomous Data Warehouse learns to let people interact with the data through generative AI large language models (LLMs). “GenAI is a real game changer for the way we will be able to access the data,” he says.

By interacting with GenAI rather than SQL programmers, Bowden says, editorial and creative types can bring their own approach to storytelling with the data, “beyond what a data specialist might dream up.” He adds, “I would love to see that.”

Users could query the data warehouse with simple questions about players’ current performances, such as how far select midfielders have run during a match and how many touches they’ve had in the opponent’s half. Or users could ask fun historical questions, such as how many times a goalkeeper has scored the winning goal in a Premier League game. Users will also be able to query the data warehouse with complex tactical questions: Is the trailing team playing too far forward against this opponent? How have other teams fared with this tactic? Has it led to many goals from counterattacks?
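To make the first of those questions concrete, here is a minimal sketch of what the underlying query might look like. It uses Python's built-in sqlite3 module as a stand-in for the data warehouse, and the table name, column names, and sample rows are all invented for illustration; the article does not describe the actual schema.

```python
import sqlite3

# In-memory database standing in for the warehouse (assumption for illustration).
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE player_match_stats (
        player TEXT,
        position TEXT,
        distance_km REAL,
        touches_opp_half INTEGER
    )
    """
)

# Made-up sample rows, not real match data.
conn.executemany(
    "INSERT INTO player_match_stats VALUES (?, ?, ?, ?)",
    [
        ("Player A", "midfielder", 10.4, 31),
        ("Player B", "midfielder", 11.1, 24),
        ("Player C", "defender", 9.2, 6),
    ],
)


def midfielder_summary(connection):
    """Return distance run and opponent-half touches for each midfielder."""
    query = """
        SELECT player, distance_km, touches_opp_half
        FROM player_match_stats
        WHERE position = 'midfielder'
        ORDER BY distance_km DESC
    """
    return connection.execute(query).fetchall()


for row in midfielder_summary(conn):
    print(row)
```

A natural-language interface of the kind Bowden describes would, in effect, write a query like this on the user's behalf.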

Trove of data

The Premier League alone has collected data on 73,000 different matches from 250 different teams in 345 different stadiums, says Simon Wigley, an analytics director with Oracle Technology Consulting, who’s working with FDC. “For each of those games, we know the lineups and the positions of each player, as well as who was subbed in,” Wigley says. That’s data on about 20,000 players and 130,000 goals, as well as stats on the managers and referees. And even though VAR (video assistant referee) reviews are relatively new to the Premier League and other competitions, there’s data on 1,200 of those decisions, he notes.

Yet all this historical data is small change compared with the matchday riches created by modern AI-based systems, Wigley says.

Take the Premier League. Not only do its partners collect data on every pass, shot, run, tackle, corner, etc.—39 million of those events are now in the data warehouse—but each of those events also contains a number of attributes. “When there’s a pass, the system will note its speed, who made it, and who received it,” Wigley says. “A corner kick will note the direction and who took it.” The list goes on. In all, 180 million of those attributes are in the consolidated data set, he says.
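The figures quoted work out to roughly 4.6 attributes per event (180 million divided by 39 million). As a rough sketch of how one event and its attributes could be represented, the snippet below uses a hypothetical MatchEvent record; the field names, attribute keys, and sample values are assumptions for illustration, not the warehouse's actual structure.

```python
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class MatchEvent:
    """One on-pitch event plus its free-form attributes (hypothetical shape)."""
    match_id: int
    minute: int
    event_type: str  # e.g. "pass", "corner", "shot"
    player: str
    attributes: dict = field(default_factory=dict)


# Made-up sample events, not real match data.
events = [
    MatchEvent(1, 12, "pass", "Player A", {"speed_mps": 14.2, "receiver": "Player B"}),
    MatchEvent(1, 13, "corner", "Player C", {"direction": "inswinging"}),
    MatchEvent(1, 27, "pass", "Player B", {"speed_mps": 9.8, "receiver": "Player A"}),
]


def count_events(event_list):
    """Count events by type."""
    return Counter(e.event_type for e in event_list)


def count_attributes(event_list):
    """Count attribute values across all events."""
    return sum(len(e.attributes) for e in event_list)


print(count_events(events))
print(count_attributes(events))
```

Keeping attributes in a flexible mapping reflects Wigley's point that different event types carry different details: a pass has a speed and a receiver, while a corner has a direction.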

“That’s raw materials for someone like me to answer any question,” says Brian Macdonald, an Oracle data science cloud architect who specializes in sports analytics. “When I'm watching a match, I could see something and say, ‘Hey, I don’t think I’ve ever seen that before.’ I can then do some analysis that asks, ‘Has it ever happened before?’ And if it has, well, how often does it happen? One question leads to another question very quickly.”

Macdonald says he will often open Oracle Analytics Platform, connected to an Autonomous Data Warehouse, then apply filters and start visualizing his analysis with charts and tables. “I might want to build some kind of predictive model, such as win probability of an ongoing game, based on simulations using historical data,” he says.
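A simulation-based win probability model of the kind mentioned here can be sketched in a few lines. The version below is a deliberately simple illustration, not the approach used at Oracle: it assumes each side scores independently in each remaining minute with a fixed probability, and the scoring rates are placeholder values that would in practice be estimated from historical data.

```python
import random


def simulate_win_probability(home_goals, away_goals, minutes_left,
                             home_rate=1.5, away_rate=1.2,
                             n_sims=10_000, seed=0):
    """Estimate home win, draw, and away win probabilities by simulation.

    home_rate and away_rate are assumed average goals per 90 minutes
    (placeholder values, not taken from real data).
    """
    rng = random.Random(seed)
    p_home = home_rate / 90.0  # chance the home side scores in a given minute
    p_away = away_rate / 90.0  # chance the away side scores in a given minute
    counts = {"home_win": 0, "draw": 0, "away_win": 0}

    for _ in range(n_sims):
        home, away = home_goals, away_goals
        # Play out the remaining minutes one at a time.
        for _ in range(minutes_left):
            if rng.random() < p_home:
                home += 1
            if rng.random() < p_away:
                away += 1
        if home > away:
            counts["home_win"] += 1
        elif home == away:
            counts["draw"] += 1
        else:
            counts["away_win"] += 1

    return {outcome: n / n_sims for outcome, n in counts.items()}


# Home side leads 1-0 with 10 minutes left.
print(simulate_win_probability(1, 0, 10))
```

A richer model would let the scoring rates depend on the teams, the score, and the tactical situation, which is where the historical event data comes in.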

Life in the fast lane

The way the data collection works, each week the Oracle platform transfers content from local data collectors amounting to 94,000 different payloads into the data warehouse. Timing matters: There are hundreds of matches occurring throughout the week, with lower leagues collecting data at different levels of detail. Plus, with English football’s knockout tournaments, schedules shift constantly. “The system has to know not only what data to ask for but when to ask for it,” Wigley says. “A lot of work went into making sure our code and our logic covered it all.”

The system ingests data in different ways for different uses. Some of those payloads, including lineups, game attendance, and other standard match data, go into data storage alongside player tracking data, where analysts can aggregate it and use it to generate post-match summaries and feed deeper analysis and predictions.

The next step of the project, currently a proof of concept, is to simultaneously ingest ongoing match data through what Wigley calls the “fast lane.” This data is made available to analysts in real time. “When something happens in a Premier League match, users of the data warehouse will be able to immediately bring it into their analysis,” he says.

Now the Premier League and other users have access to all this match and historical data to use as they see fit, Wigley says. For example, the Premier League could pull relevant data right from the data warehouse, apply GenAI to it, and create personalized match summaries for fans in their own languages based on parameters—such as a team, player, or position on the pitch—they have indicated interest in.

Says FDC’s Bowden, “It’s a real shift for us to feel like we have the control and empowerment to use a vast array of disparate data sources. And the exciting thing about it is, we don't know exactly where it’s going to go.”