Oracle Cloud data analysis informs two Premier League awards

Season’s Most Improbable Comeback and Most Powerful Goal winners determined by crunching data from all 380 matches.

Rob Preston | May 21, 2024

When the Premier League’s Bournemouth trailed Luton Town by three goals at the start of the second half, a draw seemed unlikely and a win nearly impossible. After four Bournemouth goals in the span of 33 minutes, fans couldn’t believe what they had just witnessed.

But was it, in fact, the Most Improbable Comeback in the 2023-2024 Premier League season? After crunching the data—1.2 billion rows of it, totaling more than 10 billion data points from all 380 matches—we determined that it absolutely was.

Most Improbable Comeback is one of two end-of-season awards the Premier League announced on May 21, each one based on a rigorous data analysis using Oracle Cloud Infrastructure (OCI) services.

Bournemouth takes home the Most Improbable Comeback trophy for their 4-3 come-from-behind win on their home pitch on March 13. Equally stunning was the season’s Most Powerful Goal: Aston Villa winger Moussa Diaby takes home that trophy for his laser strike against Wolverhampton on March 30.

To arrive at the award winners, the Premier League partnered with Oracle, which deployed a data scientist to analyze the massive amounts of match data using several cutting-edge OCI services. What follows is a behind-the-scenes look at that analysis.

Most Improbable Comeback: How it’s calculated

The Oracle data scientist, Brian Macdonald, arrived at candidates for this Premier League team award using the Win Probability statistic, a third-party stat that estimates a team’s chance of securing a win or a draw at any point in a match by simulating the remainder of that match 100,000 times.

That statistical model, based on several years of match data generated by Stats Perform, factors in the current score at different times throughout each match, the time remaining in a given match, the number of players on the pitch for each team (to account for any players ejected because of a red card), and whether a team is home or away.

Using OCI Data Science Service, Oracle analyzed each team’s win probability at 30-second intervals throughout each of the season’s 380 matches to determine which team came back from the lowest win probability to beat its opponent.

For the Most Improbable Comeback winner, Bournemouth, OCI Data Science determined that Luton had a 97.6% win probability at 49:44, just after the start of the second half. That was the highest win probability reached all season by any team that went on to lose. At that point, Bournemouth had only a 0.4% chance of winning.

Table tracking goals for AFC Bournemouth and Luton Town

Most Improbable Comeback Win % chart between AFC Bournemouth and Luton Town
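Given a win-probability timeline for each match, picking the comeback winner comes down to a minimum search. The sketch below assumes a hypothetical `Sample` record per 30-second interval and ranks matches by the lowest win probability the eventual winner ever had. The article's other framing (the highest win probability held by a team that went on to lose) could be computed the same way by taking the loser's maximum instead.

```python
from dataclasses import dataclass


@dataclass
class Sample:
    clock_seconds: int  # match clock when the probability was computed
    home_win: float     # P(home team wins) at this moment
    away_win: float     # P(away team wins) at this moment


@dataclass
class Match:
    match_id: str
    home: str
    away: str
    home_goals: int     # final score
    away_goals: int
    timeline: list[Sample]  # one Sample per 30-second interval


def winner_low_point(match: Match):
    """(winner, lowest win probability the winner ever had, clock), or None for a draw."""
    if match.home_goals == match.away_goals:
        return None
    home_won = match.home_goals > match.away_goals

    def winner_prob(s: Sample) -> float:
        return s.home_win if home_won else s.away_win

    low = min(match.timeline, key=winner_prob)
    winner = match.home if home_won else match.away
    return winner, winner_prob(low), low.clock_seconds


def most_improbable_comeback(matches: list[Match]):
    """(match_id, winner, lowest win probability, clock) for the biggest comeback."""
    candidates = []
    for m in matches:
        low_point = winner_low_point(m)
        if low_point is not None:
            candidates.append((m.match_id, *low_point))
    return min(candidates, key=lambda c: c[2])
```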

Most Powerful Goal: Data shows a clear winner

This Premier League award recognizes the player whose goal-scoring shot had the highest average velocity from the time it was struck to the time it crossed the goal line, with two caveats: the shot had to be struck from beyond the box’s 18-yard line, and it could not have been deflected.

The OCI Data Science analysis revealed that Moussa Diaby’s strike against Wolves on March 30 had an average velocity of 68.25 miles per hour (109.84 kilometers per hour). Only one other goal during the 2023-24 Premier League season was faster than 65 mph (the 65.01-mph strike by Crystal Palace’s Eberechi Eze against Aston Villa on May 19).
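The velocity metric and its eligibility rules can be sketched against ball-tracking frames. This sketch assumes the frames have already been trimmed to run from the strike to the moment the ball crosses the goal line, that coordinates are in metres with the attacking goal line at x = 52.5 (a hypothetical 105 m pitch centred on the origin), and that a `deflected` flag comes from event data. It reads "average velocity" as distance travelled divided by elapsed time, which is one plausible reading rather than the confirmed formula. The `mph_to_kmh` helper reproduces the unit conversion quoted above.

```python
import math
from dataclasses import dataclass

GOAL_LINE_X = 52.5            # assumed: origin at centre spot, 105 m pitch, attacking towards +x
PENALTY_AREA_DEPTH_M = 16.5   # the 18-yard line is 16.5 m from the goal line
MPS_TO_MPH = 3600 / 1609.344  # metres per second to miles per hour
KM_PER_MILE = 1.609344


@dataclass
class BallFrame:
    t: float  # seconds
    x: float  # metres
    y: float
    z: float


def average_speed_mph(frames: list[BallFrame]) -> float:
    """Distance travelled from strike frame to goal-line frame, divided by elapsed time."""
    path = sum(
        math.dist((a.x, a.y, a.z), (b.x, b.y, b.z))
        for a, b in zip(frames, frames[1:])
    )
    elapsed = frames[-1].t - frames[0].t
    return path / elapsed * MPS_TO_MPH


def is_eligible(frames: list[BallFrame], deflected: bool) -> bool:
    """Struck from beyond the 18-yard line and not deflected."""
    distance_from_goal_line = GOAL_LINE_X - frames[0].x
    return not deflected and distance_from_goal_line > PENALTY_AREA_DEPTH_M


def mph_to_kmh(mph: float) -> float:
    return mph * KM_PER_MILE
```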

The gap between 10th place and 2nd place was only 3.2 mph. “The rest of the top 10 in this category were all kind of close,” Macdonald says. “Each increment was small, and then boom, there’s this big jump for the winner.”

For the fans watching at home, it can be tricky to discern between shots of such power, particularly when some shots skim the pitch surface and others fly into the top corner of the goal. “That’s one reason the data analytics behind these awards are so important,” says Will Brass, the Premier League’s chief commercial officer. “The calculations are complex, involving player and ball tracking as well as detailed analysis of the moment the ball is struck. Oracle Cloud Infrastructure gives us confidence in these precise computations and allows us clarity in declaring a deserved winner.”

As might be expected, all the finalists for Most Powerful Goal were shots struck from near the center of the goal, just outside the box. “It makes sense,” Macdonald says, “because as I look at these shots, a lot of them involve deflected passes coming back to the shooter, away from the goal, which gives the ball extra velocity. It’s just basic physics.”

Table showing which goal-scoring shots had the highest average velocity

Setting up and using the OCI environment

Macdonald says he was able to set up the OCI instances used for both award evaluations in just 30 minutes.

The first step was to write Bash scripts on OCI Compute virtual machines to pull data from the APIs of the Premier League’s two main data providers and put it into OCI Object Storage. Those scripts pulled updated data after every match day.
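Macdonald's pull step used Bash scripts; the same incremental logic is sketched here in Python to keep all examples in one language. The provider listing and fetch functions, the object naming scheme, and the injected `put_object` callable are all hypothetical. In a real deployment the write could go through the OCI Python SDK's `ObjectStorageClient.put_object`, but the providers' actual APIs and authentication are not described in the article and are not modeled here.

```python
import json
from typing import Callable, Iterable


def object_name(provider: str, season: str, match_id: str) -> str:
    """Hypothetical naming scheme: one JSON object per provider per match."""
    return f"{provider}/{season}/{match_id}.json"


def pull_new_matches(
    provider: str,
    season: str,
    list_match_ids: Callable[[], Iterable[str]],
    fetch_match: Callable[[str], dict],
    existing_objects: set,
    put_object: Callable[[str, bytes], None],
) -> list[str]:
    """Fetch and store only the matches not already in the bucket.

    Intended to run after every match day; matches pulled on earlier runs are skipped.
    """
    written = []
    for match_id in list_match_ids():
        name = object_name(provider, season, match_id)
        if name in existing_objects:
            continue  # already stored by a previous run
        payload = json.dumps(fetch_match(match_id)).encode("utf-8")
        put_object(name, payload)
        written.append(name)
    return written
```

Skipping objects that already exist keeps each match-day run cheap and makes reruns safe. In a test harness, an in-memory dict's `__setitem__` can stand in for `put_object`.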

One provider is Second Spectrum, which supplies location data on the positioning (3D coordinates) of all 22 players on the pitch, as well as the ball, throughout each Premier League match by using machine learning and computer vision algorithms. The other provider is Stats Perform, whose Opta service enhances the location data to identify match “events,” such as shots (including their location on the pitch, distance from goal, and whether they were left-footed or right-footed), corner kicks, fouls, penalties, and so on.
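Because the Opta events and the Second Spectrum positions come from different feeds, a common first step is to line them up in time. The sketch below pairs each event with the nearest tracking frame using binary search over frame timestamps. The record layouts (`TrackingFrame`, `Event`) and their field names are hypothetical, not either provider's actual format, and the frame spacing in any sample data is an assumption.

```python
import bisect
from dataclasses import dataclass, field


@dataclass
class TrackingFrame:
    t: float                          # seconds since kick-off
    ball: tuple[float, float, float]  # (x, y, z) in metres
    players: dict = field(default_factory=dict)  # player id -> (x, y, z)


@dataclass
class Event:
    t: float        # seconds since kick-off
    kind: str       # e.g. "shot", "corner", "foul"
    player_id: str


def attach_frames(events: list[Event], frames: list[TrackingFrame]) -> list[tuple]:
    """Pair each event with the tracking frame closest to it in time.

    Assumes frames is non-empty and sorted by t.
    """
    times = [f.t for f in frames]
    pairs = []
    for ev in events:
        i = bisect.bisect_left(times, ev.t)
        # The nearest frame is either just before the event time or at/after it.
        neighbours = [j for j in (i - 1, i) if 0 <= j < len(frames)]
        best = min(neighbours, key=lambda j: abs(times[j] - ev.t))
        pairs.append((ev, frames[best]))
    return pairs
```

Once paired, an event such as a shot can be enriched with the shooter's position (`frame.players[event.player_id]`) and the ball's coordinates at that instant.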

From there, Macdonald uploaded the data to Oracle Autonomous Data Warehouse, using the cloud-based warehouse’s built-in JSON capabilities to handle the complex, nested JSON structures needed to represent a football match. He then conducted a series of in-depth analyses using the OCI Data Science machine learning platform.
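The article does not show the warehouse queries. In Oracle SQL, this kind of unnesting is typically written with `JSON_TABLE` and its `NESTED PATH` clause; to stay in one language, the Python sketch below performs an equivalent flattening on a hypothetical match document. Every field name (`matchId`, `events`, `qualifiers`, and so on) is invented for illustration and is not the providers' real schema.

```python
import json


def flatten_events(doc: dict) -> list[dict]:
    """One flat row per event, with nested player and qualifier fields promoted to columns."""
    rows = []
    for ev in doc.get("events", []):
        row = {
            "match_id": doc["matchId"],
            "event_type": ev["type"],
            "minute": ev.get("minute"),
            "player_id": ev.get("player", {}).get("id"),  # None if the event has no player
        }
        for key, value in ev.get("qualifiers", {}).items():
            row[f"q_{key}"] = value  # e.g. q_foot, q_distanceM
        rows.append(row)
    return rows


def flatten_match_json(text: str) -> list[dict]:
    """Parse a raw JSON match document and flatten its events."""
    return flatten_events(json.loads(text))
```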

In all, the analysis took in billions of data points from all 380 matches to calculate myriad metrics about each game and goal. The result was a short list of candidates for each award, from which the Premier League selected a single winner in each category.

“Connecting to the APIs of the two data providers was probably the most complicated part, because we had to work through the normal first-time authentication steps,” Macdonald says. “As soon as I got those working, it’s just running the same commands over and over again. The rest was easy.”

Diagram of architecture used to calculate stats awards
Oracle data scientists used the above architecture to calculate the awards.

The OCI environment has been producing the results for the two end-of-season Premier League awards for the last three seasons, refreshing the leaderboards and dashboards for each award after every match. The preliminary results were used on social media to help promote these special events and goals throughout the season, while keeping the leading candidates secret.

Macdonald explains: “We did a lot of in-depth analytics and discussions of the results, validating and comparing the data, ensuring that we didn’t miss anything.”

Key OCI products used

OCI Data Science Service, the fulcrum of the analyses, is a fully managed and serverless platform for data science teams to build, train, and manage high-quality machine learning models. Automated machine learning capabilities rapidly examine the data and recommend the optimal algorithms, while tuning the model and explaining its results.

OCI Data Science’s drag-and-drop data integration and preparation tools make it easy for users to move data into a data lake or data warehouse. The cloud platform’s security tools and user interfaces enable users with multiple roles to participate in projects and share models. Model-agnostic explanations help data scientists, business analysts, and executives have confidence in the results.

Oracle Autonomous Data Warehouse is a cloud-based data warehouse service that eliminates operational complexities by automating provisioning, configuration, patching, tuning, scaling, and backup.

OCI Compute provides fast, flexible, and affordable compute capacity—from bare metal servers and virtual machines to lightweight containers—to fit any workload. OCI Compute’s uniquely flexible VM and bare metal instances deliver optimal price-performance.

OCI Object Storage enables users to securely store any type of data in its native format. With built-in redundancy, OCI Object Storage is ideal for building modern applications that require scale and flexibility, as it can be used to consolidate multiple data sources for analytics, backup, or archive purposes.

Macdonald also used Oracle Analytics Cloud to present a complete leaderboard for each award, allowing him to re-sort the data based on different criteria—say, to include Most Powerful Goal candidates for shots that occurred within the 18-yard box or narrow the analysis to players on a certain team.

Oracle Analytics Cloud provides a complete set of tools for deriving and sharing data insights. The platform lets analysts visualize any data findings, on any device. It also lets users ingest, profile, and cleanse data using a variety of algorithms, as well as aggregate data and then run ML models at scale.