Evaluating actions in football using machine learning
The Twelve football app evaluates players live during the game. Unlike almost all other rankings that claim to evaluate players, ours is based on a rigorous statistical model of how players increase (and decrease) their team’s chance of scoring. It is based on The Twelve points system, which is a machine learning algorithm developed by Uppsala professor and soccer-mathematician David Sumpter (that’s me!).
Before you start reading…
Download the App here. It is free and includes a fun game where you can challenge your friends. You can follow live scores and stats, expected goals, and player rankings for all major leagues and the Euros.
How it works
We use data from hundreds of thousands of shots, passes, blocks, interceptions and every other action performed over three seasons of the top five leagues. We then use a statistical model to determine the value of every single match event.
Although the statistical methods are advanced, the unifying concept behind the model is extremely simple and starts with goals. The best thing you can do in football is score a goal. Scoring a goal gives a maximal 1,000 points.
All other points are assigned relative to goals. For attacking play (passes, dribbles, set pieces) Twelve assigns points based on how these actions increase a team’s chances of creating a goal-scoring opportunity. To do this, we first collect all the events over many seasons of professional football into chains of possession. This allows us to compare and evaluate similar passes.
The pass shown above into the opposition penalty area is worth +100 points because 10% of the time that pass is played, it leads to a goal. Remember that a goal is worth 1,000 points, so a pass that gives 10% of a goal is worth 100 points.
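The arithmetic behind that example can be written down in a few lines. This is only a sketch of the scaling described above: the name action_points is hypothetical, and the 10% figure is the one quoted in the text, not a value computed here.

```python
GOAL_POINTS = 1000  # a goal is worth 1,000 points, as stated in the text


def action_points(goal_probability: float) -> float:
    """Convert the probability that an action leads to a goal into points."""
    if not 0.0 <= goal_probability <= 1.0:
        raise ValueError("goal_probability must be between 0 and 1")
    return GOAL_POINTS * goal_probability


# The pass into the penalty area leads to a goal 10% of the time.
penalty_area_pass = action_points(0.10)
```

The same scaling applies to any action once a goal probability has been estimated for it.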
Importantly, this evaluation of passes is not based on our ‘intuition’ or subjective feeling about a pass. It is based on a statistical model that accounts for tens of thousands of passes in our data set. This is where machine learning comes in. Our algorithm learns what is a dangerous pass and a less dangerous pass.
Passes back and forward between defenders, which seldom lead to shots, are typically worth only +2 or +3 points. Forward passes in midfield are worth +20 or +30 points.
Defence points are assigned based on how much a defensive action decreases the opponent’s probability of scoring, and on the probability that the player’s team scores from a counter-attack following that action.
Off-the-ball points are assigned to players who were close to an opposition player at the moment he lost the ball and possession was turned over to the player’s team.
Not all shots are goals. We assign points to misses and stopped shots based on the probability that a typical shot from that position would result in a goal. This is similar to the so-called expected goals model.
Expected goals, which is increasingly used by fans and coaches when talking about football, is thus a special case of our more general algorithm, restricted to shots. The Twelve algorithm evaluates all actions, so we can compare strikers with defenders in terms of what they contribute.
Points for each of the four categories (Attack, Defence, Off-the-Ball and Shots) are added together to show how a player provides value to their team.
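As a small illustration of that sum, here is a sketch of a per-player record. The class name PlayerPoints, its field names and the numbers for the example player are all made up for illustration; only the four categories come from the text.

```python
from dataclasses import dataclass


@dataclass
class PlayerPoints:
    """Points collected by one player in each of the four categories."""
    attack: float = 0.0
    defence: float = 0.0
    off_the_ball: float = 0.0
    shots: float = 0.0

    @property
    def total(self) -> float:
        # The overall score is the plain sum of the four categories.
        return self.attack + self.defence + self.off_the_ball + self.shots


# Invented numbers, purely for illustration.
midfielder = PlayerPoints(attack=420, defence=150, off_the_ball=60, shots=35)
```

Reading midfielder.total gives the combined score, while the individual fields show where the value came from.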
Over the last two years we have provided live rankings for Allsvenskan, Premier League, Champions League, World Cup and Women’s World Cup, which means our scores have been scrutinised by hundreds of thousands of fans.
Overall, fans’ (and experts’) opinions usually agree with our scores, but sometimes the scores are the subject of lively discussions.
We work with clubs in providing rankings that make sense for scouting. Here we use player radars, which don’t count just actions, but count the value-added to the team (in terms of points) by the actions. These are divided in terms of attack:
For scouting at professional clubs it is important that we are able to evaluate the value contributed by players off the ball. We have developed a set of tools for measuring the value of passes, dribbles, pressure and attacking runs. You can find out more about this in the following video.
Please contact us for details: firstname.lastname@example.org
Twelve primarily makes use of logistic regression and other supervised machine learning methods to calculate the probability that different actions lead to a goal. Here we outline the method we use for passes.
All the matches used to train the model are broken down into sequences of possession, i.e., fragments of the game during which one of the teams holds possession of the ball without losing it and without any stops in play (due to fouls, throw-ins, offsides, etc.). A chain is considered broken, and a new chain begins, whenever the opposition team makes two consecutive touches of the ball.
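The chain-splitting rule can be sketched as follows. This is a minimal illustration, not the production code: the Event record and its stoppage flag are hypothetical stand-ins for real event data, and treating a single opposition touch (say, a deflection) as part of the current chain is an assumption read off the two-touch rule.

```python
from dataclasses import dataclass


@dataclass
class Event:
    team: str               # team that touched the ball
    stoppage: bool = False  # True if play stops after this event (foul, throw-in, ...)


def split_into_chains(events):
    """Group a match's events into possession chains."""
    chains, current, owner = [], [], None
    for ev in events:
        if owner is None:
            owner = ev.team  # first touch of a new chain decides who owns it
        current.append(ev)
        # Two consecutive opposition touches end the chain and start a new one.
        if ev.team != owner and len(current) >= 2 and current[-2].team != owner:
            chains.append(current[:-2])
            current, owner = current[-2:], ev.team
        # Any stop in play also ends the chain.
        if ev.stoppage:
            chains.append(current)
            current, owner = [], None
    if current:
        chains.append(current)
    return chains


touches = [Event(t) for t in "AABABBAA"]
chains = split_into_chains(touches)
```

In this toy sequence the lone B touch stays inside A's first chain, and possession only changes once B touches the ball twice in a row.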
Once all actions are allocated to a possession chain, two logistic regressions are fitted in order to assign a value to each pass. The first regression is obtained by assigning each possession chain a value between 0 (if the play ends without a shot) and 1 (if the sequence finishes with a goal).
This gives the probability of a pass (defined by its starting and ending co-ordinates on the pitch) leading to a shot. A second regression is then used to compute the probability of a shot leading to a goal (i.e. to obtain the expected goal value of the shot).
Multiplying these two probabilities for every shot gives the probability that a pass with certain starting and ending co-ordinates and qualifiers results in a goal. It is this value which we call the pass impact.
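To make the two-step calculation concrete, here is a toy sketch in plain Python. Everything in it is an assumption made for illustration: the data is randomly generated rather than real match data, the features are reduced to pass co-ordinates and a single shot distance, the first model is trained on a simple shot / no-shot label, and the regressions are fitted with a basic gradient descent instead of a statistics library.

```python
import math
import random


def predict(weights, features):
    """Probability from a logistic model (weights[0] is the intercept)."""
    z = weights[0] + sum(w * x for w, x in zip(weights[1:], features))
    return 1.0 / (1.0 + math.exp(-z))


def fit_logistic(X, y, lr=0.5, epochs=500):
    """Fit a logistic regression by plain batch gradient descent."""
    weights = [0.0] * (len(X[0]) + 1)
    for _ in range(epochs):
        grad = [0.0] * len(weights)
        for features, label in zip(X, y):
            error = predict(weights, features) - label
            grad[0] += error
            for j, x in enumerate(features):
                grad[j + 1] += error * x
        weights = [w - lr * g / len(X) for w, g in zip(weights, grad)]
    return weights


rng = random.Random(0)

# Synthetic passes: (start_x, start_y, end_x, end_y) on a pitch scaled to
# [0, 1], attacking towards x = 1. Label: 1 if the chain ended in a shot.
passes = [[rng.random() for _ in range(4)] for _ in range(400)]
ended_in_shot = [1 if rng.random() < 0.05 + 0.6 * p[2] else 0 for p in passes]

# Synthetic shots: (distance_to_goal,) scaled to [0, 1]. Label: 1 if a goal.
shots = [[rng.random()] for _ in range(400)]
scored = [1 if rng.random() < 0.02 + 0.4 * (1 - s[0]) else 0 for s in shots]

shot_model = fit_logistic(passes, ended_in_shot)  # P(shot | pass)
xg_model = fit_logistic(shots, scored)            # P(goal | shot)


def pass_impact(pass_features, shot_features):
    """P(pass leads to a shot) multiplied by P(that shot is a goal)."""
    return predict(shot_model, pass_features) * predict(xg_model, shot_features)


# A pass ending near the opposition goal, followed by a close-range shot.
impact = pass_impact([0.6, 0.5, 0.9, 0.5], [0.1])
```

Multiplying impact by 1,000 would put it on the points scale used in the app. A real model would also include the pass qualifiers mentioned above as extra features.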
Similar methods are used for the value of dribbles. A reverse process is then used for tackles, headers and so on, based on how the action reduces the opposition’s probability of scoring.
For more information about this approach see the following video.