Football’s magical equation?
I am now five articles into this series and we are starting to make some progress in applying maths to football. I started by showing that there is a lot of randomness and luck in matches, and that many statistics, such as possession, don’t help us predict the game. Then, in the last two articles, I developed a model, known as expected goals, that evaluates teams in terms of the chances they create. Could expected goals be the equation that allows fans to better understand their team, managers to improve their tactics and punters to beat the bookies?
I’m afraid the answer is ‘no’. Expected goals is not football’s magical equation. Football analysis still has a long way to go. Expected goals are a starting point, but we also need to think about the limitations of this technique.
Let’s first reiterate what expected goals are useful for: expected goals are a measure of whether a team is generating good chances. Teams that take more shots from better positions out-perform those shooting less often from further out. And statistics support this observation: teams with higher expected goals in the past are more likely to win their matches in the future. Expected goals is simply a way of measuring the quality of chances, and every fan, player or manager who wants insight into how well their team is doing should be aware of their team’s expected goals.
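As a concrete illustration, here is a minimal sketch, in Python, of how a team’s expected goals total is built up: each shot is given a probability of being scored, and those probabilities are added together. The shot values below are made up for the example, not taken from any real match.

```python
# A minimal sketch of how a team's expected goals are totted up.
# Each number is the probability (from an expected goals model) that a shot is scored.
# These values are illustrative only, not real match data.
shot_probabilities = [0.05, 0.31, 0.08, 0.12, 0.44, 0.03]

# A team's expected goals for the match is simply the sum of its shot probabilities.
expected_goals = sum(shot_probabilities)

print(f"Expected goals for the match: {expected_goals:.2f}")
```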
However, expected goals are not the only way to measure chances. It may seem like stating the obvious, but we shouldn’t forget that another very valid way of assessing how well a team is playing is to actually watch their matches!
When Opta collect data on shots, their operator makes a note of whether he or she considers a shot to be a Big Chance. What makes a shot or header a Big Chance is difficult to define, but most of us know one when we see it. And the Opta operators receive training to make sure they are as consistent as possible in how they categorise shots.
These human-assessed Big Chances are a pretty good measure of the probability of a goal being scored. In fact, they are just as good as expected goals: humans are as good as statistics at evaluating the quality of the chances a team is creating. The plot below shows Liverpool’s Big Chances in the first seven weeks of the Premier League, alongside their chances with an expected goals value above 20%. The black circles indicate goals.
Liverpool have had lots of Big Chances this season. Liverpool have a high number of expected goals. Liverpool are a very good team this season, both according to the numbers and according to the people watching them play.
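For readers who like to work directly with the data, here is a hedged sketch of how the two groups in that comparison could be pulled out of a list of shots. The field names (xg, big_chance, goal) and the records themselves are assumptions made for illustration, not Opta’s data format or Liverpool’s actual shots.

```python
# A sketch of splitting shots into the two groups compared above.
# The field names and records are invented for illustration.
shots = [
    {"xg": 0.35, "big_chance": True,  "goal": True},
    {"xg": 0.22, "big_chance": False, "goal": False},
    {"xg": 0.07, "big_chance": True,  "goal": False},
    {"xg": 0.51, "big_chance": True,  "goal": True},
]

# Chances flagged as Big Chances by the human operator...
big_chances = [s for s in shots if s["big_chance"]]
# ...and chances the statistical model rates above 20%.
high_xg_chances = [s for s in shots if s["xg"] > 0.20]

print(f"Big Chances: {len(big_chances)}, of which goals: {sum(s['goal'] for s in big_chances)}")
print(f"Chances with xG > 20%: {len(high_xg_chances)}")
```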
For anyone aiming to turn a profit using an expected goals model, Big Chances are a big problem. Your model is up against the eyes of thousands of fans watching the match. It is the bets made by these punters that set the bookmakers’ odds, and it is this ‘wise crowd’ that you need to outperform to make money. And it doesn’t look as if expected goals passes this test: people are just as good as statistics at evaluating how well a team has performed.
In my book Soccermatics I put the expected goals model to the test and found that it more or less broke even against the bookmakers’ odds, but didn’t make a consistent profit.
So where does this leave us in terms of trying to understand football using data and mathematics? Actually, it is here that things get even more interesting. If we want to use maths then we need to go deeper than randomness, possession and shooting. We need to look at where teams regain the ball through defending, how players connect to each other through passing, and where key passes occur on the pitch. This is what we will do in the coming articles.
Further reading
I was first alerted to the problem with Big Chances and Expected Goals by Jan Mullenberg. He illustrates the point clearly in this article.
Michael Caley gives a balanced account of some of the limitations with expected goals here.
Geek box
To test the relationship between Big Chances and expected goals, I fitted two models, one based on human observation and the other on statistics, using logistic regression (see the geek box in article 3). The first model gave the probability of scoring as a function of whether a shot was a Big Chance. The best-fitting model was
P(goal) = 0.055 if the shot is not a Big Chance, and P(goal) = 0.388 if the shot is a Big Chance.
This model had an R-squared of 0.159.
The second model was a logistic regression on a range of variables not including Big Chance. The best-fitting model for the probability of scoring included the following factors: distance to the goal line, distance to the middle of the pitch, whether the attack was a fast break, whether the shot came from a corner, whether the effort was strong or weak, whether it arose from a volley or half-volley, and whether it came from individual play. Despite having many more parameters, this model had an R-squared of 0.167, only slightly larger than the Big Chance model. I therefore conclude that the statistical model does not outperform the model based on human assessment.
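For anyone who wants to reproduce the flavour of this test, here is a minimal sketch of the first model in Python, using statsmodels. The shot data are simulated with roughly the probabilities quoted above, and McFadden’s pseudo-R-squared is used as the goodness-of-fit measure; the exact measure behind the R-squared values above is my assumption, not something taken from the original analysis.

```python
import numpy as np
import statsmodels.api as sm

# Simulated shot data, purely illustrative: 1 = Big Chance, 0 = not a Big Chance.
rng = np.random.default_rng(0)
big_chance = rng.integers(0, 2, size=1000)
# Goals drawn with roughly the probabilities quoted above (assumed values).
goal = rng.random(1000) < np.where(big_chance == 1, 0.388, 0.055)

# Logistic regression of goal outcome on the Big Chance indicator.
X = sm.add_constant(big_chance.astype(float))
model = sm.Logit(goal.astype(float), X).fit(disp=False)

# Predicted scoring probability without and with a Big Chance.
print(model.predict([[1.0, 0.0], [1.0, 1.0]]))
# McFadden's pseudo R-squared, one common analogue of R-squared for logistic models.
print(model.prsquared)
```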
This article originally appeared on Nordic bet.