In a new season, when do statistics settle down?
With the Women’s Champions League final on Sunday, there are parts of the 2019/20 season still to be wrapped up, and yet 2020/21 is already upon us. New dawns, new faces, new outfits; our levels of knowledge drop again on the rollercoaster of sporting life.
But a new season also tends to mean that we start from scratch on the data front. Given that the sample size resets to zero each year, the question becomes: how long does it take for the numbers to ‘settle’ again?
It won’t surprise you to learn that the answer is not ‘one match’. There’s quite a lot of work on this for expected goals, and this post by David Sumpter suggests that the numbers aren’t worth looking at until three games in. Even then, there’s still a fair bit of ‘noise’ (stats jumping around before patterns have settled) until around seven matches, by which point the noise drops considerably.
Unsurprisingly, the technical notes at the end of this 2015 post from Michael Caley show a similar pattern for a range of other shot-related stats.
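To see why the noise drops off over the first handful of games, here’s a minimal Python sketch. It is not Sumpter’s or Caley’s method: it simply assumes, hypothetically, that a team’s xG in any single match is drawn from a normal distribution around a fixed ‘true’ level (1.4 per match, with a match-to-match standard deviation of 0.6, both made-up numbers), and measures how far the running average typically sits from that level after n matches.

```python
import random
import statistics

# Hypothetical assumptions, not taken from Sumpter's or Caley's work:
TRUE_XG_PER_MATCH = 1.4   # the team's underlying level
MATCH_TO_MATCH_SD = 0.6   # how much a single match's xG varies around it


def spread_after(n_matches: int, n_seasons: int = 2000, seed: int = 1) -> float:
    """Standard deviation of the xG-per-match average after n_matches,
    estimated by simulating many hypothetical season starts."""
    rng = random.Random(seed)
    averages = []
    for _ in range(n_seasons):
        games = [rng.gauss(TRUE_XG_PER_MATCH, MATCH_TO_MATCH_SD) for _ in range(n_matches)]
        averages.append(sum(games) / n_matches)
    return statistics.pstdev(averages)


for n in (1, 3, 7, 10):
    print(f"after {n:>2} matches: typical error in xG per match is about +/-{spread_after(n):.2f}")
```

Under these assumptions the spread shrinks in proportion to 1/√n, so it falls steeply over the first few matches and then flattens out. Real matches add wrinkles (opponent strength, game state) that this toy model ignores.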
It’s worth noting that both Sumpter’s and Caley’s work is based on team numbers, and a similar level of caution should be applied to players. Without doing the kind of deep investigation that Sumpter and Caley did, we can still do some back-of-the-envelope maths for player-level shot stats.
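Teams average around 10-11 shots per game, so three games adds up to 30-33 shots. Strikers tend to average 2.5-3.5 shots per 90 minutes, so it takes them somewhere between nine and 13 full matches to reach a similar number of shots. In other words, it may take around 10 matches for their shot stats to settle down after a seismographic start.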
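The back-of-the-envelope sum in that paragraph can be written out in a few lines of Python. The function name is just a hypothetical way of framing the arithmetic (how many full matches a striker needs to match a team’s three-game shot count); the inputs are the ranges quoted in the text.

```python
def matches_needed(team_shots_per_game: float, player_shots_per_90: float,
                   team_games: int = 3) -> float:
    """Full 90-minute matches a player needs to take as many shots
    as a team does in `team_games` matches."""
    target_shots = team_shots_per_game * team_games
    return target_shots / player_shots_per_90


# Ranges from the text: teams take 10-11 shots a game, strikers 2.5-3.5 per 90.
fastest = matches_needed(10, 3.5)    # 30 shots at 3.5 per 90
slowest = matches_needed(11, 2.5)    # 33 shots at 2.5 per 90
typical = matches_needed(10.5, 3.0)  # middle of both ranges

print(f"{fastest:.1f} to {slowest:.1f} matches, about {typical:.1f} in the middle")
```

With those ranges the answer runs from about 8.6 matches to 13.2, with the middle of both ranges giving 10.5, which is where the rough ‘10 matches’ figure comes from.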
This isn’t to say that you shouldn’t look at these statistics before then, just that you should be wary that they might not have smoothed out yet. In recent years, forwards like Danny Ings, Glenn Murray, and Romelu Lukaku (to name just a few off the top of my head) have had big matches early in the campaign which skewed their averages for a few months. Once they’d played enough ‘normal’ games, their averages looked more representative.
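A running (cumulative) per-game average shows why one big early match lingers. The goal tallies below are invented for illustration and are not any real player’s record: a hat-trick on the opening day, followed by a more ordinary run.

```python
def running_average(values):
    """Average of the first 1, 2, ..., n values."""
    averages = []
    total = 0
    for i, value in enumerate(values, start=1):
        total += value
        averages.append(total / i)
    return averages


# Hypothetical goals per match: an opening-day hat-trick, then a quieter run.
goals = [3, 0, 1, 0, 0, 1, 0, 0, 1, 0, 0, 1]

with_opener = running_average(goals)
without_opener = running_average(goals[1:])

for match, avg in enumerate(with_opener, start=1):
    print(f"after match {match:>2}: {avg:.2f} goals per game")
print(f"average without the opener: {without_opener[-1]:.2f}")
```

After 12 matches the average is still about 0.58 goals per game, compared with about 0.36 if the opener is left out, so the hat-trick keeps pulling the figure up long after the match itself.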
To take a team example again, a passing network for a single match is unlikely to be identical to a team’s general passing network, but it will usually share a lot of its features.
Take Liverpool: their passing networks for the first three matches of the season were broadly similar to how they would remain for weeks to come, and even the network from the opening match alone is somewhat similar.
Several features of the general team shape are there in the opening match’s passing network, although the defensive midfielder is far deeper. The concentration of key passing combinations on the left-hand side of the team also gets spread out a little over a few extra games.
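One way to put a number on ‘somewhat similar’ is to treat each passing network as a set of weighted edges (how often each player passes to each teammate) and compare two networks with cosine similarity. This is a minimal sketch: cosine similarity is an illustrative choice, not the method behind the graphics in this article, the passes are invented, and the average player positions that a real passing network also plots are left out.

```python
import math
from collections import Counter


def passing_network(passes):
    """Edge weights: how many times each (passer, receiver) pair connected."""
    return Counter(passes)


def cosine_similarity(a, b):
    """1.0 = same pattern of connections (up to scale), 0.0 = no pairs in common."""
    dot = sum(a[edge] * b[edge] for edge in set(a) | set(b))
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


# Hypothetical passes, labelled by position rather than by player.
opening_match = (
    [("LB", "LW")] * 9 + [("LCB", "LB")] * 7 + [("DM", "LCB")] * 6
    + [("RCB", "RB")] * 3 + [("RB", "RW")] * 2
)
season_so_far = (
    [("LB", "LW")] * 30 + [("LCB", "LB")] * 28 + [("DM", "LCB")] * 20
    + [("RCB", "RB")] * 26 + [("RB", "RW")] * 24 + [("DM", "RCB")] * 18
)

similarity = cosine_similarity(passing_network(opening_match), passing_network(season_so_far))
print(f"opening match vs season so far: {similarity:.2f}")
```

A score near 1 would mean the opener used the same connections in roughly the same proportions; the left-sided bias in the made-up opening match keeps it below that.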
For players, statistics built on more events will naturally settle down more quickly. A collection of defensive actions (using Wyscout data) seems to settle after around four matches, at which point it becomes a distribution that doesn’t change much over further games.
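One way to pin down ‘doesn’t change much’ is to add a player’s matches one at a time, recompute the share of each type of defensive action, and find the point after which a new match never moves that distribution by more than a small threshold. In this sketch the total variation distance and the 0.05 threshold are illustrative assumptions, not how the four-match figure above was arrived at, and the action labels are hypothetical.

```python
from collections import Counter


def proportions(counts):
    """Turn raw counts into shares that sum to 1 (assumes at least one action)."""
    total = sum(counts.values())
    return {label: n / total for label, n in counts.items()}


def total_variation(p, q):
    """0 = identical distributions, 1 = no overlap at all."""
    labels = set(p) | set(q)
    return 0.5 * sum(abs(p.get(label, 0.0) - q.get(label, 0.0)) for label in labels)


def settling_point(matches, threshold=0.05):
    """Number of matches after which adding another match never shifts the
    cumulative distribution by more than `threshold`; None if it never settles.
    Each match is assumed to contain at least one action."""
    cumulative = Counter()
    previous = None
    shifts = []  # shifts[i] = change caused by adding match i + 2 (counting from 1)
    for actions in matches:
        cumulative.update(actions)
        current = proportions(cumulative)
        if previous is not None:
            shifts.append(total_variation(previous, current))
        previous = current
    for i in range(len(shifts)):
        if all(shift <= threshold for shift in shifts[i:]):
            return i + 1
    return None


# Hypothetical defensive actions, one list per match.
season = [
    ["tackle"] * 4,
    ["interception"] * 4,
    ["tackle", "interception"] * 2,
    ["tackle", "interception"] * 2,
]
print(settling_point(season))  # prints 2
```

Here the second match shifts the distribution by 0.5, but every later match leaves the even split untouched, so it counts as settled after match 2.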
This will, of course, depend on the type of statistic and the type of player. For heatmaps, which are just a way of visualising a player’s touches, I found that the heatmap of a player like Jordan Henderson settled very quickly, but a striker like Teemu Pukki saw his change quite a lot from game to game until around eight matches into the 2019/20 season.
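For heatmaps, a comparable check is to bin a player’s touches into a coarse grid, then measure how far the heatmap built from the first n matches is from the one built from all of them. The sketch assumes touch coordinates on a 0-100 scale in both directions (the scale Wyscout event data uses), a hypothetical 6×4 grid, and a total-variation-style distance (half the summed absolute difference between cells); none of this is the Content Toolbox’s own method.

```python
def heatmap(touches, nx=6, ny=4):
    """Share of touches falling in each cell of an nx-by-ny grid over a 0-100 pitch."""
    grid = [[0.0] * nx for _ in range(ny)]
    for x, y in touches:
        col = min(max(int(x / 100 * nx), 0), nx - 1)
        row = min(max(int(y / 100 * ny), 0), ny - 1)
        grid[row][col] += 1
    total = len(touches)
    if total == 0:
        return grid
    return [[cell / total for cell in row] for row in grid]


def heatmap_distance(a, b):
    """Half the summed absolute difference between cells: 0 = identical, 1 = no overlap."""
    return 0.5 * sum(abs(p - q) for row_a, row_b in zip(a, b) for p, q in zip(row_a, row_b))


def convergence(matches, nx=6, ny=4):
    """For each n, the distance between the heatmap of the first n matches
    and the heatmap of all matches."""
    everything = heatmap([t for match in matches for t in match], nx, ny)
    seen = []
    distances = []
    for match in matches:
        seen.extend(match)
        distances.append(heatmap_distance(heatmap(seen, nx, ny), everything))
    return distances
```

The last value is always zero by construction, so what matters is how quickly the earlier values get close to it: quickly for a player whose touches stay in the same areas, more slowly for one whose touches move around from game to game.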
From here, it seems possible to get rough benchmarks for all kinds of stats. If a team switches formation, it will naturally take more games to get a feel for how they play, since you’d be waiting for three or four matches in each type of line-up.
The same will be true for players and their positions, or roles. Naturally, a forward’s shooting statistics are likely to look different depending on whether they’re playing as the central striker or as a wide forward. Even Pierre-Emerick Aubameyang, who probably has the most similar shot maps when they’re split by position like this, shows a bit of the tell-tale ‘winger’s tail’ shape in his shot map when he’s played wide on the left.
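Splitting a sample this way shrinks each piece, so a player can look well past a benchmark overall while one of their roles is not. A minimal sketch, assuming each shot is a dict with a hypothetical "role" field (a "formation" field would work the same way for teams) and borrowing the rough 30-shot figure from the three-game team benchmark earlier in this piece:

```python
from collections import Counter


def sample_sizes_by(shots, key="role", min_shots=30):
    """Count shots in each group and flag which groups reach `min_shots`."""
    counts = Counter(shot[key] for shot in shots)
    return {group: (n, n >= min_shots) for group, n in counts.items()}


# Hypothetical shots for a forward who splits time between two roles.
shots = [{"role": "striker", "xg": 0.12}] * 35 + [{"role": "left wing", "xg": 0.07}] * 12

print(f"all shots: {len(shots)}")
for role, (n, enough) in sample_sizes_by(shots).items():
    print(f"{role}: {n} shots, {'enough' if enough else 'not enough yet'}")
```

The 47 shots in total clear the benchmark comfortably, but the 12 taken from the left wing are nowhere near it, so a shot map for that role alone would still be in its noisy phase.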
When we start watching a new season, we’re all used to exercising a bit of caution in the first few weeks anyway. The same’s true of the data, but not always to the same extent.
All the graphics and visualisations in this article use Wyscout data and were produced in the Twenty3 Content Toolbox.