With the recent end of another eventful transfer window, the inevitable debate on how well teams have fared this summer is already well underway. One of the main stories from this transfer window was the attempted purchase of John Stones by Chelsea. In the media it has been seen as a big success for Everton that they were able to keep one of their most promising young players, but pundits don’t seem to be mentioning how Everton could have benefitted from the money they would have received in the deal. How different would the Everton team be if they replaced Stones with a cheap defender and used the rest of the funds to purchase a top-class attacking player? Would they be noticeably worse or better off?
To try to answer these questions, I decided to separate the value of offence from the value of defence and see how they differ between teams.
To do this we can look at the number of goals scored and conceded by teams and how it affects their performance. Clearly the average number of goals scored per team must equal the average number conceded, since every goal scored by one team is conceded by another, but we can analyse how these goals are spread between different teams and get a better picture of their value. I decided to compare the frequency of goals scored and conceded by Premier League teams. Data is from the past 15 Premier League seasons.
Here we can see that team goals conceded roughly follows a normal distribution around the average of 50.24 goals (highlighted by the green bar). The median and mode of the distribution of goals conceded are both 51 goals, close to the mean of 50.24, which is consistent with a symmetrical, roughly normal distribution.
When we compare the GA graph to the GF graph we can see that goals scored are distributed much more asymmetrically, with 62% of teams scoring below the average of 50.24 goals. The median for goals scored is 47 and the mode is 45, both noticeably below the mean, suggesting the spread of goals scored is not as even as that of goals conceded.
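As a rough illustration of the summary statistics used above, the sketch below computes the mean, median, mode and share of teams below the mean for a list of season goal totals. The function name `summarise` and the sample numbers are made up for illustration; they are not the Premier League data behind the charts.

```python
from statistics import mean, median, mode

def summarise(goals):
    """Return basic summary statistics for a list of season goal totals."""
    avg = mean(goals)
    # Fraction of teams that finished strictly below the average.
    share_below = sum(1 for g in goals if g < avg) / len(goals)
    return {
        "mean": avg,
        "median": median(goals),
        "mode": mode(goals),
        "share_below_mean": share_below,
    }

# Placeholder totals (NOT real data): a few low scorers and one very high scorer.
toy_goals_for = [30, 45, 45, 47, 52, 60, 85]
stats = summarise(toy_goals_for)
print(stats)
```

In a right-skewed sample like this toy one, the median and mode sit below the mean and more than half of the teams fall below the average, which is the same pattern described for goals scored.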
We can smooth out these empirical distributions and see which standard distributions approximate them.
From the previous reasoning we can treat goals conceded as approximately normal with mean 50.24 and standard deviation 12.62.
Using http://optics.eee.nottingham.ac.uk/match/uncertainty.php I plotted the goals scored values to get a recommended distribution. I found that goals for was best approximated by a log-normal distribution with mean 50.24 and standard deviation 14.74.
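For anyone wanting to reproduce the log-normal curve, here is a minimal sketch that turns a mean and standard deviation into log-normal parameters by moment matching. This is my assumption about how to parameterise it; the fitting tool linked above may use a different method. It uses the league average of 50.24 goals quoted earlier and the goals-for standard deviation of 14.74, and `lognormal_params` is a hypothetical helper name.

```python
import math

def lognormal_params(mean, sd):
    """Moment-matched (mu, sigma) of a log-normal with the given mean and sd.

    Uses sigma^2 = ln(1 + sd^2 / mean^2) and mu = ln(mean) - sigma^2 / 2.
    """
    sigma2 = math.log(1.0 + (sd / mean) ** 2)
    mu = math.log(mean) - sigma2 / 2.0
    return mu, math.sqrt(sigma2)

mu, sigma = lognormal_params(50.24, 14.74)

# Sanity check: the mean implied by (mu, sigma) should recover the input mean.
implied_mean = math.exp(mu + sigma ** 2 / 2.0)
print(round(mu, 3), round(sigma, 3), round(implied_mean, 2))
```

The median of this distribution is exp(mu), which is below the mean, matching the right skew seen in the goals scored chart.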
We can plot these graphs and compare the two to see the difference between each distribution.
GA represents a team’s goals against, GF a team’s goals for.
I centred the data on the league average so that the mean is 0. Since goals against is normally distributed, it can be reflected in the y-axis so that we are comparing above-average offences to above-average defences and vice versa.
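The centring and reflection described above amounts to two small transformations, sketched here. The function names are hypothetical, and the league average of 50.24 is the figure quoted earlier in the post.

```python
LEAGUE_MEAN = 50.24  # average goals per team per season, from the post

def centred_offence(goals_for, league_mean=LEAGUE_MEAN):
    """Goals scored relative to the league average (positive = better)."""
    return goals_for - league_mean

def centred_defence(goals_against, league_mean=LEAGUE_MEAN):
    """Goals 'saved' relative to the league average.

    The sign is flipped (the reflection in the y-axis) so that conceding
    fewer goals than average gives a positive number.
    """
    return league_mean - goals_against

# A team that scores 60 and concedes 40 is above average on both axes.
print(round(centred_offence(60), 2), round(centred_defence(40), 2))
```

After this step, a positive value always means "better than average", whichever side of the ball is being measured.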
Teams with a below average offence/defence are on the left-hand side and teams with an above average offence/defence on the right-hand side.
If we look at the “top teams” or the right-hand tail, we can see that there are more great offences than there are great defences, whereas the “weaker teams” on the left-hand tail have more poor defences than poor offences.
Now consider you’re in charge of a Premier League team and you’re trying to make your team as successful as possible. The main aim is to maximise goals scored and minimise goals conceded, or in other words maximise goal difference, doing so as cheaply as you can. Obviously maximising goal difference doesn’t always translate to maximising points (e.g. winning two games 1-0 earns more points than winning one 5-0 and drawing the other 0-0, despite the smaller goal difference), but it is well understood that goal difference works very well in explanatory and predictive terms in relation to success. If we use the cumulative distribution plot it should be easier to visualise the spread of goals and help us to understand how teams can maximise goal difference.
As you move along the curves from left to right you’re effectively increasing a team’s goals scored or goals saved (goals not conceded) compared to the league average. To maximise goal difference whilst using as few resources as possible you basically want to move as far right on the x-axis as you can whilst moving as little as you can up the y-axis (the problem is a little bit more complicated than that but I’ll attempt to tackle that in later posts).
Let’s look at the left-hand tail, and imagine a team that is at the 5th percentile of teams in both offence and defence. If they could improve to the 20th percentile in either goals scored or goals conceded, which should they choose?
According to the fitted distributions, a 5th percentile team concedes 71 goals and scores 30 goals for a goal difference of -41. Let’s see how changing our offence and defence affects our performance.
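The points versus goal difference example in brackets above can be checked with a few lines of arithmetic. This is only a sketch; `points_and_goal_difference` is a hypothetical helper, and it assumes the usual 3 points for a win and 1 for a draw.

```python
def points_and_goal_difference(results):
    """Total league points and goal difference for a list of (scored, conceded)."""
    points = 0
    goal_difference = 0
    for scored, conceded in results:
        goal_difference += scored - conceded
        if scored > conceded:
            points += 3  # win
        elif scored == conceded:
            points += 1  # draw
    return points, goal_difference

two_narrow_wins = points_and_goal_difference([(1, 0), (1, 0)])
big_win_and_draw = points_and_goal_difference([(5, 0), (0, 0)])
print(two_narrow_wins, big_win_and_draw)
```

The two narrow wins give 6 points from a goal difference of +2, while the big win and the draw give 4 points from a goal difference of +5.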
- Improving to the 20th percentile on offence would mean the team now scores 38 goals and achieves a goal difference of -33.
- Improving to the 20th percentile on defence would mean the team now concedes 61 goals for a goal difference of -31, two goals better than the previous option.
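The percentile figures in this example can be reproduced with the standard library. The sketch below assumes a normal distribution for goals conceded (mean 50.24, standard deviation 12.62) and a moment-matched log-normal for goals scored (mean 50.24, standard deviation 14.74); the moment matching and the function names are my assumptions, not necessarily how the original curves were produced.

```python
from math import exp, log, sqrt
from statistics import NormalDist

LEAGUE_MEAN = 50.24  # average goals per team per season, from the post
GA_SD = 12.62        # standard deviation of goals conceded
GF_SD = 14.74        # standard deviation of goals scored

def goals_conceded(quality_pct):
    """Goals conceded by a defence better than `quality_pct` of defences."""
    # A better defence concedes fewer goals, so use the (1 - p) quantile.
    return NormalDist(LEAGUE_MEAN, GA_SD).inv_cdf(1 - quality_pct)

def goals_scored(quality_pct):
    """Goals scored by an offence better than `quality_pct` of offences."""
    sigma2 = log(1 + (GF_SD / LEAGUE_MEAN) ** 2)
    mu = log(LEAGUE_MEAN) - sigma2 / 2
    return exp(mu + sqrt(sigma2) * NormalDist().inv_cdf(quality_pct))

base = round(goals_scored(0.05)) - round(goals_conceded(0.05))
better_offence = round(goals_scored(0.20)) - round(goals_conceded(0.05))
better_defence = round(goals_scored(0.05)) - round(goals_conceded(0.20))
print(base, better_offence, better_defence)
```

Rounded to whole goals, this gives goal differences of -41, -33 and -31, in line with the figures above.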
Notice how the blue “goals against” curve is shallower than the orange “goals for” curve for this portion of the graph. A shallower gradient means that the same rise in percentile (“quality”) is worth more goals, which is why option 2 boosts our team more than option 1.
Coming back to the original question, which option would be best for Everton?
During the 14/15 season Everton were 1 goal below average in both offence and defence. If they’re looking to challenge the top teams, we should look at which curve has the shallower gradient to the right of the y-axis. Looking at the graph we can see that the variation of top-end offences is much greater than that of top-end defences. Based solely on previous league goal distributions, Everton should be looking to improve their attack in order to make progress in the league.
Using Premier League data we can see that top-end offences sit further above average than top-end defences, and bottom-end defences sit further below average than bottom-end offences. We can use this to see that Everton should probably be improving their attack if they want to challenge the top teams.
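To put rough numbers on the "shallower gradient" argument, the sketch below compares how many goals an equal jump in percentile is worth on offence and on defence. It assumes the same fitted distributions as before (normal for goals conceded, moment-matched log-normal for goals scored, both with mean 50.24); the 80th and 95th percentiles are picked purely for illustration, and the function names are hypothetical.

```python
from math import exp, log, sqrt
from statistics import NormalDist

LEAGUE_MEAN = 50.24  # average goals per team per season, from the post
GA_SD = 12.62        # standard deviation of goals conceded
GF_SD = 14.74        # standard deviation of goals scored

def offence_gain(from_pct, to_pct):
    """Extra goals scored when moving between two offence percentiles."""
    sigma2 = log(1 + (GF_SD / LEAGUE_MEAN) ** 2)
    mu = log(LEAGUE_MEAN) - sigma2 / 2
    sigma = sqrt(sigma2)
    z = NormalDist()
    return exp(mu + sigma * z.inv_cdf(to_pct)) - exp(mu + sigma * z.inv_cdf(from_pct))

def defence_gain(from_pct, to_pct):
    """Goals saved when moving between two defence percentiles."""
    z = NormalDist()
    # Under a normal distribution the saving is just the z-score change times the sd.
    return GA_SD * (z.inv_cdf(to_pct) - z.inv_cdf(from_pct))

top_offence = offence_gain(0.80, 0.95)
top_defence = defence_gain(0.80, 0.95)
bottom_offence = offence_gain(0.05, 0.20)
bottom_defence = defence_gain(0.05, 0.20)
print(round(top_offence, 1), round(top_defence, 1))
print(round(bottom_offence, 1), round(bottom_defence, 1))
```

Under these assumptions the same percentile jump is worth about 16 goals on offence against about 10 on defence at the top end, while at the bottom end the defensive improvement is the larger of the two.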
Everton were strongly linked with both Jonny Evans and Andriy Yarmolenko (both top-50 players according to Goalimpact, http://goo.gl/iyUaJy), with Evans being sold to West Brom for £6m and Yarmolenko rumoured at £15m. Everton could have sold Stones for £35m and would have been comfortably able to afford both players’ fees and wages. The combination of a cheap defensive replacement and investment in their attack would’ve given them the best chance of competing with the top teams, instead of what’s looking like another mid-table finish due to their reluctance to part with John Stones.
Problems to address in future posts
I didn’t want this post to be too long so I thought I’d start with this as a basis and go into further detail in future posts, but yeah, here’s a list of things I’m planning to look into:
- There is an obvious correlation between goals scored and conceded by teams: good teams will score more goals than average and concede fewer than average. This needs to be taken into consideration when trying to evaluate team improvement.
- The value of attacking players is generally higher than that of defensive players, so this would need to be accounted for when discussing team improvement via transfers.
- How much does a single player improve goals scored/conceded, and how repeatable are these improvements at a player level?
- See how good/bad offences play against good/bad defences and try and relate offensive and defensive proficiencies to points.
If you have any questions please feel free to tweet me @_peteowen