There was a character in the late 90s
sketch show ‘Goodness Gracious Me’ who kept annoying his son by claiming that
everyone of note came from India:
Da Vinci? Indian. The Queen? Indian.
Picasso? Indian.
I have a similar trait to that character
except my ubiquitous reference is ‘Database’:
Google? Database. Facebook? Database. Twitter? Database.
Ultimately all big organisations are doing the same thing, just in slightly different ways: they all collect huge amounts of data, and the difference lies in how they pass it back to users, the key being how they store, manipulate and disseminate it.
What’s all this got to do with football? Well, looking at the MCFC Analytics data I was struck by the similarities between it and the kind of data you might see in a normal customer database. The data is provided at a level of one record per player per match, which could be considered like the items in an order: each order has multiple items and each customer (Team) has multiple orders.
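To make the analogy concrete, here is a minimal Python sketch. It assumes each raw row carries a match identifier, a team, a player and a couple of counts. The field names `match_id`, `team`, `player`, `goals` and `shots` are hypothetical, not the dataset's actual headers. The sketch groups the item-level rows by team and then by match, just as a customer database groups order items into orders and orders into customers.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class PlayerMatchRecord:
    """One raw row: one player in one match (the 'order item').
    Field names are hypothetical."""
    match_id: int  # together with team, identifies the 'order'
    team: str      # the 'customer'
    player: str
    goals: int
    shots: int


def orders_by_customer(records):
    """Group rows as team -> match -> [player rows],
    mirroring customer -> order -> [items]."""
    grouped = defaultdict(lambda: defaultdict(list))
    for rec in records:
        grouped[rec.team][rec.match_id].append(rec)
    return grouped


# Invented example rows, for illustration only
rows = [
    PlayerMatchRecord(1, "Newcastle", "Player A", 1, 3),
    PlayerMatchRecord(1, "Newcastle", "Player B", 0, 2),
    PlayerMatchRecord(2, "Newcastle", "Player A", 0, 1),
    PlayerMatchRecord(1, "Liverpool", "Player C", 0, 4),
]
grouped = orders_by_customer(rows)
print(len(grouped["Newcastle"]))     # 2 matches ('orders') for this team
print(len(grouped["Newcastle"][1]))  # 2 player rows ('items') in match 1
```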
From here the natural step is to turn a load of data into summary views, which provide the starting point of any analysis. In database marketing terms these would be:
- Single Team View – One record per Team
- Single Match View – One record per Match
- Single Player View – One record per Player
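All three views come from the same raw rows, collapsed on a different key each time. The following is a minimal sketch of that GROUP BY and SUM step in plain Python. It assumes hypothetical column names and uses invented figures.

```python
# Invented player-per-match rows; column names are illustrative only.
rows = [
    {"match_id": 1, "team": "Newcastle", "player": "Player A", "goals": 1, "shots": 3},
    {"match_id": 1, "team": "Newcastle", "player": "Player B", "goals": 0, "shots": 2},
    {"match_id": 1, "team": "Liverpool", "player": "Player C", "goals": 0, "shots": 4},
    {"match_id": 2, "team": "Newcastle", "player": "Player A", "goals": 2, "shots": 4},
]

METRICS = ("goals", "shots")


def single_view(rows, key):
    """Collapse player-match rows to one record per value of `key`,
    summing each metric (the equivalent of GROUP BY key with SUM)."""
    view = {}
    for row in rows:
        totals = view.setdefault(row[key], dict.fromkeys(METRICS, 0))
        for metric in METRICS:
            totals[metric] += row[metric]
    return view


single_team_view = single_view(rows, "team")
single_match_view = single_view(rows, "match_id")
single_player_view = single_view(rows, "player")

print(single_team_view["Newcastle"])  # {'goals': 3, 'shots': 9}
```

The match view here simply sums both sides of each match. A fuller version would more likely keep the home and away figures side by side. Keying players by name is also a simplification, and a player ID would be safer if one is available.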
The insight usually comes not just from
aggregating the raw data but from manipulating it to create extra variables
which give a greater depth of understanding beyond just totals and averages.
The first of these I have put together is the single team view. The main part of this is simply totalling the details of the individual players (along with the own goals data), but it also adds other details around each team.
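Below is a sketch of the team-level step. It assumes player goals have already been summed per team per match. It also assumes own goals are held in a separate table, keyed by match and by the team that benefited. Both structures, and the names `player_goals` and `own_goals_for`, are hypothetical. Goals scored add in own goals. Goals conceded are read from the opponent's entry in the same match, which is one example of the extra detail that can be added around each team.

```python
# Invented figures: goals scored by each side's own players,
# keyed by (match_id, team).
player_goals = {
    (1, "Newcastle"): 1, (1, "Liverpool"): 0,
    (2, "Newcastle"): 2, (2, "Stoke"): 1,
}
# Hypothetical own-goals table: (match_id, team credited with the goal) -> count
own_goals_for = {(1, "Newcastle"): 1}


def single_team_view(player_goals, own_goals_for):
    """One record per team with games played, goals scored (including own
    goals in its favour) and goals conceded (the opponent's total)."""

    def goals_credited(match_id, team):
        return player_goals[(match_id, team)] + own_goals_for.get((match_id, team), 0)

    teams_in_match = {}
    for match_id, team in player_goals:
        teams_in_match.setdefault(match_id, []).append(team)

    view = {}
    for match_id, team in player_goals:
        # Each match has exactly two sides; this unpacking fails loudly otherwise
        (opponent,) = [t for t in teams_in_match[match_id] if t != team]
        row = view.setdefault(team, {"played": 0, "scored": 0, "conceded": 0})
        row["played"] += 1
        row["scored"] += goals_credited(match_id, team)
        row["conceded"] += goals_credited(match_id, opponent)
    return view


view = single_team_view(player_goals, own_goals_for)
print(view["Newcastle"])  # {'played': 2, 'scored': 4, 'conceded': 1}
```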
This produces a table of nearly 200 columns, which is fine as a data source, but looking at it for any length of time will give you a headache. The job of any analyst should be to take this and make something more user-friendly.
To that end I have produced a summary dataset called single team view summary.xls, which has one record for each team and, as well as the usual goals scored/conceded, includes some other information which I think is pretty interesting.
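One simple way to get from the wide table to something readable is to keep a handful of columns and write them out. The sketch below uses Python's standard `csv` module rather than producing an .xls file. The column names and figures are hypothetical stand-ins.

```python
import csv
import io

# Hypothetical wide rows: the real single team view has ~200 columns per team.
wide_rows = [
    {"team": "Team A", "played": 38, "goals_for": 60, "goals_against": 40,
     "games_more_shots": 25, "aerial_duels_won": 900, "tackles_won": 600},
    {"team": "Team B", "played": 38, "goals_for": 45, "goals_against": 50,
     "games_more_shots": 15, "aerial_duels_won": 850, "tackles_won": 640},
]

# The short list of columns worth showing to a reader (illustrative choice)
SUMMARY_COLUMNS = ["team", "played", "goals_for", "goals_against", "games_more_shots"]


def write_summary(rows, columns, out):
    """Write one CSV record per team, keeping only `columns`;
    extrasaction="ignore" silently drops every other field."""
    writer = csv.DictWriter(out, fieldnames=columns, extrasaction="ignore")
    writer.writeheader()
    writer.writerows(rows)


buffer = io.StringIO()
write_summary(wide_rows, SUMMARY_COLUMNS, buffer)
print(buffer.getvalue())
```

To write to disk instead, pass a file opened with `open(path, "w", newline="")` in place of the `StringIO` buffer.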
Much has been made of Newcastle possibly punching above their weight (i.e., being lucky) and possibly being in store for a more average season this time. It’s certainly true that a number of stats suggest they overperformed:
- Newcastle had more shots than the opposition in only 15 of their 38 games, around half the figure for the teams around them in the table. The top 4 (plus Chelsea and Liverpool) had more shots than the opposition in the majority of their matches.
- They conceded around 1.5 ‘Big Chances’ for every one they created (a ratio of 0.67 Big Chances created per Big Chance conceded), a ‘Big Chance’ being an opportunity where a goal would be expected. Chelsea are the only other top half team with a ratio below 1, and the top 4 (plus Everton, Liverpool and Fulham) are the only sides to create more ‘Big Chances’ than they concede.
- In the majority of their games, Newcastle made fewer passes and fewer final third passes than their opponents, whereas the rest of the top 6 tended to dominate. The traditional ‘Big Six’ were the teams that tended to dominate passing (especially final third passes), with Swansea and Stoke being the outliers.
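Both kinds of figure above can be derived from one record per team per match that carries the opponent's numbers alongside. The following is a minimal sketch under that assumption. The `opp_` prefix and all field names are hypothetical, and the numbers are invented.

```python
# Hypothetical team-match rows: one per team per match, with the
# opponent's figures alongside. Numbers are invented for illustration.
team_matches = [
    {"team": "Newcastle", "shots": 10, "opp_shots": 14, "big_chances": 2, "opp_big_chances": 3},
    {"team": "Newcastle", "shots": 15, "opp_shots": 9, "big_chances": 2, "opp_big_chances": 1},
    {"team": "Newcastle", "shots": 8, "opp_shots": 12, "big_chances": 0, "opp_big_chances": 1},
]


def games_ahead(rows, team, metric):
    """Count games where `team` recorded strictly more of `metric` than its opponent."""
    return sum(1 for r in rows if r["team"] == team and r[metric] > r["opp_" + metric])


def big_chance_ratio(rows, team):
    """Big Chances created per Big Chance conceded; below 1 means conceding more than creating."""
    created = sum(r["big_chances"] for r in rows if r["team"] == team)
    conceded = sum(r["opp_big_chances"] for r in rows if r["team"] == team)
    return created / conceded if conceded else float("inf")


print(games_ahead(team_matches, "Newcastle", "shots"))  # 1
print(big_chance_ratio(team_matches, "Newcastle"))      # 0.8
```

Ties count as not ahead, which matches "more shots than the opposition". The same helper works unchanged for passes or final third passes, given matching columns.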
Liverpool were arguably the opposite of Newcastle: they dominated games but did not see it returned in points. Although luck may play some part in results, the ability to be clinical in front of goal (Newcastle: 11.5% of shots were goals) or not (Liverpool: 7%) is not some random event. It is arguably something a manager has little control over on the day itself, but can influence through signings and selection.
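The conversion figures above are goals as a share of shots. A minimal sketch follows. One design choice, assumed here rather than taken from the original spreadsheet, is to count only goals scored by the team's own players. Own goals in the team's favour did not come from its shots.

```python
def conversion_rate(player_goals: int, shots: int) -> float:
    """Percentage of a team's shots that became goals.

    Assumption: only goals by the team's own players are counted, since
    own goals in its favour did not come from its shots and would
    inflate the rate.
    """
    if shots <= 0:
        raise ValueError("conversion rate is undefined without shots")
    return 100 * player_goals / shots


# Illustrative figures only, not taken from the MCFC Analytics data
print(f"{conversion_rate(6, 60):.1f}%")  # 10.0%
```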
Other things of interest were Swansea making more passes than the opposition in 33 of their 38 games but more final third passes in only 9, and Stoke being the opposite, with just 3 games where they made more passes but 12 where they made more final third passes.
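One way to flag teams like these is to compare the two "games ahead" counts as shares of games played. The sketch below uses the Swansea and Stoke counts quoted above. The labels and the 0.2 threshold are arbitrary, hypothetical choices.

```python
def passing_profile(games_more_passes, games_more_final_third, games_played, threshold=0.2):
    """Compare how often a team out-passed its opponent overall with how often
    it out-passed them in the final third. `threshold` is an arbitrary cut-off
    on the difference between the two shares of games."""
    gap = (games_more_passes - games_more_final_third) / games_played
    if gap > threshold:
        return "ahead more often overall"
    if gap < -threshold:
        return "ahead more often in the final third"
    return "similar"


# Game counts as quoted above
print(passing_profile(33, 9, 38))  # Swansea: ahead more often overall
print(passing_profile(3, 12, 38))  # Stoke: ahead more often in the final third
```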
There are an almost infinite number of ways of reformatting the MCFC Analytics dataset and the output above is only the tip of the iceberg. Given the amount of data involved it may be that collaboration and sharing of datasets is the fastest way to gain an overall understanding of the data.
The spreadsheet behind the figures above (which contains a number of other derived metrics including home/away splits) is available at: https://skydrive.live.com/redir?resid=A1BA00769DC2D906!105 along with the Own Goals data and other Premier League related output.
Dan Barnett
Director of Analytics
Website: http://www.analysismarketing.com/ or
Football Stats Website: http://www.wearepremierleague.com
LinkedIn: http://www.linkedin.com/in/danjbarnett