Wednesday 23 January 2013

Comparison of Opta stats providers

It may just be self-selection based on the kind of things I read, but there seems to be an ever-growing interest in data in football. The subject appears to be moving from a niche into the mainstream, with increasing mentions in the press such as a recent article in The Guardian.

This is partly due to more and more sites making use of football data, in particular from Opta.  In this post I'll look at a number of sites/apps that use Opta data and compare their strengths and weaknesses.

When Swansea City won promotion to the Premier League in May 2011, I decided to set up the blog www.wearepremierleague.com to combine my interest in stats with my interest in the Swans. Generally speaking there is a paucity of (publicly available) data around activity in lower leagues - although credit must go to Ben Mayhew for his attempt to rectify this at Experimental 361. The level of detail publicly available for the top leagues in Europe, however, is still far beyond that for the Championship and below.

Guardian Chalkboards
When I started out, this was one of the few resources around and had the advantage of being free and web (not app) based.  I won't go into too much detail about it as it's sadly no more (possibly ahead of its time?), but the thing I liked most about it was being able to visualise activity by where on the pitch it took place.

The image below shows a Swansea goal against Blackburn where every Swansea player touched the ball in the move.
The addition of squad numbers to activity gives a level of detail not available anywhere else I've looked.

Stats Zone
The demise of Guardian Chalkboards a couple of months into the season was the nudge I needed to get an iPod Touch to be able to use the Stats Zone app.

Stats Zone is great both for looking at the top level stats (e.g., Shots per Team) and for delving into the detail of a particular match (e.g., Long passes by a particular player).
Example of a Stats Zone screenshot, in this case comparing the Aerial Duel activity of Peter Crouch with the Stoke team as a whole
Stats Zone is produced in conjunction with FourFourTwo magazine and their website includes blogs produced by Opta, Zonal Marking and others.

I combined my interest in football with my interest in data analysis to create a Premier League Review dashboard, an interactive presentation built from a number of images from Stats Zone.

WhoScored.com
Whenever Swansea are linked with a particular player (usually from La Liga), WhoScored is the first site I go to as it has in-depth details for any player across the major European leagues:
WhoScored has details on overall activity for the season as well as the ability to drill down into activity for a particular game
WhoScored also has a fairly comprehensive list of stats for any particular match, with the ability to sort ascending/descending on these metrics for each player within a team (Long Balls, Chances Created etc.), and also blogs from a number of respected writers.

Squawka Sports
Squawka.com is to some extent a cross between Stats Zone and WhoScored in that you can look at activity of individual players across the season as a whole, but also look at specific types of actions graphically for a specific player in a particular match e.g., Canas' passes vs. Malaga
Squawka goes for a dashboard approach for presenting a lot of its data

EPLIndex.com
The level of detail available in the sites/app mentioned above will be enough for the majority of people, but for those wanting more there is the pay-for site EPLIndex.com (£3.95 a month/£40 a year).

Where WhoScored for example might have total passes and pass accuracy, EPL Index will break this down even further e.g., Passes/Accurate passes in Own Half/Attacking Half/Final Third:
EPL Index Screenshot - huge amount of data across numerous tabs
The level of detail of this data is pretty much the same as the release from Opta/Manchester City of the summary stats for the 2011/12 season, just not in a single spreadsheet.

One of the other advantages of EPL Index is that it has data for multiple seasons, which makes possible comparisons such as one I did recently between Danny Graham and Kenwyne Jones:
Example of the kind of thing it's possible to collate using data supplied by EPL Index
As well as the option of subscribing to stats, for those who just want to read about stats and football the site has an ever-growing number of authors who use the data to write and publish their own analysis, arguably to a depth rarely seen elsewhere.

Relative Strengths and Weaknesses of each source

Stats Zone - Strengths:
  • Ability to visualise activity e.g., location of Shots/Interceptions etc.
  • Includes simple top level summaries e.g., total tackles made ordered by all players not split by team as is the case in the other sources
  • Ability to drill into data within the game e.g., compare first 62 minutes with last 28
  • Ability to create bespoke comparisons across matches/teams e.g., Chances made by Jon Walters in first 30 minutes vs. Aston Villa compared to Chances made by Stoke vs. West Brom
Stats Zone - Weaknesses:
  • Apple devices only - no Android or Web version
  • Lacks ability to see multiple stats simultaneously e.g., Tackles/Passes/Shots per player
  • Doesn't have stats collated across a season

WhoScored - Strengths:
  • Includes data on all major European Leagues and Champions League
  • Easiest site to navigate around between stats for Team/Player/Match
  • Best for comparing statistics across teams, form/shots per game
WhoScored - Weaknesses:
  • Little visualisation of data - there is a nice image of shot areas but not the chalkboards such as those from Stats Zone/Squawka
  • No ability to analyse activity within a game e.g., compare 1st and 2nd half stats

Squawka - Strengths:
  • Has ability to easily track metrics for a team or player for a single match or across season
  • Includes heat maps of activity by player/team
  • Ability to drill down within part of the game (currently 5 minute intervals)
  • Lots of charts as well as raw data, multiple options for visualising the same data
Squawka - Weaknesses:
  • Doesn't have the same level of detail of stats readily available as other sites, although this is only likely to bother the really in-depth user
  • It's good to have charts but some could be better e.g., if a player has played in 15 of 22 league games, only stats for those 15 are shown.  Personally I would like to see the blanks, to know where over the season that player hasn't featured
  • Stats Zone plots chalkboards from the point of view of the team you're analysing, attacking from left to right; Squawka plots them with the Home team playing from left to right, which can be annoying when trying to compare areas of attack/passing

EPL Index - Strengths:
  • Most in-depth of any of the data sources
  • Has league data going back to 2008/9 season
  • Top-Stats feature gives ability to find best players across a range of metrics with ability to filter by those playing at least x minutes in a game or total minutes across a season (e.g., avoids problem of someone coming top in pass completion % with 1 pass from 1 attempt)
EPL Index - Weaknesses:
  • Pay-for site
  • No ability to analyse activity within a game
  • Generally best thought of as a source of data from which you create something yourself 
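
The minimum-minutes filter mentioned under the EPL Index strengths is easy to reproduce if you are building something yourself from the data. The sketch below is only an illustration: the PlayerSeason record, the field names and the numbers are all made up for the example and are not EPL Index's actual format or figures.

```python
from dataclasses import dataclass


@dataclass
class PlayerSeason:
    # Hypothetical record layout, invented for this example.
    name: str
    minutes: int
    passes_attempted: int
    passes_completed: int

    @property
    def pass_completion(self) -> float:
        """Pass completion as a percentage (0.0 if no passes attempted)."""
        if self.passes_attempted == 0:
            return 0.0
        return 100.0 * self.passes_completed / self.passes_attempted


def top_by_pass_completion(players, min_minutes=0):
    """Rank players by pass completion %, ignoring anyone under min_minutes."""
    eligible = [p for p in players if p.minutes >= min_minutes]
    return sorted(eligible, key=lambda p: p.pass_completion, reverse=True)


# Invented numbers: Player A has completed 1 pass from 1 attempt.
players = [
    PlayerSeason("Player A", minutes=3, passes_attempted=1, passes_completed=1),
    PlayerSeason("Player B", minutes=1800, passes_attempted=1200, passes_completed=1080),
    PlayerSeason("Player C", minutes=1500, passes_attempted=900, passes_completed=765),
]

unfiltered = top_by_pass_completion(players)
filtered = top_by_pass_completion(players, min_minutes=900)
```

Without the filter, Player A tops the list on the strength of a single pass; with a 900 minute threshold only Players B and C remain.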

Turning Data into Insight
Although each of these companies is taking the same (or at least similar) data from Opta, it can be seen that they have each used it in different ways and are all still improving as time goes on. Eventually I'd imagine that one of these sites (or a newer entrant such as Sky) will bring all these parts together, possibly also including video for a complete experience.

As an example, a lot has been made recently of David de Gea pushing balls back into dangerous areas when he makes a save. The raw data will only tell you so much, but being able to view all his saves, or just the saves where there is a goal in the subsequent 10 seconds, would give an even more detailed picture.
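
Picking out "saves followed by a goal within 10 seconds" is a simple filter once you have a time-ordered list of match events. The sketch below assumes a made-up event format (a dict with 'second', 'type' and 'team' keys) purely for illustration; it is not Opta's feed format, and the sample events are invented.

```python
def saves_followed_by_goal(events, window=10):
    """Return saves where the opposing team scores within `window` seconds.

    `events` is assumed to be sorted by time; each event is a dict with
    'second' (match clock in seconds), 'type' and 'team' (the team
    performing the action). This layout is hypothetical.
    """
    flagged = []
    for i, event in enumerate(events):
        if event["type"] != "save":
            continue
        for later in events[i + 1:]:
            if later["second"] - event["second"] > window:
                break  # past the window, stop looking
            if later["type"] == "goal" and later["team"] != event["team"]:
                flagged.append(event)
                break
    return flagged


# Invented events: the first save is followed by a goal 6 seconds later,
# the second by a goal 30 seconds later.
events = [
    {"second": 600, "type": "save", "team": "home"},
    {"second": 606, "type": "goal", "team": "away"},
    {"second": 2400, "type": "save", "team": "home"},
    {"second": 2430, "type": "goal", "team": "away"},
]

costly_saves = saves_followed_by_goal(events)
```

Here only the save at 600 seconds is flagged. The timestamps of the flagged saves are also exactly what you would need to cut the matching video clips.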

TV rights are far too precious to be given away, but the ability to create your own highlights package (e.g., all chances created by Pablo Hernandez, with approx 10 seconds of footage per chance created) could take interactive entertainment to a new level.

Other Posts:  Man City and Twitter, Twitter and Bookies - A Case Study, Premier League Weekly Review

Monday 21 January 2013

Manchester City and Twitter

Following on from previous posts looking at football and Twitter, this post looks at some of the activity on Twitter from Manchester City. Man City are the current Premier League champions and arguably the richest club in the world, so to some extent getting an extra few retweets isn't the most important thing for the club.

That said, with Financial Fair Play being introduced (where clubs have to sort-of be self-sufficient, although as it involves UEFA who knows how it'll actually play out), the club needs to maximise its off-field revenues and Social Media will naturally be part of that.

Man City currently have almost 680k followers, the fourth highest following in the League behind Arsenal, Chelsea and Liverpool. (Manchester United are currently not on Twitter, which is presumably a strategic decision, but they would likely gain several million followers within a matter of hours if they ever decided to join, such is their global reach.)

I've mentioned 'Sweating the Assets' in a previous blog: an organisation such as Man City will have huge amounts of interesting content, but because people are wary of repeating a message for fear of annoying followers, quite often its value isn't maximised.

Man City have in part taken this on board, for example in promoting a video of one of their players, John Guidetti, who has just started playing again after injury:
Man City tweeting a link to the same article 3 times over the space of a few hours
As seen previously when looking at response to tweets from other accounts, response drops off dramatically within a matter of minutes, so there's no reason to worry too much about over-promoting when mentioning the same thing 3 times over the space of 9 hours. There's a drop-off in the number of retweets, but not at the levels you would see if you were repeat mailing the same people.  It also makes sense for an organisation that's looking to promote itself globally to take different time zones into consideration.

The one thing I would suggest, however, is adding a dummy part to the URL so the shortlinks produced for each of the tweets are unique, to give a true understanding of response per tweet.  There are more details on creating unique links in this blog.
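
As a rough sketch of the "dummy part" idea, the snippet below appends an extra query parameter to an article URL so that each tweet points at a distinct long URL. The example.com address, the parameter name "tw" and the labels are placeholders invented for the example, and it assumes the target site simply ignores query parameters it doesn't recognise.

```python
from urllib.parse import parse_qsl, urlencode, urlparse, urlunparse


def add_tracking_param(url, tweet_label, param="tw"):
    """Return `url` with an extra query parameter identifying the tweet.

    The parameter name and label are arbitrary placeholders; any existing
    query string on the URL is preserved.
    """
    parts = urlparse(url)
    query = parse_qsl(parts.query, keep_blank_values=True)
    query.append((param, tweet_label))
    return urlunparse(parts._replace(query=urlencode(query)))


# Placeholder article URL, tweeted three times with a different label each time.
article = "http://www.example.com/news/video-interview"
variants = [add_tracking_param(article, f"post{i}") for i in range(1, 4)]
# variants[0] is "http://www.example.com/news/video-interview?tw=post1"
```

Each variant can then be shortened separately, so the click counts reported for each shortlink relate to one tweet rather than all three combined.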

Although the example above shows that they are considering multiple posts, generally speaking items get posted only once. Below are a couple of examples of recent tweets that are of sufficient interest to be reposted but which were only posted once:
Recent Man City tweets showing the rapid decay in click-through, the Gallery tweet generating almost as much response in the first 7 minutes as the subsequent hour
Twitter is very much a 'blink and you'll miss it' medium and, generally speaking, most users won't have set up lists of key accounts such as the football team they support, so there is real value in repetition, along with considering peak usage times across the globe.