Friday 4 April 2014

Football: Big Data and Small Data problems

Last week I attended the Sports Analytics Innovation Summit held at the Emirates, which had a range of speakers talking about the use of data in a variety of sports and in different areas such as performance, psychology and fitness.

In his review Sky Sports' Adam Bate pretty much hits the nail on the head in that you need to actually apply the data not just collect it for the sake of it. There's no doubt that the use (or at least collection) of data in football is becoming increasingly prevalent but it's the 'So What' factor that's critical. 

I liked the story from a few months ago of Forest Green Rovers, a Conference club, ditching Prozone, not least for the use of the word 'malarkey' in the local newspaper headline (I imagine them outside the ground with a 'Down with this sort of thing' placard against the use of modern technology in sport).
Top 2 results for 'Forest Green Prozone' on Google: a professional step in Feb '13, but binned by the new boss in December after manager Dave Hockaday left 'by mutual consent' in October.
Some of the quotes by the new manager Ady Pennock make him look like quite a traditionalist:

"I am a great believer in what I see and my eyes don’t lie, so I don’t need a bit of paper...The most important stat is the scoreline and I don’t want Prozone for the sake of having it"
It might seem a bit backward, but I'd rather someone had the courage of their convictions than spent money on a product that gathers dust just because 'it's what the elite clubs do'.

Clubs are at the stage now where they are facing both 'Big Data' and 'Small Data' problems and it's how they deal with these that will determine the level of advantage they get over their peers.  

Analytics is just another enabler, like better training facilities, diet, sleep patterns etc. There's no magic solution, but small gains can make big differences to final outcomes.

Big Data Problems
As a Data Analyst, I probably hear or read the phrase 'Big Data' a dozen times a day (almost as many times as I have to watch a presentation that has a YouTube clip of Moneyball included) and a lot of the time it's used like its predecessor CRM (Customer Relationship Management) as a buzzword to try and sell you something you don't really need or to make something that's relatively mundane sound a bit more interesting.

Ultimately it comes down to the fact that it costs very little to store information and it has become increasingly easy to capture it, so there is the desire to capture as many things as possible as frequently as possible, regardless of whether any of it has real value.

The most obvious example of 'Big Data' in football would be Prozone, where each player (and the ball, I'm assuming) is tracked 10 times a second (some systems in sports such as the NBA track 25 times a second), so a 95-minute match including injury time gives around 1.3m records.  It's not small, but it's nothing compared to what a web company may store.
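
For anyone wanting to check the 1.3m figure, here's a rough back-of-the-envelope sketch. The count of 23 tracked objects (22 players plus the ball) is my assumption about how the number is reached, and it ignores substitutes and anything else a real system might log.

```python
# Back-of-the-envelope count of tracking records in one match.
# Assumption: 22 players plus the ball are tracked for the whole match.
TRACKED_OBJECTS = 22 + 1
MATCH_MINUTES = 95  # 90 minutes plus injury time, as in the post


def tracking_records(objects, samples_per_second, minutes):
    """One record per tracked object per sample."""
    return objects * samples_per_second * minutes * 60


records_10hz = tracking_records(TRACKED_OBJECTS, 10, MATCH_MINUTES)
records_25hz = tracking_records(TRACKED_OBJECTS, 25, MATCH_MINUTES)

print(f"10 samples/sec: {records_10hz:,} records")  # 1,311,000
print(f"25 samples/sec: {records_25hz:,} records")  # 3,277,500
```

Even at the 25-times-a-second rate it's only a few million rows per match, which is why the volume itself isn't really the hard part.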

Even the most data-savvy manager is not going to want to wade through that much data, so it'll be the job of Performance Analysts (and Prozone themselves) to try and gain insight from it.  Naturally the first stop will be top-level metrics like top speed, distance run, number of sprints etc., but it'll be the ability to go beyond this and interrogate the data in more detail that'll make the difference.  That is probably where the new Forest Green Rovers manager is coming from: if you haven't got the resource to even scratch the surface of what the data could tell you, what's the point in having it?

Similarly for training data, you may have GPS data, heart rate, saliva, sleep diaries but you need to be able to go from a bunch of data to something that can change what you do with players.

Small Data Problems
Football also suffers from 'Small Data' problems in terms of small sample sizes both in terms of number of matches and number of players involved.  If one player scores 10 goals in a season and another 13 which one is the better one?  Even if you factor in things like expected goals (chance of any shot being scored so an effort from 6 yards is different to one from 40 yards), you're still going to be left with a fair amount of doubt as to which will perform better next season even if any estimate is far better than just guesswork.
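
To put a rough number on the 10 vs 13 goals question, here's a minimal sketch. It assumes both players have exactly the same underlying scoring rate and the same playing time, and that goals arrive roughly like a Poisson process, in which case the split of their 23 combined goals behaves like 23 coin flips. It's a standard exact binomial comparison, not something taken from any club's toolkit, and the function name is made up for illustration.

```python
from math import comb


def prob_split_at_least(goals_a, goals_b):
    """Chance that two equally good players (same true rate, same minutes)
    end up with a goal split at least as lopsided as the one observed.

    Conditional on the combined total, each goal is equally likely to
    belong to either player, so the split is Binomial(total, 0.5).
    """
    total = goals_a + goals_b
    high = max(goals_a, goals_b)
    # Probability that one named player gets `high` or more of the goals.
    upper_tail = sum(comb(total, k) for k in range(high, total + 1)) / 2 ** total
    # Either player could be the one ahead, so double it (capped at 1).
    return min(1.0, 2 * upper_tail)


p = prob_split_at_least(10, 13)
print(f"Chance of a gap this big between identical players: {p:.0%}")
```

Under those assumptions, two identical players would finish at least three goals apart roughly two seasons in three, so a 13 vs 10 gap on its own tells you very little.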

Where there is doubt, there is the overwhelming temptation to not even try to be scientific and just go on 'gut feel', which comes back to the 'eyes don't lie' comment, even though it's incredibly difficult to be 100% objective.

One obvious issue is that of confirmation bias.  As a Swansea fan, a good example for me is Dwight Tiendalli, but better-known examples would be Tom Cleverley or Martin Demichelis: they are expected to be terrible, so every bad pass or missed tackle is seen as confirmation of this and any good play is conveniently ignored.  This isn't to say that people's opinions are necessarily wrong overall, just that in any given match, the presumption of failure is already present before kick-off.

There's been plenty of talk recently about over-playing players and injury, and a couple of weeks ago I had a look at the link between playing time and hamstring injuries for a few high-profile players.  One of the charts looks at Mesut Ozil's playing minutes and injuries:
Ozil's playing minutes over the previous 7/14/21 days along with injury activity: was overwork after his return from injury in Feb responsible for the injury in March?
The problem here is that there is generally too little data (especially publicly) to have any real knowledge as to cause and effect (there's always the risk you're looking backwards for possible factors once someone is injured, ignoring the times they or others showed the same activity but didn't get injured).
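
For what it's worth, the rolling 7/14/21 day totals behind a chart like that are simple to build. Here's a minimal sketch; the appearance log is entirely made up for illustration (it is not Ozil's or anyone else's real record) and the function name is my own.

```python
from datetime import date, timedelta


def rolling_minutes(appearances, as_of, window_days):
    """Total minutes played in the `window_days` days up to and including `as_of`."""
    start = as_of - timedelta(days=window_days - 1)
    return sum(mins for day, mins in appearances if start <= day <= as_of)


# Made-up (date, minutes) log, purely illustrative.
appearances = [
    (date(2001, 1, 1), 90),
    (date(2001, 1, 11), 90),
    (date(2001, 1, 14), 75),
    (date(2001, 1, 18), 90),
    (date(2001, 1, 21), 90),
]

as_of = date(2001, 1, 21)
workload = {w: rolling_minutes(appearances, as_of, w) for w in (7, 14, 21)}
print(workload)  # {7: 180, 14: 345, 21: 435}
```

Calculating the workload is the easy bit; the hard part is having enough injured and non-injured examples with similar workloads to say anything about cause and effect.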

You may have a small pool of players who have started playing again after a relatively minor injury, but how many of them then follow the same playing schedule as Ozil and also have a similar playing style in terms of distance run, sprints etc., and a similar physique?  It was interesting to see some of the doctors presenting at the conference talking about pooling data (anonymously), which would improve the situation, and this is taking place at some levels within UEFA.

There's also the issue of short-termism: it may well be that a particular strategy/approach is the best over a longer period of time (e.g., limiting a player's match time), but a lot of the time anything more than a fortnight away might be classified as 'long-term planning'.

Imagine Wenger deciding whether or not to start Ozil against Bayern on the Tuesday after playing him for 90 minutes against Everton the previous Saturday: at what level of likelihood of injury would he decide not to play him? 5%, 10%, 20%, 50%? And when he does get injured, is that particular instance bad luck or bad planning?

Just buying a piece of software won't solve your problems, but just as surely it's impossible for any one person to be able to collate, retain, process and analyse all information that may be useful in creating better performance.

Analytics is taken most seriously at the elite clubs, but I'd argue that the greatest incremental benefit would be for Championship (or ambitious League 1) clubs: there's enough money there for it to be worthwhile, and you're also more likely to be doing something different to your peers.

Overall, clubs are faced with the dual issues of having 'too much' data in some instances and 'not enough' in others. This is where the skill of an analyst comes in: someone who can process the data, find the insights and present them back in a way that means actions are actually taken off the back of the data.  They'll also know the difference between what's interesting, what's actually important and what is just noise.

For me, a statistician and a performance analyst need to work together: one with the technical/programming/statistical skills needed to work the data, in tandem with one who will be more closely involved in applying any findings (if you can find someone who can do both, fantastic, make sure they never leave).

I'm naturally biased given my background, but the obvious solution for me is for clubs to bring in Data Analysts for the off-field work (Season Ticket Analysis, Club Shop, Social Media Analysis) but to free up some of their time to look at on-field data.  This way you have someone 'paying their way' even before they get to the football data.  I'd argue they'd probably contribute a greater amount to the bottom line if working fully on football data but that may be too much of a leap of faith for some clubs, at least for now.  

Other Posts:
Match Predictions: Are you Smarter than Lawro 

Dan Barnett

Director of Analytics
Analysis Marketing Ltd

Twitter: @analysismktg 

1 comment:

  1. Excellent blog Dan. Love the colour you've brought to the problems of small data. Each can be resolvable, but you need to solve the problems on scale to have an impact and most importantly maybe even focus on questions you didn't look at before. Keep up the good work.