In his review Sky Sports' Adam Bate pretty much hits the nail on the head in that you need to actually apply the data not just collect it for the sake of it. There's no doubt that the use (or at least collection) of data in football is becoming increasingly prevalent but it's the 'So What' factor that's critical.
I liked the story from a few months ago of Forest Green, a Conference club, ditching Prozone, not least for the use of the word 'malarkey' in the local newspaper headline (I imagine them outside the ground with a 'Down with this sort of thing' placard against the use of modern technology in sport).
Top 2 results for 'Forest Green Prozone' on Google, professional step in Feb '13 but binned by new boss in December after manager Dave Hockaday leaves 'by mutual consent' in October.
Some of the quotes by the new manager Ady Pennock make him look like quite a
traditionalist:
"I am a great believer in what I see and my eyes don’t lie, so I don’t need a bit
of paper...The most important stat is the scoreline and I don’t want Prozone
for the sake of having it"
It might seem a bit backward but I'd rather someone had the
courage of their convictions rather than spending money on a product that
gathers dust just because 'it's what the elite clubs do'.
Clubs are at the stage now where they are
facing both 'Big Data' and 'Small Data' problems and it's how they deal with
these that will determine the level of advantage they get over their peers.
Analytics is just another enabler, like better training facilities, diet, sleep patterns etc. There's no magic solution, but small gains can make big differences to final outcomes.
Big Data Problems
As a Data Analyst, I probably hear or read
the phrase 'Big Data' a dozen times a day (almost as many times as I have to
watch a presentation that has a YouTube clip of Moneyball included) and a lot
of the time it's used like its predecessor CRM (Customer Relationship
Management) as a buzzword to try and sell you something you don't really need
or to make something that's relatively mundane sound a bit more interesting.
Ultimately it comes down to the fact that
it costs very little to store information and it has become increasingly easy to
capture it, so there is the desire to capture as many things as
possible as frequently as possible, regardless of whether they have any real value.
The most obvious example of 'Big Data' in
football would be Prozone, where each player (and the ball, I'm assuming) is
tracked 10 times a second (some systems in sports such as the NBA track 25 times
a second), so a 95-minute match including injury time gives about 1.3m records.
It's not small, but it's nothing compared to what a web company may store.
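As a rough check on that 1.3m figure, here is a minimal sketch of the arithmetic. It assumes 22 players plus the ball are tracked for the whole match; that count is my assumption, not a published Prozone specification, and the 25-per-second line simply applies the faster rate to the same football match for comparison.

```python
# Back-of-envelope count of tracking records per match.
# Assumption: 22 players plus the ball are tracked for the full match.
TRACKED_OBJECTS = 22 + 1
MATCH_MINUTES = 95  # 90 minutes plus injury time


def tracking_records(objects: int, samples_per_second: int, minutes: int) -> int:
    """One record per tracked object per sample."""
    return objects * samples_per_second * minutes * 60


at_10_per_second = tracking_records(TRACKED_OBJECTS, 10, MATCH_MINUTES)
at_25_per_second = tracking_records(TRACKED_OBJECTS, 25, MATCH_MINUTES)

print(f"10 samples/sec: {at_10_per_second:,} records")  # 1,311,000
print(f"25 samples/sec: {at_25_per_second:,} records")  # 3,277,500
```

So the 1.3m figure is consistent with 23 tracked objects, and the faster sampling rate would push a single match past 3m records.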
Even the most data-savvy manager is not
going to want to wade through that much data, so it'll be the job of Performance
Analysts (and Prozone themselves) to try and gain insight from it.
Naturally the first stop will be top-level metrics like top speed, distance
run, number of sprints etc., but it'll be the ability to go beyond this and
interrogate the data in more detail that'll make the difference. That is probably
where the new Forest Green Rovers manager is coming from: if you haven't got
the resource to even scratch the surface of what the data could tell you,
what's the point in having it?
Similarly for training data, you may have
GPS data, heart rate, saliva, sleep diaries but you need to be able to go from
a bunch of data to something that can change what you do with players.
Small Data Problems
Football also suffers from 'Small Data'
problems: small sample sizes, both in the number of matches and the
number of players involved. If one player scores 10 goals in a season and
another 13, which one is the better one? Even if you factor in things like
expected goals (the chance of any given shot being scored, so an effort from 6 yards is
different to one from 40 yards), you're still going to be left with a fair amount
of doubt as to which will perform better next season, even if any estimate is
far better than just guesswork.
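To put a rough number on that doubt for the 10 vs 13 goals example, here is a minimal sketch. It assumes both players had the same minutes and chances and asks: if they were equally good, so that each of the 23 goals was effectively a coin flip between them, how often would the split be at least as uneven as 13-10? This is a simple binomial comparison of two counts chosen for illustration, not an expected goals model.

```python
from math import comb


def equal_skill_split_probability(goals_a: int, goals_b: int) -> float:
    """Two-sided chance of a split at least this uneven between two
    equally good players (each goal is a 50/50 coin flip between them)."""
    total = goals_a + goals_b
    gap = abs(goals_a - goals_b)
    # Count every way of sharing `total` goals with a gap at least as large.
    at_least_as_uneven = sum(
        comb(total, k) for k in range(total + 1) if abs(2 * k - total) >= gap
    )
    return at_least_as_uneven / 2 ** total


p = equal_skill_split_probability(13, 10)
print(f"{p:.2f}")  # 0.68
```

In other words, two identical strikers would produce a gap of three or more goals roughly two seasons in three, so a 13 vs 10 season on its own says very little about who is better.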
Where there is doubt, there is the
overwhelming temptation to not even try and be scientific and just go on 'gut
feel' which comes back to the '..my eyes don't lie' comment, even though it's
incredibly difficult to be 100% objective.
One obvious issue is that of confirmation
bias. As a Swansea fan, a good example for me is Dwight Tiendalli, but
better-known examples would be Tom Cleverley or Martin Demichelis, where they
are expected to be terrible, so every bad pass or missed tackle is seen as
confirmation of this and any good play is conveniently ignored. This isn't
to say that people's opinions are necessarily wrong overall, just that in any
given match, the presumption of failure is already present before kick-off.
There's been plenty of talk recently about
over-playing players and injury, and a couple of weeks ago I had a look at the link between playing time and hamstring injuries for a
few high-profile players. One of the charts looks at Mesut Ozil's playing
minutes and injuries:
Ozil's playing minutes over the previous 7/14/21 days along with injury activity. Was overwork after return from injury in Feb responsible for injury in March?
The problem here is that there is generally too little data
(especially publicly) to have any real knowledge as to cause and effect
(there's always the risk you're looking backwards for possible factors once
someone is injured, ignoring the times they or others showed the same activity
but didn't get injured).
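For anyone wanting to build the same kind of rolling workload figures (minutes over the previous 7/14/21 days), here is a minimal sketch. The appearance log is made up for illustration and is not Ozil's real playing record, and counting the current day inside the window is my assumption.

```python
from datetime import date, timedelta


def minutes_in_window(appearances, as_of, days):
    """Total minutes played in the `days` days up to and including `as_of`."""
    start = as_of - timedelta(days=days - 1)
    return sum(minutes for day, minutes in appearances if start <= day <= as_of)


# Hypothetical (date, minutes played) log, NOT real match data.
appearances = [
    (date(2014, 3, 1), 90),
    (date(2014, 3, 4), 60),
    (date(2014, 3, 8), 90),
    (date(2014, 3, 15), 75),
    (date(2014, 3, 18), 90),
]

as_of = date(2014, 3, 18)
workload = {days: minutes_in_window(appearances, as_of, days) for days in (7, 14, 21)}
print(workload)  # {7: 165, 14: 255, 21: 405}
```

Running this for every day of the season, and for the days when nobody got injured as well as the days when somebody did, is what guards against only looking backwards from an injury.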
You may have a small pool of players who
have started playing again after a relatively minor injury, but how many of them
then follow the same playing schedule as Ozil, with a similar playing style
in terms of distance run, sprints etc., and a similar physique? It was
interesting to see some of the doctors presenting at the conference talking
about pooling data (anonymously), which would improve the situation, and this is
taking place at some levels within UEFA.
There's also the issue of
short-termism. It may well be that a particular strategy/approach is the best
over a longer period of time (e.g., limiting a player's match time), but a lot
of the time anything more than a fortnight away might be classified as
'long-term planning'.
If you imagine Wenger deciding whether or
not to start Ozil against Bayern on the Tuesday after playing him for 90
minutes against Everton the previous Saturday: at what level of
likelihood of injury would he decide not to play him? 5%, 10%, 20%, 50%? And when
he does get injured, is that particular instance bad luck or bad planning?
Just buying a piece of software won't
solve your problems, but just as surely it's impossible for any one person to
be able to collate, retain, process and analyse all information that may be
useful in creating better performance.
Analytics is taken most seriously at the
elite clubs, but I'd argue that the greatest incremental benefit would be for
Championship (or ambitious League 1) clubs. There's enough money there for it
to be worthwhile, and you're also more likely to be doing something different to
your peers.
Overall, clubs are faced with the dual issues of having 'too much' data in some instances and 'not enough' in others. This is where the skill of an analyst comes in who can process the data, find the insights and present back in a way that actions are actually taken off the back of the data. They'll also know the difference between what's interesting, what's actually important and what is just noise.
For me, a performance analyst and a statistician need to work together: one brings the technical/programming/statistical skills needed to work the data, the other is more closely involved in the application of any findings (if you can find someone who can do both, fantastic, make sure they never leave).
I'm naturally biased given my background,
but the obvious solution for me is for clubs to bring in Data Analysts for the
off-field work (Season Ticket Analysis, Club Shop, Social Media Analysis) but
to free up some of their time to look at on-field data. This way you have
someone 'paying their way' even before they get to the football data. I'd
argue they'd probably contribute a greater amount to the bottom line if working
fully on football data but that may be too much of a leap of faith for some
clubs, at least for now.
--
Dan Barnett
Director of Analytics
Analysis Marketing Ltd
LinkedIn: http://www.linkedin.com/in/danjbarnett
Twitter: @analysismktg
Excellent blog Dan. Love the colour you've brought to the problems of small data. Each can be resolvable, but you need to solve the problems on scale to have an impact and most importantly maybe even focus on questions you didn't look at before. Keep up the good work.