Richard Whittall has recently written an excellent column for 21st Club in which he explores the gap between analytics
and its functional implementation within football clubs. He
finishes up with this:
"The current field in football analytics is very
good at many things, but not so good sometimes in identifying
specific problems for which analysts may provide a partial or whole
solution. Work on the latter will help further bridge the gap between
analyst and club. Sometimes, it’s important for analysts to step
out of R and Tableau and start to breakdown if and how clubs
can actually move the needle on some of these predictive metrics.
Otherwise, they are like doctors who are only able to offer a
diagnosis, but not a cure."
Wise words and certainly advice that should be heeded
if you are one of the many people in the market offering analytic
solutions for football clubs. With such work being necessarily
proprietary and the retention of a competitive edge encouraging
secrecy from within clubs, it is not always obvious how analytically
switched on the industry is. Leading data providers such as
Opta and Prozone occasionally offer a window into the products they
supply to clubs. In particular, recent Prozone videos from Hector Ruiz
and Paul Power showed great skill and clarity, alongside the
benefits and economies of scale afforded by full data access and a
dedicated and skilful workforce. One presumes that by this
point most clubs will have at least a small analytics department and
probably a lot more. Whether such a department is fully
integrated with coaching or the first team will likely vary from club
to club, but the point remains: analytics does not exist in a
bubble. It is already in place, there are professional companies that can
offer a full package, and access to the market from the outside is
difficult. Plenty of people want to work in football, and co-opting a few
models from other industries and creating a brand, whilst smart in
principle, is unlikely to improve on
what is already available. But, and I feel this is important, a desire
to work in football is not the only reason people are interested in, or
learning about, analytics.
Indeed, none of this has stopped a vibrant online amateur
community from sprouting up in recent years. The advent of data
sites such as Whoscored and Squawka has offered easy access to data
at a level that far exceeds what was previously available. Now,
anyone with curiosity can collate data from numerous competitions at
player and team level and play with it. It can be analysed
and truths, whole or partial, can be uncovered. These
truths have varied applications. For some with good
technical skills, predictive modelling can inform betting; for others,
fantasy football. I choose to tell stories about what I've
deduced, and I've sunk many hundreds of hours into it because I find
it interesting and intellectually rewarding. As with any
subject, there is a learning curve that never ends. Not everything I
do hits the mark and there are few shortcuts to knowledge, but as
others who've done this before me have noted, you do it because it's
fun.
The current situation in analytics has created
different viewpoints. Firstly there is a great drive for
predictability, repeatability and application. These are
entirely logical and commendable aims, but the arms race to maximise
these effects has led to a shroud being laid over the details
involved. In particular, and with one notable exception from Michael
Caley, the multiple black-box Expected Goals models and
advanced derivatives regularly cited have obfuscated analysis due to
their non-standardisation. This is not a criticism of any
specific model. Many hours of hard work and theoretical analysis will
have gone into each, by people with far more advanced skills than
mine, and those with such access have doubtless found multiple
uses for the insight gained. My concerns lie around
accessibility and interpretation and this is where I feel some parts
of the analytics community have missed the point.
Barriers to entry may have reduced over time, but
barriers to understanding have not. There is no such thing as
an "Expected Goal"; it is entirely theoretical. A
layman interested in football statistics may not yet understand the
value of a shot or a shot on target, yet he quickly encounters
hypothetical versions of the thing he does understand: goals.
That is a tough sell. Shot counts are real and easily
understood. They aren't "outdated" metrics; they are
the building blocks of all that comes after, and if the analytics
community has any interest in popularising its method of thinking
and transcending a niche corner of football, the stories told by our
fundamental metrics are essential.
And there are many stories to be told. Variance
in league seasons of 34 to 46 games is huge. Half and whole
seasons can go by where the measurable statistical reality of a team
is skewed vastly in either direction. Liverpool's huge
overachievement in 2013-14, followed by an almost inevitable
regression this season, is just one obvious case. It isn't just the
board that needs to understand the wider implications of such matters;
the fans can benefit too. Interpretations may differ, but we can
pull apart possible reasons in the numbers and disseminate the
knowledge. Oh for the day when the average pre-game conversation
involves an understanding that a team's save percentage has been
unsustainably high, or a striker gets cheers rather than murmurs or
abuse because fans understand he's been unfortunate rather than
inept.
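To put a rough number on how much a 34 to 46 game season can mislead, here is a minimal Python sketch. It assumes a hypothetical team with a fixed 45% chance of winning and 25% chance of drawing every match, treats matches as independent, and replays the season many times. The probabilities, function names and number of replays are illustrative assumptions of mine, not figures from any real club or model.

```python
import random
import statistics


def simulate_season_points(p_win, p_draw, games=38, rng=random):
    """Points total for one simulated season with fixed per-match odds."""
    points = 0
    for _ in range(games):
        r = rng.random()
        if r < p_win:
            points += 3  # win
        elif r < p_win + p_draw:
            points += 1  # draw
        # anything else is a defeat and earns nothing
    return points


def season_spread(p_win, p_draw, games=38, seasons=10_000, seed=0):
    """Replay the same team's season many times and summarise the totals.

    Returns (mean, standard deviation, lowest, highest) points total.
    """
    rng = random.Random(seed)  # seeded so the run is repeatable
    totals = [
        simulate_season_points(p_win, p_draw, games, rng)
        for _ in range(seasons)
    ]
    return (
        statistics.mean(totals),
        statistics.pstdev(totals),
        min(totals),
        max(totals),
    )


if __name__ == "__main__":
    # Hypothetical team: 45% win, 25% draw, 30% loss in every match.
    for games in (34, 38, 46):
        mean, sd, low, high = season_spread(0.45, 0.25, games=games)
        print(f"{games} games: mean {mean:.1f}, sd {sd:.1f}, "
              f"range {low} to {high}")
```

Under these assumptions the team averages 1.6 points per game, yet the standard deviation of its 38-game points total works out at roughly eight points. Two seasons from an identical side can therefore land ten or more points apart through chance alone, before any real change in quality is considered.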
It's probably a long way off but as each year passes,
we collect more data and we can test more outcomes. Our
knowledge can grow and with it our expertise and ability to inform.
It is important to encourage people new to the movement and support
their efforts. We may strive for professionalism, but we all start as
amateurs. Guide rather than chastise, and realise that the
more people who are interested in football statistics and analytics,
the greater the likelihood of resulting success for everyone,
whatever your desired end-game.
This may seem somewhat utopian, but elitism will get
us nowhere. Accessibility and inclusiveness just might get us somewhere.