Wednesday, 1 March 2017

A few thoughts on moving beyond xG and stats in the media

I recently contributed to an article on Ultimo Uomo about how analytics can move beyond xG, and about the use of stats in the media.

That article was in Italian, and had plenty more contributors, so check it out, but here are my thoughts pre-translation, for anyone interested.

How can analytics evolve beyond xG?

Expected goals provided a strange line in the sand for football analytics, primarily because the data required to build a reputable expected goals model fell outside the statistics published on public sites, and the technical aptitude needed to build such a model was specific and not trivial.
So for a long time, people either built xG models or looked on from outside wondering about them. This had the effect of slightly focusing the evolution of football analytics around xG-related topics: people built ever more complex models, and eventually graduated towards models that valued the movement of the ball wherever it was on the pitch, but the basic concept was still an expected goal value.

More recently, studies of passing have become more prevalent, with a desire to identify the players and teams that are most efficient or successful at moving the ball. Still, it is difficult to separate descriptions of style from actual beneficial results that tally with winning football matches, a long-term core issue with any metric development.

There is a lot of hope that tracking data might add extra factors that aid precision, but nothing is public there yet, and it's possible that the benefits will only be marginal, a charge that could also be levelled at the addition of running and sprinting data.
Defensive analysis remains hard to work with at a player level and it must be hoped that advances in quality of data can shed light here.

Stats in the media

The use of stats in the media has risen sharply in recent years, and with fantasy football, Football Manager and data sites such as Squawka and WhoScored, the acceptance of numerical descriptions of players is firmly entrenched in younger fans' minds. Increasingly we see talk of shot or shot creation numbers for teams and players, which adds a necessary second layer of analysis beyond goals and assists (xG may still be too esoteric for total mainstream usage), and this is positive.
Less positive is the use of other statistics that do not represent what they are being used for. Defensive stats like tackles and interceptions are often presented in a "more = better" fashion, when they are little more than descriptive and do not necessarily reflect quality of play. Goalkeepers cannot be graded accurately by volume of saves, and simple lists based on one or two stats do not do a good job of grading players outside of attacking metrics.
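
To make the goalkeeper point concrete, here is a minimal sketch in Python. The two keepers and all of their numbers are invented purely for illustration, and the save rate used here is only a slightly better descriptive stat, not a proposed grading method: it still ignores how difficult each shot was.

```python
# Hypothetical keepers with invented numbers, for illustration only.
keepers = {
    "Keeper A": {"shots_on_target_faced": 150, "saves": 100},
    "Keeper B": {"shots_on_target_faced": 100, "saves": 75},
}


def save_rate(saves, shots_on_target_faced):
    """Share of shots on target that the keeper saved."""
    return saves / shots_on_target_faced


# Save rate per keeper.
rates = {
    name: save_rate(k["saves"], k["shots_on_target_faced"])
    for name, k in keepers.items()
}

# Rank once by raw volume of saves, once by save rate.
most_saves = max(keepers, key=lambda name: keepers[name]["saves"])
best_rate = max(rates, key=rates.get)

print("Most saves:", most_saves)
print("Best save rate:", best_rate)
```

With these made-up numbers the keeper with the most saves is simply the one who faced the most shots, while the other keeper stopped a larger share of what came at him. A "most saves" list would rank them the wrong way round, and even the save rate says nothing about the quality of the shots faced.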

So we have more presentation of stats and more description of stats, but a long way to go before actual nuanced and thoughtful analysis of stats is anything like normal. And there is certainly a knowledge gap here. It takes time and understanding to read genuine meaning into football statistics, yet there is a requirement for media companies to incorporate information into their presentations, and maybe only in certain cases is this backed up by genuine understanding. The same problem presents itself inside clubs, where performance analysts are bringing statistics into their work without sufficient grounding in what matters and what does not. Until both the media and football understand that they require knowledgeable people to direct their usage of stats, offerings will fall short of their potential, and we are in danger of finding stat use marginalised (in clubs) or used as trivia but nothing more (in media).

We see more visuals in the media now, but again I would caution against their usage without understanding. An average position map, average pass location map or heat map is rarely capable of giving the full truth, yet these remain popular and are often misused. However, shot location maps, with or without expected goal values, or specific pass or chance creation maps can reveal significant truths about a game, a team or a player. To say “Look, this player always shoots from 30 yards and never scores” and visualise that is a simple method of showing and proving a point. The key points should always be that any number used or visualisation shown adds to the presentation, reveals a truth otherwise concealed, and is quick and accessible to understand. There is certainly work to do on all sides here, and only teamwork between visual experts/storytellers and those who understand the data can make this work at its best.