Analytics Suck Sometimes: A Weekend At The MIT Sloan Sports Analytics Conference
I was on the edge of my seat, attentively listening to the “Automatic Playbook Generation in Football” presentation. There were arrows and heat maps and crazy technology and all I could think was, “no more tedious play-charting… this guy’s gonna be rich!”
Then, “our technology is 80% accurate.” Oh. “Can it tell, say, the difference between a designed screen and quick slants?” “No.” And it can’t distinguish between anything besides run and pass, and scrambles are labeled as passes. Oh.
That was the theme of the MIT Sloan Sports Analytics Conference. Cool! Cool! COOL! Oh, you’re in the beginning stages. For a normal fan, this is useless right now.
As virtually every conference recap aptly notes, the big names — Daryl Morey, Bill James, Brad Stevens, John Hollinger and scores of their cohorts in the audience — don’t actually tell you anything, because they have no incentive to do so. They shut up, have fun and listen to the young guys with the beginnings of promising research. Then they hire them and hide them and their data and equip them with team-brand muzzles, never to be heard from again. Some of the guys presenting already have non-disclosure agreements and are hiding stuff, I assume. Maybe all.
I left impressed and smiling, but not especially enriched. The conference is intended “for industry professionals (executives and leading researchers) and students to discuss the increasing role of analytics in the global sports industry.” When it comes to actual insights that you, the reader, should care about… well, there wasn’t much.
The average sports fan, or even the hyper-interested one, leaves Sloan thinking: That was cool, but kinda overwhelming.
What the hell should I actually care about when it comes to “analytics?”
If you’re a serious sports fan, you care about a few things. These are the areas in which analytics are worth paying attention to. You want to be able to look at numbers and figure out, or be told by writers or TV “analysts” or somebody:
1) Why does my team suck? How can they stop sucking? Or, why are they good?
2) What are their chances of winning tonight, tomorrow, this year and the next decade?
3) Was the transaction my team just made good? Or did we get ripped off?
4) Which players suck? Which players are good? What can they do to improve?
5) If I extract the media bullshit, what’s the true narrative about my team?
6) Did my coach just fuck up?
Something like that.
There are plenty of “advanced stats” that accomplish these things. Smart fans already know to eschew things like pitchers’ win-loss records, RBI and batting average in MLB in favor of, say, WHIP, OBP and OPS. WAR is very good, albeit imperfect, and as one article put it, “If you ask the average BBWAA writer to break out the chalk and explain every aspect considered in WAR (rWAR or bWAR), not a single one of them could write a single equation and include all variables. Not one.”
But you don’t necessarily need to be able to explain how the sausage is made. You just need to know if it’s good sausage and be able to succinctly explain what the sausage tastes like in your mouth, all sexual innuendos intended.
The good stats are easy to explain and genuinely useful, and you should use them. For example: in the NFL, DVOA is a solid way to measure a team’s performance because it adjusts for strength of schedule and situational context. KenPom ratings measure NCAAB team quality through pace-adjusted efficiency, in a way that predicts future results. PER and Win Shares do a decent job of evaluating (mainly offensive) individual play in the NBA. Score-adjusted Fenwick approximates NHL teams’ puck-possession skills, which correlate strongly with future wins.
The guys that tout “analytics” are indeed the smart ones. But there are plenty of smart people who are actually Dumb Smart People, who are aware of “analytics” and “advanced stats,” but aren’t researching them themselves. They simply eat up what’s out there already and use it to answer every single question with extreme confidence. You will see these people on Twitter, especially, where they act as if they know how good every player and team and stat is, but they cannot predict outcomes any better than you or I. If a stat seems weird, they don’t try to explain why, they just call it “variance” or an “outlier.” Most of the time, outliers and variance can be explained in real sports terms, if you know what the hell you’re talking about.
Sometimes, numbers lie. Or they don’t lie, but you put words in their mouth. They’re not always saying as much as you’re hearing, Mr. Biased Numbers Man.
As my friend Rick Ipedia informed me, “analytics is the discovery and communication of meaningful patterns in data.” If it’s not meaningful, it’s not “analytics.” It’s just fancy noise.
“We’re not talking analytics. We’re talking mathematics. That’s the thing that frustrates me at this conference.” – Brian Burke.
This is an important comment, even though it was mostly bullshit in context. As Seneca said, “I shall never be ashamed of citing an ignorant hockey mind if the line is good.”
Brian Burke was the stooge of the conference, the most “traditional” executive in the greenest major sport when it comes to analytics. A hockey guy who basically says he drafts people based on how they treat their family.
The audience, justifiably, laughed at most of his words. “It’s an eyeballs business!” We get it, man: You’re old. But whenever “advanced stats” are brought up, plenty of smart guys ignore the legitimate concerns. Burke is no genius (“The reason I’d never ‘do Moneyball’ is: It’s boring. I have to entertain people.”), but there’s a reason that people — smart people — are skeptical of (some) analytics (at times). Sometimes, just sometimes, they haven’t materialized as useful yet. Of course they’re worth pursuing, but that doesn’t mean that every bit of research has a useful application.
One of the coolest, most useful papers was “The Three Dimensions of Rebounding.” It broke “rebounding” down into its own equation to measure skill more accurately.
Rebounding = (positioning + hustle) x conversion.
It’s a great approach, it was well done, and the same model can be applied to other stats.
But what did the average fan learn? Basically, that the guys you thought were good are good, that they all do it differently (some lean on hustle, some on conversion, some on positioning), and that Al-Farouq Aminu is really good at rebounding. And when the dude told that to personnel guys, they patted him on the back, saying, “You’re right! We’ve known that for a while through film.”
So, look at their paper. You might decide to make a rebounding argument using them. But it’s not exactly changing the game. You’re probably not going to apply their methods to hockey saves = shot trajectory + speed or tackles = positioning + hustle + whatever. For the average person, like myself, it’s really loud and esoteric and boring and just call me when you actually figure something out.
I’m just gonna get drunk and yell at Rex Ryan for not using his timeouts correctly. That stuff is “advanced.” That stuff we know.
I don’t mean to insult the rebounding paper. That was an exception, arguably the star of the conference. It’s something that I won’t ignore, you shouldn’t ignore, and maybe even Brian Burke won’t ignore.
(Burke should really pay closer attention, though, because hockey lags far behind in analytics, edges are likely attainable, and right now, he manages what he described as a “horseshit hockey team.” He might want to start with what was probably Sloan’s other star paper, which quantified how much more important it is to carry the puck into the offensive zone than to dump it in.)
Really, though: It was mostly noise. I propose renaming it the Sloan Noise Collectors Gathering. The crowd desperately searches for Signals (à la Nate Silver’s book, The Signal and the Noise), but instead, they wind up filing noise complaints! (Joke NOT stolen from Rick Reilly. He does not attend gatherings of forward-thinkers.)
The most-hyped research paper (yes, Nerd Alert) was on EPV, or “Expected Possession Value,” a “real-time stock ticker” for NBA decision-making. “Predicting points and valuing decisions in real time.”
It’s an incredible paper. It’s also currently useless for fans.
Its creators, Harvard professor and Grantland writer Kirk Goldsberry, along with Harvard PhD students Dan Cervone, Alexander D’Amour and Luke Bornn, acknowledge that it’s simply the fascinating beginning of a maybe-revolutionary group of statistics. A huge step? Yes! But to what? First, to refinement (the current iteration is flawed), and then, way down the road, to a tool for creating new metrics, which will then have to prove their usefulness, etc. etc. So, it’s awesome. But for most fans? Useless, at the moment.
Stats are noise until they pass through all of the filters (research, public availability, verification of accuracy and usefulness) and, finally, explanation in real words that make sense. Like, “See, Barry Baseball Fan, we use OBP instead of batting average because, similar to what your Little League coach taught you, ‘A walk is (almost) as good as a hit!’”
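For the curious, here’s the Little League logic in numbers: a minimal Python sketch using the standard definitions. Batting average is hits divided by at-bats. OBP is hits plus walks plus hit-by-pitches, divided by at-bats plus walks plus hit-by-pitches plus sacrifice flies. The two hitters below are made up for illustration; they have identical .300 averages and differ only in how often they walk.

```python
def batting_average(hits: int, at_bats: int) -> float:
    """Hits per official at-bat. Walks aren't at-bats, so they vanish here."""
    return hits / at_bats


def on_base_percentage(hits: int, walks: int, hit_by_pitch: int,
                       at_bats: int, sac_flies: int) -> float:
    """Times on base divided by at-bats plus walks, HBP and sacrifice flies."""
    times_on_base = hits + walks + hit_by_pitch
    opportunities = at_bats + walks + hit_by_pitch + sac_flies
    return times_on_base / opportunities


def fmt(rate: float) -> str:
    """Baseball-style formatting: three decimals, no leading zero."""
    return f"{rate:.3f}".lstrip("0")


# Hypothetical hitters: same hits and at-bats, very different plate discipline.
patient = dict(hits=150, walks=80, hit_by_pitch=5, at_bats=500, sac_flies=5)
hacker = dict(hits=150, walks=20, hit_by_pitch=5, at_bats=500, sac_flies=5)

for name, line in (("patient", patient), ("hacker", hacker)):
    avg = batting_average(line["hits"], line["at_bats"])
    obp = on_base_percentage(**line)
    print(f"{name}: AVG {fmt(avg)}  OBP {fmt(obp)}")
```

Same average, but the patient hitter reaches base 60 more times (235 vs. 175). OBP sees that; batting average doesn’t.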
Lots of people can’t explain their “advanced stats” in regular sports terms. It’s not because the audience is too dumb. It’s because the stat isn’t useful for fans (or the person is really awkward).
As a gambler, though, I have to mention the best number ever, the one that will never become obsolete. A top-level stat (well, sort of a stat) I look at every day. I will be brief in touting its importance, but it’s a shame it’s not ubiquitous:
The point spread is the Undisputed Heavyweight Champion of “Advanced Stats.”
The point spread (and to a lesser extent, futures odds on things like Super Bowl Champions or team season wins over/unders) is the best measure of team talent and individual game predictability.
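One way to see why a spread doubles as a measure of game predictability: a common rule of thumb treats the final NFL margin as roughly normally distributed around the spread. The sketch below assumes exactly that, plus a standard deviation of 13.5 points. That number is an assumed value in the range often cited for the NFL, not something the market publishes. It also ignores pushes and the clustering of NFL margins at 3 and 7, so treat it as a rough translation, not a pricing model.

```python
import math


def win_probability_from_spread(spread: float, sd: float = 13.5) -> float:
    """Rough outright win probability for a team, given its point spread.

    Assumptions (not market facts): the final margin is normally distributed,
    centered on the spread, with standard deviation `sd`. Spreads use the usual
    sign convention, so a 7-point favorite is listed at -7.
    """
    expected_margin = -spread  # -7 favorite -> expected to win by 7
    z = expected_margin / sd
    # Standard normal CDF via the error function gives P(margin > 0).
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


for spread in (-3.0, -7.0, -14.0):
    p = win_probability_from_spread(spread)
    print(f"favorite at {spread:+.1f}: about {p:.0%} to win outright")
```

Under these assumptions a 7-point favorite lands near 70%. Change `sd` and the numbers move, which is the point: the spread carries the market’s estimate, and the conversion is just a modeling choice on top of it.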
The saying “Vegas knows!” is ridiculous for multiple reasons (most influential oddsmakers work offshore, not in “Vegas,” and it’s really the market that “knows”; the oddsmakers just have to pay attention and be sorta close), but it has the right idea. No sports market is truly “efficient”; teams cover by a lot of points all the time (often due to randomness, sometimes due to factors unaccounted for by the market), but the truth is: The market knows (almost) everything.
The sports betting market is (pretty) efficient. Especially when you compare it to the other “options,” which are, like, the shit people are saying on ESPN. Wanna know how good a team is? Watch games, read smart writers, use the good stats out there, and pay attention to the sports betting marketplace. Making money betting sports is incredibly hard, but using betting markets to analyze sports is fairly easy.
Letdown spots? Injuries? Any psychological “edge” for a team? Typically understood, at least pretty well, by the market. An unlucky team? Same. A team improves their nutrition? That improves their performance, and it’s soon accounted for in the line. Sure, you can have an edge in sports betting, but unless you’re doing your own independent research, you probably don’t have an edge. The point spread is the best “analytics” tool there is, and even as things get more advanced, the market will capture all of the big stuff, and the point spread will remain the Holy Grail. Forever.
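“Deconstructing” a line mostly means turning prices into probabilities. Here’s a minimal sketch for American odds. Each price implies a break-even probability. The two sides add up to more than 100% because of the bookmaker’s cut (the vig). Dividing each side by the total is one simple way, among several, to strip that cut out. The -200/+170 moneyline is a made-up example.

```python
def implied_probability(american_odds: int) -> float:
    """Break-even win probability implied by American odds, vig included."""
    if american_odds < 0:
        # Favorite: risk |odds| to win 100.
        return -american_odds / (-american_odds + 100)
    # Underdog: risk 100 to win `american_odds`.
    return 100 / (american_odds + 100)


def no_vig_probabilities(odds_a: int, odds_b: int) -> tuple[float, float]:
    """Strip the vig by scaling both implied probabilities to sum to 1.

    This proportional method is one simple convention; other de-vigging
    methods exist and can give slightly different answers.
    """
    p_a = implied_probability(odds_a)
    p_b = implied_probability(odds_b)
    total = p_a + p_b  # above 1.0 when the book is taking a cut
    return p_a / total, p_b / total


# Standard -110 on both sides of a point spread.
print(f"-110 breaks even at {implied_probability(-110):.1%}")
# Hypothetical moneyline: favorite -200, underdog +170.
fav, dog = no_vig_probabilities(-200, 170)
print(f"-200 / +170 without the vig: {fav:.1%} / {dog:.1%}")
```

The -110 spread price needs 52.4% just to break even, and the hypothetical -200 favorite comes out around 64% once the vig is removed, not the 67% the raw price suggests.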
Hilariously enough, though, the betting panel was the worst panel of them all, and it wasn’t close. Betting markets are at the top of the Sports Information Spectrum. Coverage of sports betting markets? IN HELL. But that’s a whole different topic.
It seems that we’ve reached a tipping point in “analytics.” As Kirk Goldsberry wrote, “for years, we have talked about ‘advanced stats’ when what we were really talking about was slightly savvier arithmetic.” Smart people already know that points per possession are better than points, OBP > batting average, etc. etc. That’s done, except for convincing the antiques.
But this next wave of “advanced stats,” EPV and all of the behind-the-scenes stuff that nobody wanted to share at Sloan… well, it seems far off. Most of the easy fixes in sports analysis already seem to be out there. Or maybe the next generation has arrived, but it’s hidden. Which means it’s not being shared anytime soon.
As Nassim Nicholas Taleb writes in his excellent book, Antifragile, “A very rarely discussed property of data: it is toxic in large quantities — even in moderate quantities… The more frequently you look at data, the more noise you are disproportionally likely to get.”
Earplugs are inexpensive and fairly comfortable. Focus on the good, quiet stuff.
Cite WAR and KenPom and the point spread and leave the smart people to deal with their eardrums popping in their caves. Eventually, we’ll get a call when they learn something meaningful. And, for now, take solace in the fact that if you can deconstruct a point spread, even with mild accuracy: You know more than 99.9% of the population and everyone you see on your television. Chew on my frayed cuticles, Skip Bayless.