Leicester City, Brexit, and Donald Trump have made this a weird stretch for math. And yet despite those high-profile cases of low-probability outcomes coming true, when the New England Patriots fell behind the Atlanta Falcons 28–3 in the third quarter of Super Bowl LI, viewers still flocked to websites that calculate win probability, a metric designed to reveal how likely a side is to prevail, and the kind of metric that missed in each of those cases.
The parameters vary site to site, but the guiding principles remain the same: When it comes to football, a win probability model is constructed to account for the score, the time remaining, the field position, the timeouts remaining, and many other small factors. And at 28–3, the metrics gave the Patriots less than a 1 percent chance of winning the game.
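Those inputs can be combined in many ways; one minimal sketch is a logistic function of a weighted sum. Everything here is hypothetical: the function name, its inputs, and every coefficient are invented for illustration and are not taken from Burke's model, numberFire, or Pro-Football-Reference. The lead is scaled by the square root of the time left, a simplification in the spirit of random-walk models of scoring, so that the same margin counts for more late in a game.

```python
import math

def win_probability(score_diff, seconds_left, yards_to_goal, timeout_diff):
    """Toy win probability for a team that has the ball.

    All coefficients are hypothetical, chosen only to illustrate the idea.
    score_diff: team's score minus opponent's (negative when trailing)
    seconds_left: seconds remaining in regulation (3,600 at kickoff)
    yards_to_goal: distance from the ball to the opponent's end zone
    timeout_diff: team's timeouts minus opponent's
    """
    # Fraction of regulation left; floor at one second to avoid dividing by zero.
    time_frac = max(seconds_left, 1) / 3600
    # The same lead is worth more late, so scale it by 1 / sqrt(time left).
    lead_term = 0.12 * score_diff / math.sqrt(time_frac)
    # Small bonus for being past midfield, small penalty for being pinned deep.
    field_term = 0.01 * (50 - yards_to_goal)
    # Each extra timeout is worth a sliver of win probability.
    timeout_term = 0.05 * timeout_diff
    z = lead_term + field_term + timeout_term
    # Logistic link squashes the sum into (0, 1); split by sign to avoid overflow.
    if z >= 0:
        return 1 / (1 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1 + ez)

# A 25-point deficit, ball at the team's own 25, an assumed 1,411 seconds left.
print(f"{win_probability(-25, 1411, 75, 0):.2%}")
```

With those assumed inputs the toy model lands well under 1 percent, the same neighborhood as the sites' numbers. Production models are typically fit to historical play-by-play data rather than hand-tuned like this.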
It’s hard to know exactly who was glued to the live probability updates during the game: Maybe they were Patriots haters enjoying the schadenfreude of seeing a numerical representation of just how unlikely a comeback was; maybe they were Patriots fans looking for a sliver of hope, however slender. Regardless, they came in droves. Traffic to the analytics site numberFire.com doubled in the second half, according to its internal metrics, and 10 percent of the season’s traffic to Pro-Football-Reference’s win probability calculator came during and just after the game, according to that site’s figures. Tweets about the long odds were shared thousands of times.
When the Patriots completed their revival in overtime, sealing the biggest comeback in Super Bowl history in just more than 20 minutes of game action, math took another L, leaving win probability experts in the strange place of having to defend the numbers.
“It’s tough,” says Mike Kania, head of football operations at Sports Reference. “A lot of people who don’t understand probability are coming out and saying, ‘Why are these sites predicting anything anymore?’ And the election is brought up a lot. But 28–3 is still a very, very safe lead.”
For the first time in the digital age of easily accessible data, extremely probable outcomes regularly seem to be falling through, and that was especially true this football season. Well before the Patriots scored a knockout against math, the San Diego Chargers blew second-half win probabilities of 99.9 percent in Week 1, 84.7 percent in Week 3, and 99.8 percent in Week 4, according to Pro-Football-Reference. The Detroit Lions broke math in the opposite direction, winning a Week 1 game over the Colts that they stood a 99 percent chance of losing in the second half, then continuing to mount unlikely fourth-quarter comebacks. The Packers, who made the NFC title game, at one point in November had just an 18 percent chance of making the playoffs. The Patriots’ recovery was the season’s fiercest rebuke of the metrics yet, but it wasn’t the first.
“I used to get angry and defensive,” says ESPN senior analytics specialist Brian Burke, a football win probability expert who built the most influential model. “Now I think, ‘Man, I want you on the other sideline of life from me, I want to compete against you.’ In the long run, guess who will win? People like that are shooting themselves in the foot.”
Burke compares himself to the Wright brothers: Other people had the idea for the airplane, he says, but the Wrights got their vessel off the ground. Similarly, while the influential 1988 football analytics book The Hidden Game of Football gave an early outline of when teams are more likely to win based on the game situation, Burke says its authors lacked the technology to build an actual model; he launched his in 2008. The idea came when he was watching a Ravens game the previous year and heard the broadcasters explain that Ravens coach Brian Billick was almost perfect in the second half when leading by two touchdowns. Burke, who has since consulted for NFL teams, suspected that nearly any team wins most of the time in that situation, and began building a model to confirm his hunch, developing the one that’s now used at ESPN and is the basis for many other football win probability formulas.
Burke is often asked whether a comeback was one in 1,000 or one in a million, but he stresses that his model is not a “trivia generator” designed to quantify how rare a comeback is. Rather, he says, it was built to help teams make decisions about timeout usage, fourth downs, and other key matters based on the likelihood of victory at a given moment. Teams still use win probability to make decisions, but that has become a small part of its fame. These days, its most common use is real-time consumption by fans and media.
While win probability has gone mainstream, the data that propel it are still fairly limited because of the nature of football. There are only 267 NFL games per season (including the playoffs), meaning there are very few 28–3 leads to study. In fact, there were only 12 regular-season or playoff games this season in which a team ever trailed by 25 points. That small sample size hurts predictive ability. So, too, does the fact that the way the sport is played is changing faster than ever before, meaning data from even a handful of years ago may not be as useful anymore.
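The small-sample problem can be made concrete with the Wilson score interval, a standard way to put error bars on a proportion. This is an illustrative sketch, not anyone's production method. The article gives only the 12 games with a 25-point deficit; the Patriots came back in one of them, so the example assumes exactly one comeback (a hypothetical count) and asks how precisely 1 in 12 pins down the true comeback rate.

```python
import math

def wilson_interval(successes, trials, z=1.96):
    """Approximate 95% Wilson score interval for a binomial proportion."""
    p_hat = successes / trials
    denom = 1 + z**2 / trials
    # Center is pulled slightly toward 50%, which keeps small samples sane.
    center = (p_hat + z**2 / (2 * trials)) / denom
    half_width = (z / denom) * math.sqrt(
        p_hat * (1 - p_hat) / trials + z**2 / (4 * trials**2)
    )
    return max(0.0, center - half_width), min(1.0, center + half_width)

# Hypothetical: 1 comeback among the 12 games with a 25-point deficit.
low, high = wilson_interval(1, 12)
print(f"12 games:   {low:.1%} to {high:.1%}")

# Same comeback rate, but a what-if with 100 times as many games to learn from.
low, high = wilson_interval(100, 1200)
print(f"1200 games: {low:.1%} to {high:.1%}")
```

With 12 games, the plausible range runs from under 2 percent to about 35 percent; with 1,200 games at the same rate, it shrinks to a few points on either side of 8 percent. The 1,200-game case is a comparison, not real data.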
“The current climate has changed a little bit,” says Keith Goldner, chief analyst at numberFire, who compares changes in the sport to the changes in the electorate that hurt political analysts’ ability to project Donald Trump’s rise. “Teams have a much higher propensity to pass the ball; passing numbers since the early 2000s have increased dramatically,” Goldner says. That shift means some of the data from before the pass-happy era is slightly outdated in its predictive power now, Goldner says, “in the same way you can’t use data from the steroid era of baseball, because there are different run environments now. That’s the sort of thing that might be included in our model [going forward]. A four-score game, over the course of the season, is an extremely safe lead, [but] it might be less safe than it was 10 years ago.”
Burke points to yards per pass attempt, which has risen by 0.2 yards since 2010. The jump from 6.2 to 6.4 may seem insignificant, but Burke says that over the course of a season, a bump like that can have an “enormous” effect on the sport and start to change what a model should value. Namely, in a world in which teams gain yards more easily, field position should matter less in the math and possession of the ball should matter more. Burke says the sport is “getting closer” to levels of offense so extreme that models would need to be adjusted to reflect how little field position matters.
“Teams can move the ball easier so possession matters more and field position matters less,” Burke says. “In the [1920s], it was a 6–3 game and a deep punt in the second half might be a death blow. It was a field position game, even in the late ’70s. Now things are changing even more.” Nowadays, Burke notes, Aaron Rodgers can score a touchdown from anywhere on the field, making it harder to project how much a good punt actually helps the punting team.
Now that multiple websites are getting attention for live win probability, other math experts are starting to develop their own models, and some of them want to make big changes. About a month ago, Konstantinos Pelechrinis, an associate professor at the University of Pittsburgh’s School of Information Sciences, began noticing NFL win probability models he considered “too optimistic,” giving teams too high a chance to win games he thought could still go either way. Pelechrinis built a model for pregame predictions 18 months ago and started building an in-game model this winter, which he hopes to complete in a few months. He says his research confirmed what we all saw this season: haywire results. Pelechrinis claims he found 50 games in the 2016 season in which Pro-Football-Reference gave a team a 98 percent chance to win, but those teams won just 80 percent of the time. He wants his model to avoid bold declarations of 99.9 percent chances in all but the most dramatic of circumstances.
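Pelechrinis’s complaint is about calibration: if a model says 98 percent, teams in that spot should win about 98 percent of the time. The sketch below rebuilds his check only from the summary figures reported here (50 games, a 98 percent forecast, an 80 percent win rate, which works out to 40 wins); the per-game data and his actual method are not given, treating every forecast as exactly 0.98 is a simplification, and the 5-point bin width is an arbitrary choice.

```python
from collections import defaultdict
from math import comb

def calibration_table(predictions, outcomes, bin_width=0.05):
    """Group forecasts into probability bins; compare predicted vs. observed win rates."""
    n_bins = round(1 / bin_width)
    bins = defaultdict(list)
    for p, won in zip(predictions, outcomes):
        # 0.98 falls in the 0.95-1.00 bin; a forecast of 1.0 goes in the top bin too.
        idx = min(int(p / bin_width), n_bins - 1)
        bins[idx].append((p, won))
    table = []
    for idx in sorted(bins):
        rows = bins[idx]
        mean_pred = sum(p for p, _ in rows) / len(rows)
        observed = sum(won for _, won in rows) / len(rows)
        table.append((round(idx * bin_width, 2), len(rows), mean_pred, observed))
    return table

def prob_at_most(k, n, p):
    """Chance of k or fewer wins in n games if each is truly won with probability p."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))

# Reconstructed from the reported summary: 50 forecasts of 98%, 40 wins, 10 losses.
preds = [0.98] * 50
results = [1] * 40 + [0] * 10
for bin_start, n, mean_pred, observed in calibration_table(preds, results):
    print(f"bin {bin_start:.2f}+: {n} games, predicted {mean_pred:.2f}, observed {observed:.2f}")
print(f"P(40 or fewer wins | true p = 0.98): {prob_at_most(40, 50, 0.98):.1e}")
```

The binomial tail asks how often genuine 98 percent favorites would go 40–10 or worse over 50 games. The answer is vanishingly small, which is why a gap like that points to an overconfident model rather than bad luck, assuming the games are independent and the reported figures hold up.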
Burke, however, isn’t panicking about his own model. “We’re not going to say, ‘Oh no, we have to fudge [the formula] a little bit’ to reflect the changes,” says Burke, who believes it’s too early to tell how this season’s cluster of high-profile, low-probability games will affect the database going forward. Burke’s team will not throw away “freaky outcomes” and plans to include every comeback in its data. Sports Reference’s Kania says he doesn’t yet know if this hectic season will lead him to make any tweaks to his site’s formula, whose results are not updated live but are posted after every game.
“I don’t relish [that people are doubting analytics],” Burke says. “There are coaches in the NFL like that, who say things publicly, and I’m not going to go on Twitter and rip them.
“I’m going to make a note that that coach is a mark.”