The NFL’s Quest to Quantify Quarterback Evaluation

By Robert Mays, April 17, 2020, 11:25 am UTC

The future is already here—it’s just not very evenly distributed. The when and the where of that spooky bit of prophecy are murky, but it comes from science fiction writer William Gibson, whose work in the 1980s helped create an entire vernacular for the internet age. In Gibson’s mind, the seemingly fantastic structures that would come to rule our lives were already in place, even if they existed only in certain corners of the world. The same holds true for the coming proliferation of analytics in the NFL.

Analytical thought has trickled into pro football slowly, but its benefits have been undeniable. The Ravens—who ran roughshod over the rest of the NFL last season—feature a robust analytics department, including a 25-year-old cognitive-science major who advises their 57-year-old head coach when to go for it on fourth down. Nearly every franchise has invested in the silos of information produced by the people at Pro Football Focus. For the past two seasons, teams have been granted access to the stockpile of player-tracking data retrieved from chips in a player’s shoulder pads—and forward-thinking organizations are doing all they can to weaponize those numbers.

Compared to other professional sports, though, football’s use of math is still in its infancy. Nerds have ruled Major League Baseball for years. The war of ideas in that sport is over, and the quants won. Modern analytics have become such an ingrained part of MLB front offices that areas once rife with inefficiencies have been completely picked over. Subjects like prospect evaluation have been staked out and developed, forcing teams to move on and seek out the next unexplored territory. Football, on the other hand, is akin to the old West. There’s unclaimed real estate—and untapped potential—as far as the eye can see.

As teams sift through reams of player-tracking numbers, build neural networks, and vet the résumés of Ivy League graduates for prospective talent, the goal is to identify which domains of the football world will yield the biggest advantages. And, as they often do, all roads will eventually lead to the quarterback.

Finding the right quarterback can dictate the future of an entire franchise. It’s the most important decision a general manager can make. But what if you could take some guesswork out of the equation? What if a team could use analytical tools to construct a model to decipher QB decision-making and predict which prospects would succeed or fail? To put it another way: What if quarterback evaluation could be solved? “That would be the Holy Grail,” says ESPN senior analytics specialist Brian Burke. “If you could do [that] at the college level to assess quarterback decision-making, you’d be cashing some big checks.”

The desire to quantify a quarterback’s potential has existed for a long time. David Lewin introduced his prediction model for college quarterbacks way back in 2006. He found that the best predictive metrics for success in the NFL were games started and completion percentage. It’s a simple approach to a complex idea, but even to this day, teams are wary of flash-in-the-pan QBs who don’t have much starting experience. During the ensuing decade and a half, other smart people have updated Lewin’s model or developed their own ways to study the position. Football Outsiders’ QBASE projection system incorporates Lewin’s factors while also taking into account adjusted yards per attempt, projected draft position (to ensure that scouting analysis was factored in), and team passing efficiency based on Bill Connelly’s opponent-adjusted SP+ stats. Pro Football Focus’s experts have baked their player grades into prospect evaluation and projection. Those analysts and the teams that use their numbers can study a QB’s performance and habits down to granular details. If a scout wants to pull up tape of every instance when a prospect earned a negative grade on a post route, they can do it with a couple of clicks.

Until recently, though, most quarterback metrics have been limited by a lack of access to advanced data, or the biases and time constraints of human charting. Even innovators like Jeremy Hochstedler—whose company, Telemetry, was among the first to analyze player positioning and movement in the NFL—were forced to use information from EA Sports to format their early models. But with the advent of player-tracking data, analysts have been introduced to a whole new world of available information. When the NFL first issued new wearable technology to track player movement, people like Burke went from complaining about a lack of data to swimming in it. “Typically with an analyst, if you ask a question, he or she would say, ‘More data,’” Burke says. “But in this case, we have mountains of it.”

Advances in player tracking have helped create a wave of new metrics that can contextualize QB play in the NFL. Both Hochstedler and the NFL’s Next Gen Stats group have developed their own versions of expected completion percentage, a stat that attempts to improve on traditional completion rate by taking into account depth of target, receiver separation, and other factors to create a truer representation of QB accuracy and efficiency. For a long time, completion percentage was used as an indicator of accuracy. But as Hochstedler says, you could plug a high school quarterback into the best offense in football, and if he threw to a halfback in the flat every time, he’d probably complete most of his passes. “Does that mean he’s the most accurate?” Hochstedler says. “No. Because to win games, everybody knows you can’t just check down to your halfback every time. You’ve got to make the downfield throws—they call them pro-level throws. You’ve got to be a complete quarterback. When you take into account all the features of a play, of a pass, then you’ve got a real understanding of the quarterback.”
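To make Hochstedler's checkdown point concrete, here is a minimal Python sketch of a completion-rate-over-expected calculation. It is not the Next Gen Stats or Telemetry model: the logistic form, the two inputs (air yards and receiver separation), and every coefficient below are assumptions invented for illustration, and the sample passes are made up.

```python
import math

# Hypothetical coefficients, chosen only for illustration; not fitted to real data.
INTERCEPT = 0.9
AIR_YARDS_COEF = -0.06   # assumption: deeper throws are harder to complete
SEPARATION_COEF = 0.25   # assumption: more receiver separation makes a throw easier


def expected_completion_prob(air_yards, separation):
    """Toy logistic estimate of the chance that a pass is completed."""
    z = INTERCEPT + AIR_YARDS_COEF * air_yards + SEPARATION_COEF * separation
    return 1.0 / (1.0 + math.exp(-z))


def completion_pct_over_expected(passes):
    """Actual completion rate minus average expected rate, in percentage points.

    `passes` is a list of (air_yards, separation_yards, completed) tuples.
    """
    actual = sum(1 for _, _, completed in passes if completed) / len(passes)
    expected = sum(expected_completion_prob(a, s) for a, s, _ in passes) / len(passes)
    return 100.0 * (actual - expected)


# Two made-up passers who each complete 3 of 4 throws (75 percent).
checkdowns = [(1, 4.0, True), (2, 5.0, True), (0, 3.5, True), (3, 4.5, False)]
downfield = [(18, 1.0, True), (22, 1.5, True), (15, 0.5, False), (25, 2.0, True)]

print(round(completion_pct_over_expected(checkdowns), 1))
print(round(completion_pct_over_expected(downfield), 1))
```

Both sample passers have the same raw completion percentage, but under these toy assumptions the checkdown passer lands below expectation and the downfield passer lands above it, which is the distinction the raw stat hides.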

Last year, Burke took that thinking and expanded it even further. At the 2019 Sloan Sports Analytics Conference, he presented a paper titled “DeepQB: Deep Learning With Player Tracking to Quantify Quarterback Decision-Making & Performance.” Burke ran every pass from the 2016 and 2017 NFL regular and postseasons through a neural network designed to predict which receiver the quarterback would throw to. The network’s conclusions were based on a variety of factors, including “receiver position, velocity, acceleration, and orientation, as well as those of the secondary.” Burke also used ESPN’s video analysis tracking system to determine whether a quarterback was under duress at the time of the throw and if a play fake occurred before the pass. “Over time, [the neural network] learned the patterns,” Burke says. “[I could] tell it the X and Y position of all the receivers and all the defenders and a whole bunch of other information, and it would learn the pattern. It would learn, ‘Oh, this player’s wide open.’ Or, ‘This player’s not wide open, but he has a step on his defender.’ Or, ‘Oh, there’s no safety in the middle of the field. So he’s probably going to throw the post.’ Over time it learned to predict with pretty good accuracy which receiver would be thrown to.”

Burke’s study was aimed at predicting quarterback habits, but other variants of the model provide information that may help to understand a QB’s decision-making. Player-tracking data processed through a neural network like Burke’s could determine the expected yards for each eligible receiver on a given play—and indicate how often a quarterback threw to his best option. That type of information could theoretically quantify the quality of a prospect’s decision-making and allow evaluators to better understand the complicated inner workings of a QB’s mind.
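As a rough illustration of that last idea, the Python sketch below computes how often a quarterback targeted the receiver with the highest expected yards. It assumes a model has already produced an expected-yards figure for each eligible receiver; the play format, the receiver labels, the function names, and all of the numbers are hypothetical and do not come from DeepQB or any team's system.

```python
def best_option_rate(plays):
    """Share of plays on which the QB threw to the receiver with the most expected yards.

    Each play is a dict with two keys (a hypothetical format):
      "expected_yards": receiver label -> expected yards from some upstream model
      "target": label of the receiver actually thrown to
    """
    if not plays:
        return 0.0
    hits = 0
    for play in plays:
        expected = play["expected_yards"]
        best = max(expected, key=expected.get)  # receiver with the highest value
        if play["target"] == best:
            hits += 1
    return hits / len(plays)


def yards_left_on_field(play):
    """Expected yards of the best option minus expected yards of the chosen target."""
    expected = play["expected_yards"]
    return max(expected.values()) - expected[play["target"]]


# Made-up plays for illustration only.
plays = [
    {"expected_yards": {"X": 6.1, "Z": 11.4, "TE": 4.0, "RB": 2.5}, "target": "Z"},
    {"expected_yards": {"X": 9.0, "Z": 3.2, "TE": 5.5, "RB": 3.0}, "target": "RB"},
    {"expected_yards": {"X": 2.0, "Z": 7.5, "TE": 7.0, "RB": 1.0}, "target": "TE"},
    {"expected_yards": {"X": 12.3, "Z": 4.4, "TE": 3.1, "RB": 2.2}, "target": "X"},
]

print(best_option_rate(plays))
print([round(yards_left_on_field(p), 1) for p in plays])
```

The second function is a design choice rather than anything described in the study: a yes-or-no count treats a throw to a receiver worth half a yard less than the best option the same as a checkdown that gives up six expected yards, so tracking the size of the gap alongside the rate may be more informative.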

Previously, the absence of radio frequency identification chips like those worn in the pros led to a lack of player-tracking data for college football—but that may change soon. “I think the tracking is going to come maybe through video instead of through the chip hardware technology that the NFL uses,” Burke says. “Once that arrives … I think there’s some very, very promising things coming online.” According to some people, that technology is already here.

When Craig Buntin and Mehrsan Javan started the company that would later become Sportlogiq, their goal was to create technology to help develop self-driving cars. Eventually, Buntin—a former professional figure skater—realized that the video technology they were using could be applied to his work on the ice, analyzing his speed, acceleration, and spins with striking accuracy. It didn’t take long for Buntin and Javan to see that the best application of their software was sports, not sports cars. Based in Montreal, Sportlogiq first delved into the world of pro hockey in 2015. Five years later, its client base has expanded to 29 of the NHL’s 31 clubs.

The company’s initial foray into football came with the hometown Montreal Alouettes. They quickly learned that the model—which was calibrated to track hockey players moving as fast as 40 mph—was easily applicable to the relatively slow-moving world of football. “The tracking of the players was something we realized we could do immediately,” says David Goldman, Sportlogiq’s director of football operations. “The models that we built for hockey are essentially the same kind of models we can use for football. … Because we built it on hockey, every other sport becomes a little easier. Because no one goes as fast as hockey.” The team’s main obstacle was the limited scope of the camera angle on a football telecast compared to a hockey broadcast. But after getting ahold of the all-22 feeds for college and NFL games, it was clear that they had a fully formed product on their hands.

As Goldman and his colleagues surveyed the football landscape, they realized there was a void in player tracking for college football. NFL teams already had Next Gen Stats available to them, but the prospect-evaluation market had yet to be fully explored. To make their product marketable immediately, Sportlogiq reached out to Hochstedler’s company, Telemetry, to utilize its detailed user interface. Hochstedler had spent years mining NFL player-tracking data to create software that mapped out route combinations, formations, coverages, and other sortable intel that could make film study more efficient for NFL teams. But he was a skeptic when it came to the veracity of video-based player tracking until he saw what Sportlogiq’s software could produce. “I never in my wildest dreams thought we were anywhere close to any type of quality data sets based on optical tracking, especially from the all-22,” Hochstedler says.

When Sportlogiq ran this year’s Super Bowl through its model to test the system against Next Gen Stats data, Hochstedler was blown away by the correlation. “It all comes down to data quality in the end, and it’s there,” Hochstedler says. “We’re doing work with it. We’re there, and we’re excited that we’re there. It’s certainly three to five years earlier than I ever anticipated, even close to being like this.”

Due to contract restraints, the Sportlogiq-Telemetry partnership can pitch only to NFL teams. Hochstedler says that at this year’s combine, they met with nearly every organization in the league. Late February is typically too late for front offices to use much new data in their draft evaluations, but this spring, travel restrictions stemming from COVID-19 changed all that. “With pro days washed out, teams started calling and saying, ‘Hey, can you get us speeds, can you get us acceleration, can you get us change of direction?’” Hochstedler says. “They’ve got dedicated developers and data sciences and models that they’re using, just like we do. And they’re taking the college data and they’re running it through their model. Whether it’s the change-of-direction model, or decision-making model, or an accuracy model, whatever it is, they’re applying the college data to that model they’ve already built.” The pieces are now in place for teams to utilize college-tracking data in whatever models they’ve constructed—including QB decision-making algorithms.

The cloak-and-dagger world of NFL front offices remains mostly opaque to outsiders. Player evaluation is closer to espionage than science, and the secretive nature of most executives makes it nearly impossible to know which teams are doing what when it comes to quantitative quarterback evaluation. When I asked members of two analytically inclined front offices about the subject earlier this year, they pretty much cackled at me like they were Ray Liotta in Goodfellas.

Hochstedler estimates that four to six teams have someone in-house with the geospatial expertise to create a model for quarterback decision-making. Among those half-dozen clubs, there’s no easy way to know how much progress has been made. But as a hypothetical, let’s just say a few of them have made significant headway. If they do have a model in hand, the question then becomes what limitations and applications that model would have in evaluating prospects.

Hochstedler’s company has already created a system, similar to the one Burke used in his DeepQB study, that can predict expected yardage for each eligible target on a given play. The hope is that with a few mouse clicks, a team could theoretically call up every clip in which a QB threw to the receiver expected to gain the most yardage, every play in which he didn’t, and every play in between.

On its own, that doesn’t tell us much. Without knowing the assignments and the QB’s intention on a play, there’s no way to truly understand whether his decision was right or wrong. “The decision-making part, that’s a lot different,” Hochstedler says. “Because we could say, ‘Hey, did he hit the most optimal target? Yes or no.’ We can say that, but based on his offense and based on how he was coached to run a specific play, his third progression may have been ‘optimal,’ but he hit his first target and he picked up 8 yards.”

Armed with all that information, though, teams could streamline both their film-study schedule and their interactions with college prospects. Instead of spending hours assembling a reel of good and bad decisions, the task could take minutes. And while a collection of throws where a QB bypassed his best option may not mean much by itself, paired with an interview where a position coach could ask a quarterback why he made those decisions, it becomes extremely useful.

When Burke was analyzing the limits of his DeepQB model—and by extension, the limits that a similar system might have in evaluating QB prospects—he pointed to a lack of football expertise. The neural network can understand the data, but it can’t grasp the nuances of how a certain team plays Cover-3. “Was this a zone or was this man?” Burke says. “That makes a huge difference in terms of separation. You could have a 3-yard buffer around you as a receiver, and in a zone you’re covered. You could have a half-a-yard step on a [defender] as a wide receiver in man-to-man coverage and be wide open. So what I’ve learned is we have to take a couple steps back here and learn the context of these plays.”

Long before he studied football data, Burke was a Navy pilot. He often finds similarities between his experience in the air and his current work. In the Navy, the top fighter pilots often had traits that didn’t initially seem crucial, but turned out to be accurate predictors of success. The pilots who were skilled at navigating by using radio instruments—or put another way, those who could fly without much visibility—often became the best. “The Navy found that for some reason, the guys that turned out to be really good fighter pilots just so happened to be good at that one particular skill,” Burke says. “It didn’t intuitively map together, but it just had to do with how they processed information in their head or something like that. But you stumble upon these correlations that could be predictive.” He thinks the same might be true for quarterbacks. And with a streamlined scouting process, those traits may become easier to identify and organize.

At this stage, with all the complex factors that go into quarterback play and the short history of player-tracking data at the college level, no single model can provide an accurate prediction of whether a QB will succeed or fail. But it won’t be long before teams can rely on these systems as another resource. “It could be much easier to just sort and filter and just show me the video of all situations where he made a really bad decision, or show me where he made the decision that wasn’t recommended, but it turned out to be a really good play,” Burke says. “It just becomes a tool to help human scouts do their jobs faster and better.”
