Baseball Stats Still Haven’t Pivoted Away From Video

Statcast looked like it might extinguish the sport’s data providers of old. Instead, its emergence has only strengthened the sport’s information economy. Here’s how—and here’s how long the harmony might last.

Amed Rosario running with half of his body highlighted in green (Getty Images/Ringer illustration)

In July 2009, a tech company called Sportvision held its second annual “PITCHf/x summit” in San Francisco. Sportvision, whose founder briefly brought the glowing puck to hockey and more lastingly supplied the virtual first-down line to football, was then about two years removed from revolutionizing baseball analysis with its PITCHf/x system, a network of cameras and computers installed in every major league park prior to the 2008 season that tracked the speed, movement, and location of every pitch. At the summit, Sportvision announced plans for a much more ambitious follow-up: FIELDf/x, which would record the position of every player on the field. Cutting-edge team and internet analysts flocked to San Francisco to hear or deliver presentations about the potentially transformative technology, and Sportvision broadcast the conference via livestream for any writers with way too much time on their hands who may have wanted to live-blog a stat summit at length.

After a day of digesting and daydreaming about data, the summit attendees went to a Giants game at AT&T Park (thereby disproving silly stereotypes about stat nerds not watching baseball). One attendee recently recalled that during the game, an internet analyst told Ben Jedlovec, a research analyst for Baseball Info Solutions—a baseball data provider that relied on human input from “video scouts” and ballpark-based stringers—that with the way Sportvision was expanding its automated offerings, BIS would “be out of business in two years.” Sportvision, and automated tracking technology, looked like the future of baseball stats, and there seemed to be little room in that future for the eye test, however rigorous.

Almost a decade later, automated tracking technology in sports is, as expected, more sophisticated and more ubiquitous than ever before. Although Sportvision has since been acquired by another company and seen its technology fall out of favor at the big league level—its vaunted FIELDf/x system exists only in the minors—Major League Baseball Advanced Media’s camera- and radar-based Statcast system is recording every pitched and batted ball and every on-field movement players make at major league parks. Meanwhile, TrackMan and other tracking companies have made incursions into the minors, international leagues, independent leagues, and even amateur ball. Yet despite that inexorable advancement, BIS and competing data providers, including Inside Edge, have not only survived, but thrived, with BIS recording record revenues for several years running, according to multiple sources inside and outside of the company.

It seems unlikely that such seemingly redundant data sources could coexist in harmony, and that observation-based providers like BIS and IE could have weathered the advent of Statcast without going the way of Blockbuster stores in the streaming era. Yet BIS, IE, and others have shown that the introduction of a disruptive technology creates opportunities that a nimble company can exploit, just as a hitter can conceivably beat a defensive shift by setting his sights on newly opened territory. For the time being, at least, baseball hasn’t entirely pivoted away from video. Statcast has only strengthened the sport’s information economy.

Baseball Info Solutions—or, as it’s also sometimes referred to, Sports Info Solutions, or SIS—was cofounded in 2002 by John Dewan, who had formerly cofounded STATS, another data provider that, unlike BIS, focuses more on media clients than teams, and more on raw data feeds than analysis or proprietary stats. Dewan says he’s feared obsolescence throughout his whole three-decade career as a purveyor of baseball information, going back to a time when he worried about being beaten by SportsTicker (which was later acquired by STATS) or MLB’s official record keeper, the Elias Sports Bureau, or what it would mean when USA Today dropped the STATS box-score service. The names and natures of the challengers have changed, but the risk remains the same. “That’s one of my later paranoias,” Dewan says. “Is Statcast going to put us out of business?”

Dewan says that BIS was the first company to collect pitch-type location and velocity data for every major league game. In addition to that, the company created the “good fielding plays” and “defensive misplays” classification system that formed the first basis of its proprietary fielding metric, defensive runs saved (DRS); timed everything from the flight of batted balls to pitchers’ delivery times to runners’ sprints to first; and tallied teams’ uses of infield shifts. Some of that data is publicly available via sites such as FanGraphs, while some is reserved for team clients. According to Dewan, BIS is currently serving roughly 25 MLB team clients, as many as it ever has. Inside Edge, whose history dates back to the late 1980s, offers many statistical services along the same lines. Although IE’s business model is more media-centric than BIS’s, VP of product and sales Kenny Kendrena says he’s working with about 21 MLB team clients—a 50 percent increase from when Kendrena started in June 2006, at the dawn of the PITCHf/x era.

There are several ways in which companies like BIS, IE, and STATS (the last of which didn’t respond to an interview request) can keep convincing teams to pay for their services even as those clients receive free feeds of Statcast data from MLBAM. In some cases, there’s value in ingesting the “same” information from multiple sources; Statcast pitch-type classifications, for example, are algorithmically generated in real time, and teams may trust manual classifications by experienced video scouts more—or at least like to have them on hand for comparison’s sake as a second line of defense against bad data. There’s also something to be said for consistency across a larger sample of seasons; Statcast was installed in every MLB park in 2015, but BIS or IE data allows comparisons between players over a longer time frame (although the video scouts’ methods may have changed over the years, as assorted biases in subjective measures like batted-ball type or the difficulty of fielding plays were detected and corrected).

Then there’s the data that Kendrena describes as “augmenting” the automated info—subtle aspects of certain plays that not only don’t show up in the box score, but aren’t currently captured by Statcast. If the outcome of a play changes because a fly ball gets lost in the lights, a grounder takes a bad hop, or a runner stumbles out of the box, an IE video scout will note that. In those cases, Statcast would show that an outfielder let a ball drop, an infielder let a ball get by him, or a runner took too long to reach first, but the supplementary info from the video scouts would help explain how and why those things happened. In some cases, the extra level of detail might matter. Although both companies are perpetually on the lookout for areas in which a human can see something that the cameras or radar might miss, the returns are diminishing at the major league level: A few years ago, BIS started tracking broken bats, checked swings, and bunts pulled back, but while those minutiae made for intriguing trivia, they didn’t prove particularly lucrative.

Rather than allowing automated stats to supplant its own ratings, BIS has enlisted Statcast info—which it obtains from its team clients—to make its own preexisting stats smarter, releasing a Statcast-infused version of DRS that takes the new system’s precise measurements of fielder range and positioning into account. (Although MLBAM began publishing its own Statcast-powered outfielder ratings last season, it has yet to do the same for infielders.) Inside Edge, meanwhile, has introduced a subscription service dubbed “Remarkable,” which mines Statcast data for fun facts and insights that can be incorporated into scouting reports or broadcast notes. In both cases, the companies turned a threat into an asset, using Statcast data to build better products that they could then market to teams or broadcast clients. IE is currently in talks about licensing Remarkable to MLBAM—essentially slicing and dicing MLBAM’s data and selling it back to its source.

That process also works in reverse. Statcast still isn’t perfect, and the information it misses allows BIS to sell a service to teams called “Statcast Data Cleanup.” According to BIS, approximately 15 percent of plays from the three-plus-season Statcast era—most often grounders or pop-ups with extreme launch angles—have some missing data. Using BIS’s human-captured landing locations, hang times, and batted-ball types for each ball in play, the company can estimate missing exit velocities, launch and spray angles, and spin rates, all of which are closely correlated with the real deal. This year, the company has expanded that service into something it calls “Synthetic Statcast,” which provides quasi-Statcast batted-ball information for years and leagues where Statcast wasn’t installed: MLB back to 2010, the upper minors back to 2013, and, this year, Nippon Professional Baseball. The Statcast cleanup service runs clients $15,000 a year and includes “Synthetic Statcast,” which can also be licensed as a stand-alone product for lower levels. The product that Dewan worried would be the end of his company has instead turned into an extra revenue source, as BIS has morphed into a sort of statistical cleaner fish, earning its own protected place by helping with the hygiene of a larger entity that by all appearances could easily eat it alive. “The more information there is, the more we can do,” Dewan says.
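To give a rough sense of how human-captured hang times and landing locations can stand in for missing radar readings, here is a minimal sketch. It is not BIS's method, which the company has not described in detail here; it assumes a drag-free, spin-free projectile launched from ground level, and the function name `estimate_launch` and its parameters are hypothetical.

```python
import math

G_FT_PER_S2 = 32.174        # standard gravity, in feet per second squared
FPS_TO_MPH = 3600 / 5280    # feet per second -> miles per hour


def estimate_launch(hang_time_s, distance_ft):
    """Back out exit speed (mph) and launch angle (degrees) from a charted
    hang time and landing distance.

    Assumption: vacuum physics (no drag, no spin, launch height of zero).
    This is an illustrative toy, not any provider's actual model.
    """
    if hang_time_s <= 0 or distance_ft < 0:
        raise ValueError("hang time must be positive and distance non-negative")

    # With no drag, the ball rises for half the hang time and falls for the other half.
    vertical_fps = G_FT_PER_S2 * hang_time_s / 2
    # With no drag, horizontal speed is constant over the whole flight.
    horizontal_fps = distance_ft / hang_time_s

    speed_mph = math.hypot(horizontal_fps, vertical_fps) * FPS_TO_MPH
    angle_deg = math.degrees(math.atan2(vertical_fps, horizontal_fps))
    return speed_mph, angle_deg


# Example: a fly ball charted at 4.0 seconds of hang time, landing 300 feet away.
speed, angle = estimate_launch(4.0, 300.0)
print(f"estimated exit speed: {speed:.1f} mph, launch angle: {angle:.1f} degrees")
```

Because real batted balls lose a great deal of speed to air resistance, a vacuum model like this will generally understate exit velocity. A production system would presumably correct for that, for instance by calibrating against plays where both the human-charted and the radar-measured values exist.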

Both Dewan and Kendrena say that they haven’t retired any service that they used to provide. “Since Statcast is collecting batted-ball velocity and trajectory, maybe our data is not as vital to have,” Dewan says. “But it hasn’t diminished people using it.” Kendrena notes that the needs and desires of IE’s clients vary so much that even though most teams are subscribers, there may not be two teams that receive the same array of services. “Some will still just take one feed that no one else does,” Kendrena says. “It really runs the gamut. And then you have others that just want everything you’ve got.”

Harry Pavlidis, the director of technology for Baseball Prospectus, provides data and data-cleanup services to MLB clients through both BP and Pitch Info, an independent consultant that supplies the pitch classifications at Brooks Baseball and also serves close to 20 teams. According to Pavlidis, recent years have been a boom time for any company that provides statistical services to teams. “All of our revenues are up,” he says, adding, “we all adapt to market openings.”

Pavlidis calls this trend toward higher revenues for data providers an “immature industry problem,” although it hasn’t been a problem for him or his well-remunerated rivals. There’s more data than ever available to teams, all of which are at least nominally receptive to it, but not every team is well-equipped to clean up and analyze the terabytes of raw data that Statcast spits out. Although in theory any team that’s lagging behind could hire its own employees to do data cleanup internally—as some clubs do, believing that the effort may yield a competitive advantage—there’s an opportunity cost to that undertaking. Those hours could be devoted to other tasks, which might have greater competitive payoffs than whatever incremental improvement in data quality a team’s in-house staffers might be able to eke out compared to Pitch Info or BIS.

Team employees largely echo the message that the data providers are sending themselves. “While some things they do are now redundant, they do provide a service in sanity-checking Statcast,” the head of one R&D department says of companies like BIS and IE. “Statcast will miss pitches/batted balls or report data that is clearly off, and the manual stringers will fill in those gaps. They also provide data in leagues that do not have ball/player-tracking technology yet.” Another R&D department head notes that even if some teams don’t pay for BIS’s Statcast cleanup services, buying BIS’s data allows them to do the cleanup themselves. And a third R&D director says that his team has trimmed its dependence on outside providers to tough-to-automate defensive stats such as scoops. “I think the day will come when Statcast provides better coverage of those types of events and we won’t need them at all,” he says.

One AL baseball operations executive explains that his team has cut back on BIS/IE data but maintains a strategic toehold. “Statcast has certainly reduced our reliance on those outside data sources,” he says. “Prior to Statcast, we were heavily reliant on these services for bulk charting that we just don’t have the manpower to do. Now, we rely on them for things that Statcast doesn’t pick [up].” He too cites the small percentage of plays with missing or flawed data as areas where outside companies can help, as well as cases where human stringers can record “intricate details” that elude the cameras and radar.

A baseball ops staffer for another AL team is more pessimistic, reporting that his team has narrowed its usage of outside data to a small selection of reports that “would cost [them] zero wins” to discard, keeping those around only temporarily for familiarity’s sake. “I think a lot of their staying power, if any, will depend on how willing (and how quickly) they’re willing to abandon their core product [of MLB data],” he says about BIS and IE. “Producing high-quality pitch-level data at the college level, Korea, Japan, etc., will keep them alive, but even in those spaces, pitch-tracking tech is becoming more widespread.”

Both BIS and IE have either dabbled in or wholeheartedly pursued stat collection in the minors, NPB, and amateur ball, and as tracking technology has grown more accurate and more portable, teams’ appetites for data from those levels and leagues have soared. “NCAA usage has grown massively, and Japan is a reasonable footprint,” Pavlidis says. “And then we have a recent increase in the use of high-school data. As TrackMan has improved the quality, more and more teams are dipping into it in earnest.” BIS and IE have also debuted football products within the past few years (and are weighing basketball opportunities), although the NFL is still lagging behind MLB in its embrace of stats, which has suppressed spending on data providers. “We need the Moneyball of football to come out,” Dewan says. (Evidently The Blind Side didn’t do the trick.)

Although baseball’s outside data providers have eluded not only death but decline for longer than anticipated, the outlook for BIS and IE is still uncertain, and their anxiety hasn’t completely subsided. Dewan says, “We look at our research department as an extension of the team,” but what will happen when teams have so many quants working for them that they don’t need the extra helping hands? And what will BIS and IE pivot to when Statcast-style technology is cheap enough to install anywhere and accurate enough not to need much finessing? That future could be years away, but it seems almost unavoidable. Of course, that’s what the smartest and most informed observers said in 2009, too. That long-ago prediction at AT&T Park sounds laughable in retrospect, but both Jedlovec (who eventually became the president of BIS) and Graham Goldbeck, the former manager of data analytics and operations for Sportvision, have started working on Statcast data quality for MLBAM this year. That could be a sign of Statcast consolidating its power.

One former head of an AL team’s R&D department, who says he also thought that BIS’s and IE’s days were numbered when he first heard the phrase “FIELDf/x,” says, “We’ve all seen new technologies sweep older technologies aside in short order, but there have also been cases where companies with older technologies were remarkably creative in finding ways to add new value or extend their useful life.” He adds that his forecast for the companies’ financial futures was off because he didn’t anticipate how many teams would hand over tracking data to outside sources. The industry has erred before in classifying a baseball institution as endangered: Although some expected that the sabermetric movement would spell the end of scouting, teams have actually added scouts over the same period that they’ve rapidly increased their quant counts.

“My philosophy has always been to purchase everything that’s available, as data and products tend to be extremely cheap relative to team payrolls,” says one number-cruncher who’s consulted or worked full-time for multiple teams. It’s a sensible stance: The $15,000 a year that Statcast cleanup costs—less than even an underpaid, part-time intern would cost—is next to nothing to a team that’s worth billions, and there’s so much money to be made from each additional win and playoff appearance that teams are happy to have their hands on all the information they can. In the past, tight-fisted team owners haven’t always shared their deputies’ thirst for knowledge, but lately they’ve loosened the purse strings on the baseball-ops side, enabling late-adopting departments to spend money to make money.

Baseball’s third-party data providers hope that some portion of that spending will keep cascading down to them. Every summer, BIS holds a planning meeting to identify data points that the company could be collecting but currently isn’t. And every season so far, it’s stayed ahead of the sport’s rising statistical tide. “As long as you are always trying to think of the next thing and … what’s going to be helpful, there’s an unlimited amount of information that can be gathered and analyzed,” Dewan says. Teams, and time, will continue to test that contention.