Sabermetrics Meets the 60-Game Season

In MLB this season, the small sample is the only sample. Throw in some new rules and a schedule that limits travel, and the analytics community has a lot to figure out.


In less than a week, Major League Baseball will celebrate a belated Opening Day. About 10 weeks later, pandemic permitting, its 60-game regular season will end. Even if MLB’s spotty testing process proves up to the task of coping with COVID-19 and every club completes its schedule, there will be fewer big league games played in 2020 than in any year since 1900, the year before the birth of the American League. Among the myriad parties affected by that relative lack of action—and a set of associated rules changes and roster reconfigurations—is baseball’s robust sabermetric community, which turned “small sample size” into such a frequent refrain that it’s been set to song.

“I don’t think people are prepared for just how weird it’s going to look at the end of the year, like the leaderboards and all that,” says Sean Forman, the founder and president of Sports Reference, parent company of beloved internet encyclopedia Baseball-Reference.com. “Assuming they get 60 games in, there are going to be probably dozens of players who have never been in a top 10 of anything appearing on top 10 leaderboards. It’s just going to look really odd historically 10 years from now when we are able to look back on this season.”

With the first official pitches of this season still six days away, baseball’s leading statheads don’t yet have the luxury of looking back. And while many people may be unprepared for the randomness and statistical uncertainties of a 60-game season, it’s Forman’s job to consider the quirks and figure out fixes before they foul up the sport’s go-to stats source. “All the weird things that happen, we try to get them right and make the adjustments for that, and so maybe more weird things is actually better for us in the long run,” Forman says.

In the short term, though, it means more work. For Forman and other purveyors and displayers of advanced baseball metrics, the 2020 season poses two separate problems: calculating the stats and presenting the stats. Take the automatic-runner rule, which combines both problems. In order to shorten games and minimize roster strain, extra innings this season will start with a runner on second base, as they have in the minor leagues for the past two seasons. When an automatic runner scores, an unearned run will be assessed to the pitcher who starts the inning. Baseball-Reference’s version of pitcher WAR is based on runs allowed and doesn’t distinguish between earned and unearned runs, which means that if Forman took no action, an extra-innings pitcher would unjustly suffer the same WAR penalty as a pitcher who actually allowed a runner to reach.

Although the company hasn’t yet coded a solution, Forman has decided what to do: WAR, the all-encompassing metric that always adapts to the times, will shapeshift once more. Historically, a runner on second with no outs has raised the average run expectancy of an inning by a little more than 0.6 runs. By using that elevated run expectancy as the baseline for extra innings, Baseball-Reference WAR will avoid disproportionately penalizing extra-innings arms. “We’re going to track the number of innings that the pitchers start in extra innings,” Forman says. “So we’ll track that number, and then basically our league-average pitcher whom we compare the pitcher to is going to have his runs allowed adjusted based on the number of extra innings that pitcher is starting. We’re going to have to track that stat, basically, as something obviously that we’ve never tracked before, which is extra innings started or begun.”
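The adjustment Forman describes can be sketched in a few lines. This is only an illustration of the idea, not Baseball-Reference's code (which, per Forman, had not been written yet): the function names and the example workload are hypothetical, and the 0.6-run bump is the article's rounded figure.

```python
# Assumed constant: the article says a runner on second with no outs adds
# "a little more than 0.6 runs" of run expectancy to an inning.
AUTO_RUNNER_RUN_BONUS = 0.6


def baseline_runs(league_runs_per_inning, innings_pitched, extra_innings_started,
                  runner_bonus=AUTO_RUNNER_RUN_BONUS):
    """Runs a league-average pitcher would be expected to allow in this workload.

    Each extra inning the pitcher starts begins with an automatic runner,
    so the baseline is raised by runner_bonus for every such inning.
    """
    return league_runs_per_inning * innings_pitched + runner_bonus * extra_innings_started


def runs_better_than_average(runs_allowed, league_runs_per_inning,
                             innings_pitched, extra_innings_started):
    """Positive values mean the pitcher allowed fewer runs than the adjusted baseline."""
    expected = baseline_runs(league_runs_per_inning, innings_pitched, extra_innings_started)
    return expected - runs_allowed


# Hypothetical reliever: 20 innings, 4 of them extra innings he started,
# 11 runs allowed, in a league that scores 0.5 runs per inning.
print(round(runs_better_than_average(11, 0.5, 20.0, 4), 2))  # 1.4
```

Without the extra-innings term, the same pitcher would grade out as one run worse than average instead of 1.4 runs better, which is the penalty the adjustment is meant to remove.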

No, it’s not the sexiest stat, but it’s a necessary response to strange statistical circumstances. And then there’s the runner: Although for WAR purposes an automatic runner can be treated as any other baserunner would be, Baseball-Reference still needs a way to convey the sequence of events on its pages for extra-inning games. B-Ref uses Ted Turocy’s Chadwick software, which already accounts for the automatic-runner rule in the minors, to convert MLB’s stat feed into Retrosheet format and generate box scores, but B-Ref will likely have to make additional adjustments to textual descriptions on the site. “We may have to think about how we present that on the play-by-play for the box scores,” Forman says. “I think it’s going to be one of those things where, like, the first Friday after the first games, we’re going to be working through the code and making sure that everything is working and modifying things as needed.”

WAR may need another under-the-hood adjustment thanks to the way park factors will work in 2020. Because the schedule is not only short but also more unbalanced than usual as a result of limited travel, some park factors may seem extreme. “The Rockies are going to look like they probably have an all-time park factor this year because they’re mostly playing in pitchers’ parks on the road,” Forman says.

On top of that, there may be jarring jumps and dips in WAR at the tail end of the 2020 season as updated park factors kick in. To cut down on sample-size issues, Baseball-Reference employs park factors that are based on multiple seasons. The site typically uses the preceding season’s park factor for the first two months of the current season; after that, it uses a weighted average of the previous season and the portion of the current season that’s been played to date. At the All-Star break, the weighting would normally be about two parts the previous year and one part the present year, and the split would gradually get closer to equality as the season proceeds.
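One weighting scheme that matches that description is sketched below. The exact weights are an assumption, not something the article specifies: the previous season counts as one full part and the current season counts as the fraction of a normal schedule played so far, which gives roughly two-to-one at the halfway point and one-to-one at season's end. The function name and the one-third switch point (about two months of a six-month season) are likewise illustrative.

```python
def blended_park_factor(prev_pf, current_pf, fraction_played, switch_fraction=1 / 3):
    """Blend last season's park factor with the current season's.

    prev_pf, current_pf: park factors on the usual 100 = neutral scale.
    fraction_played: share of a normal-length season completed so far (0 to 1).
    switch_fraction: before this point, only last season's factor is used.
    """
    if fraction_played < switch_fraction:
        return prev_pf
    weight_prev = 1.0
    weight_current = fraction_played  # assumed weighting, see lead-in
    total = weight_prev + weight_current
    return (weight_prev * prev_pf + weight_current * current_pf) / total


# Hypothetical park: 100 last year, 112 so far this year.
print(blended_park_factor(100, 112, 0.20))  # 100 (still on last year's factor)
print(blended_park_factor(100, 112, 0.50))  # 104.0 (two parts previous, one part current)
print(blended_park_factor(100, 112, 1.00))  # 106.0 (equal weighting at season's end)
```

The step from 100 to a blended figure at the switch point is the kind of overnight jump that would ripple into WAR if it arrived with only a few days left in the schedule.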

This year, two months makes up almost the entirety of the regular season. At a recent company meeting, Forman says, “I described a situation where maybe one player’s leading in WAR, and then the next day we turn on the park factor, and immediately they drop several places and lose a third of a win, or something like that. There’s literally like three days left of the season at that point. We’re having to evaluate that stuff.”

In one way, at least, 2020 may lighten the load for Baseball-Reference behind the scenes. “Not having to handle pitcher hitting actually will be something that will be kind of a nice thing at times,” Forman says, alluding to the complex pitcher positional adjustment. “It’s a little annoying to have to deal with pitcher hitting in terms of our WAR stats.”

However the final leaderboards look, Baseball-Reference will be a popular destination for fans who are trying to put 2020’s anomalous seasons in context. “The variability of this season is just going to be off the charts,” Forman says, adding, “I just think there’s going to be really weird, weird things happening statistically.” That may be more fun for most fans than it is for Forman, who may be forced to make a difficult decision about whether to include a caveat alongside certain stats.

“On the one hand, I would love to see the chaos of a .400 hitter this year, but on the other hand, I would hate to see the chaos of a .400 hitter this year, because how we look at that and how we present that to the user is the big question,” Forman says. On the current all-time single-season batting average leaderboard at B-Ref, Ross Barnes ranks third with a .4286 mark in 1876. Barnes’s Chicago White Stockings played only 66 games. Similarly, the all-time single-season ERA leaderboard is topped by Tim Keefe (0.857), whose 1880 Troy Trojans played 83 games. George Bradley’s 1876 season ranks 12th, despite his St. Louis Brown Stockings’ paltry total of 64 games.

Despite their brevity and vastly lower caliber of play, those early seasons aren’t accompanied on the site by any indicators of illegitimacy, but Forman is still weighing whether to add an asterisk to draw a distinction between short (but complete) 60-something-game seasons and a shortened 60-game season. “It may be a situation where we actually decide to do that if a player bats .405 for the 60-game season,” he says. “We might just add an asterisk to it to note that it was a severely shortened season.” Forman will also likely add an option to exclude 2020 from future queries via the company’s subscription-based service, Stathead.

MLB’s official historian, John Thorn, is a staunch anti-asterisker; as he wrote last month, asterisks “to me are anathema, representing a lame substitute for further examination or even thought.” Thorn notes via email that even after 1900, there have been many seasons “in which a statistician might feel the need to inform readers about underlying realities, or changes to them, before strutting out [their] numbers.” Foul balls didn’t count as strikes until 1901 in the NL and 1903 in the AL; walk-off home runs hit prior to 1920 weren’t always recorded as homers; accounting for sacrifice flies changed often until its current codification in 1954; and until 1950, pitcher wins weren’t standardized.

More recently, we’ve seen league expansions, strike-zone redefinitions, changes to the baseball, strike-shortened seasons, and other significant deviations from the norm—to the extent that there is such a thing as a norm in a sport that’s always evolving. As Forman says, “Baseball is fun in that there always seems to be a few new things that happen every year.” This year will bring more of those things than usual, and Forman expects to come across wrinkles he didn’t anticipate. He recently wrestled with whether to stick with the standard date of June 30 as the determinant of a player’s listed “seasonal age,” even though this season hadn’t yet begun on June 30. (He decided to keep things consistent.)


In some cases, old seasons could supply solutions to conundrums that may arise again this year. To qualify for the ERA title, a pitcher has to throw at least one inning per team game. In most seasons, team game totals don’t differ much. In the strike-shortened 1981 season, though—an odd split season in which the team with the majors’ best overall record (the Reds) didn’t make the playoffs—the totals ranged from 103 (four teams) to 111 (the Giants), which led to an odd edge case. Yankees rookie Dave Righetti posted a 2.05 ERA in 105 1/3 innings, but he fell short of qualifying because the Yankees played 107 games. Instead, the title went to Orioles righty Sammy Stewart, who recorded a 2.32 ERA in 112 1/3 innings. Yet if Righetti had been Stewart’s teammate, Righetti would have won the ERA title, because the Orioles played only 105 games.

Baseball-Reference renders Stewart’s ERA in bold ink, indicating a league leader, because it treats ERA and batting titles as league awards. But Righetti gets the bold ink for ERA+, FIP, and other “unofficial” rate stats. For those, Forman says, “we use a league average of games played for the qualification, because Righetti qualifying for Stewart’s team, but not his own is just stupid. And since these aren’t ‘awarded’ by the league I feel comfortable with that.” If the coronavirus causes cancellations that lead to unequal totals of team games played, then, Forman will know how to handle the stats, although there’s no telling whether MLB will be as well prepared. Thus far, the league’s response to questions about in-season testing delays is perhaps best embodied by a timeless response to an earlier unforeseen snafu.
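The two qualification rules are easy to compare side by side. The sketch below uses the innings and team-game totals quoted above for the official rule. The function names are made up for illustration, and the list of league game totals in the final comment is a placeholder rather than the real 1981 figures.

```python
from fractions import Fraction
from statistics import mean


def qualifies_official(innings, team_games):
    """Official rule as described above: at least one inning per team game."""
    return innings >= team_games


def qualifies_league_average(innings, league_team_games):
    """Unofficial variant: measure against the league-average games played."""
    return innings >= mean(league_team_games)


righetti_ip = Fraction(316, 3)  # 105 1/3 innings
stewart_ip = Fraction(337, 3)   # 112 1/3 innings

print(qualifies_official(righetti_ip, 107))  # False: the Yankees played 107 games
print(qualifies_official(righetti_ip, 105))  # True: the Orioles played 105 games
print(qualifies_official(stewart_ip, 105))   # True

# With a placeholder set of team totals (NOT the real 1981 numbers):
# qualifies_league_average(righetti_ip, [103, 105, 107]) -> True
```

Under the league-average rule, every pitcher is held to the same bar regardless of how many games his own club happened to play.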

For Forman’s business—which he says has remained above break-even despite a dip in traffic—the strangeness of a 60-game MLB season isn’t even the most pressing problem. The NBA’s play-in series between the no. 8 and no. 9 seeds consists of official games that won’t count as playoff or regular-season games, which will demand a new class of contest on Basketball Reference. Hockey Reference has a headache, too: The NHL added a new round of playoffs that will require considerable coding. On the soccer side, some leagues suspended play and then created new criteria for ordering the standings, which complicated matters for FB Ref. “To be honest, baseball is not that bad,” Forman says.

Not all sabermetric sites face the same challenges. The more stable and meaningful the metrics in limited action, the less compromised their predictive power in a 60-game season. “A shorter season provides less sample size for comparison purposes but doesn’t meaningfully change any of our metrics,” says Daren Willman, the director of R&D at MLB and the creator of PITCHf/x, TrackMan, and Statcast clearinghouse Baseball Savant. “Our biggest priority for 2020 is seeing how the new Hawk-Eye system performs and adjusting as needed based on those results. That was going to be the case in a full-length season, too.”

FanGraphs, which also offers a WAR model, overlaps with Baseball-Reference’s offerings more than Baseball Savant does, but FanGraphs won’t have to tinker with its pitching WAR, which is based on FIP and not on runs allowed. Moreover, the site uses five-year, regressed park factors, which ensures that one weird year won’t move them as much (though the weighting may still be decreased). However, the new roster rules and structure prompted alterations to the Roster Resource section of the site, and both the playoff odds and the playing-time projections they depend on were adjusted to reflect the abridged schedule. The latter figures are fraught during a pandemic that has already sidelined some players for unspecified periods. Jason Martinez, who monitors and updates the site’s depth charts and projected playing time, plans to pencil in 14-day absences for players who go on the injured list because of COVID-19 or for undisclosed reasons, but like everything else in this season, those estimates will be best guesses that are subject to change.

Unlike Baseball-Reference, FanGraphs hosts sophisticated team and player projections for the current season and upcoming seasons, which will probably suffer from the relative lack of 2020 data available. “I think that we should expect the 2021 projections to be less accurate, but only slightly,” says Jared Cross, who operates the Steamer projection system. Cross explains that performances from previous years are still predictive, and that much of the inaccuracy in projections stems from chance variation that applies in any year and may make it difficult to discern the difference in accuracy in 2021.

As Cross notes, the latest and greatest information from modern tracking technology, which Steamer and other systems incorporate to an extent, should help make the most of the months we have. “I do think we can mitigate the loss of information further by using things like fastball speed, pitch-level metrics, exit velocities, and sprint speeds where smaller sample sizes are more revealing and by looking at changes in these metrics from 2019 to 2020,” he says. FanGraphs’ Dan Szymborski, who operates the ZiPS projection system, concurs, saying, “It’s stuff we didn’t have in 1994 and 1981.” Front-office analysts have even more detailed, proprietary data at their disposal, which should make clubs more confident in their forecasts than public prognosticators can be.

In ’81 and ’94, though, there were full minor league seasons, no pandemic, and fewer confounding factors like new rules and roster limits, taxi squads packed with top prospects who can’t play anywhere else, and a four-month layoff between spring training and Summer Camp. “Maybe the weirdness of this whole scenario will make the information we do have unreliable,” Cross acknowledges, “and it will be tough having no recent data on minor leaguers.”

There’s one more area in which the short season may confound the assumptions of sabermetricians: home field advantage. Then again, home field advantage has always confounded sabermetricians. In this respect, if few others, 2020 may reveal greater truths about baseball rather than obscuring them.

In MLB history, home teams have consistently won roughly 54 percent of games, but the root of that edge remains murky, with proposed causes ranging from a fan effect on the psyches of players and umpires, to a travel/fatigue effect, to the effect of familiarity with (or affinity for) a stadium’s layout. “My theories about HFA is that it’s not about fans, but rather about familiarity with environment and travel,” says Matt Swartz, an economist who has studied home field advantage for Baseball Prospectus and now consults for the Washington Nationals (who don’t currently have a home park).

As Swartz points out, though, with only 900 regular-season games (at most) to work with, “the confidence intervals probably are too big to prove or disprove anything.” In a season of this length, a lower-than-normal home field advantage could occur by chance, although a .500 home record would be highly unlikely to happen at random. “I have no idea what to expect,” says Phil Birnbaum, the editor of SABR’s By the Numbers newsletter, who has researched and presented on home field advantage. “My best guess would be that we’ll see home field advantage is still there, but at a lower level than normal.” Some studies based on same-stadium teams in basketball and soccer suggest that the fan effect is sizable, and recent results in Bundesliga “ghost games” played without fans lend additional credence to that theory.
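A back-of-the-envelope calculation shows what Swartz means. The sketch below treats each of the 900 games as an independent coin flip with the same home win probability and uses the normal approximation to the binomial. Both are simplifying assumptions, and the function names are illustrative.

```python
import math


def home_win_rate_standard_error(true_rate, games):
    """Standard error of an observed home winning percentage over `games` games."""
    return math.sqrt(true_rate * (1 - true_rate) / games)


def approx_95_interval(true_rate, games):
    """Approximate 95 percent range for the observed rate (normal approximation)."""
    margin = 1.96 * home_win_rate_standard_error(true_rate, games)
    return true_rate - margin, true_rate + margin


games = 900        # 30 teams x 60 games / 2
true_rate = 0.54   # historical home winning percentage cited above

se = home_win_rate_standard_error(true_rate, games)
low, high = approx_95_interval(true_rate, games)
print(f"standard error: {se:.4f}")                      # about 0.0166
print(f"95% range: {low:.3f} to {high:.3f}")            # about 0.507 to 0.573
print(f"z-score of a .500 record: {(0.5 - true_rate) / se:.2f}")
```

Under these assumptions, a season in which home teams win anywhere from about 51 to 57 percent of games would be consistent with the usual 54 percent edge, while a .500 home record would sit more than two standard errors below it.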

One of the most compelling possibilities—suggested first by Beyond the Box Score’s Dan Turkenkopf in 2008 and subsequently explored by many other analysts—is that home crowds exert some influence over umpires’ pitch calls. That finding seems to hold up under the latest and most sensitive scrutiny. “On average, home field made a called strike 1.7 percent more likely last year, all other things being equal,” says Jonathan Judge of Baseball Prospectus, whose ball/strike-call model accounts for location, count, pitch type, catcher receiving skills, and umpire tendencies, among other factors that influence calls. At an average value of 0.14 runs per called strike and 58 called strikes per caught game, Birnbaum notes, that small per-pitch margin translates to one run per 7.2 games—a little more than a third of home field advantage, which is roughly 2.9 runs per 7.2 games. It’s possible that the fan effect is even larger if its influence is felt more strongly on high-leverage calls, when crowds are louder than usual.
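Birnbaum's arithmetic can be reproduced directly from the figures quoted above. The variable names are illustrative, and the calculation follows the article in applying the 1.7 percent bump to the 58 called strikes in a typical game.

```python
called_strike_boost = 0.017      # home pitchers' called-strike edge (Judge)
called_strikes_per_game = 58     # per caught game
runs_per_called_strike = 0.14    # average run value of a called strike

# Extra runs per game attributable to friendlier calls at home.
umpire_runs_per_game = (
    called_strike_boost * called_strikes_per_game * runs_per_called_strike
)
games_per_run = 1 / umpire_runs_per_game

# Overall home field advantage as quoted: roughly 2.9 runs per 7.2 games.
hfa_runs_per_game = 2.9 / 7.2
share_of_hfa = umpire_runs_per_game / hfa_runs_per_game

print(f"one run every {games_per_run:.1f} games")   # one run every 7.2 games
print(f"share of home field advantage: {share_of_hfa:.0%}")
```

The share comes out just above one-third, in line with Birnbaum's estimate.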

If fan-free games strip away that apparent advantage, Judge’s model will quickly pick up on and adjust to it. If that’s the case, Baseball Prospectus—which also publishes projections and playoff odds—will have to weigh whether to tweak or remove the hard-coded home field factor that guides some of its forecasts. Harry Pavlidis, BP’s R&D director, says, “We will watch and learn about home park advantage. … If we see reason to change it, we will.” (FiveThirtyEight has already reduced home field advantage by 60 percent in its projections for soccer games played in the absence of fans.)

Regardless of the outcome of this season, players, front-office officials, and sabermetricians alike will be reckoning with the aftereffects of 2020 for years and decades to come. “If you’re starting a website five years from now, you’re going to have to deal with the 2020 season and all the peculiarities that it has,” Forman says. The same goes for the authors and editors of the sabermetric Bible, the Baseball Prospectus annual, which published its 25th edition in January. The annual debuted in 1996, in the aftermath of the last shortened MLB season, and it will enter its second quarter-century following one of the smallest samples of all.

“It’s going to be hard to be as sure of ourselves in almost every aspect, which probably means less snark and more doubt,” say coeditors Craig Goldstein, Patrick Dubuque, and R.J. Anderson in a joint statement. The most recent edition of the book included comments on 2,171 players, but that count may slip somewhat next season. “It’s possible we’ll have to curtail the number of minor leaguers we address, given their seasons either won’t happen or will take place behind closed doors at secondary development facilities,” the editors say. “With so many players with so little to report on … but still in their team’s plans for next season, we’ll have to think of a new way to present that information to the reader.”

In other words, they’re going to get creative, just like the players, coaches, and executives caught up in the maelstrom of sports in the summer of COVID-19. “A lot of the variation in 2020 will be sui generis,” says Thorn. “But the times warrant bold experimentation.”