Last week I read a prediction that made me think. It appeared in an NLDS preview, at the bottom of a breakdown of the best-of-five Dodgers-Nationals series.
The Nationals’ rotation and their middle-of-the-lineup stars give them a puncher’s chance in this series, certainly better than the 35 percent one the markets give them. These will be competitive games, but they won’t lead to a long series. Dodgers in three.
The prediction turned out to be wrong, but I didn’t know that then. Nor does it matter much now; this is about the process, not the results. I didn’t disagree with the first sentence, but I had some notes on the next two. According to the prediction, the Nationals “certainly” had a better than 35 percent chance to win the series, so let’s call that a 40 percent chance to take three out of five. Yet the prediction called for the Dodgers to sweep—that is, win three games before the Nationals could win one. How could the Nats have a 40 percent chance to win three out of five, yet also have a higher than 50 percent chance to win zero out of three?
This doesn’t make much sense. Statistically speaking, a sweep would have been the least likely way for the Dodgers to win—within a percentage point, actually, of the odds of the underdog Nats winning in five. Yet fans and media members make that type of prediction all the time. If we think one team is clearly better than another, we will often predict a sweep, whether in a best-of-five series or a best-of-seven. It’s simple and satisfying, and it marks a prognosticator as a person with the courage of their convictions. But in baseball, you should never predict a postseason sweep—if, of course, you care about being right or maximizing your rate of correct calls (we’ll get to that). Even more counterintuitively, you shouldn’t predict that a series will last its full length.
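The arithmetic behind that claim can be sketched in a few lines. This is a simplified model, not the preview's or the markets' method: it assumes the Nationals' series chance is exactly 40 percent and that every game is independent with the same per-game odds (no home-field or pitching-matchup adjustments). The function names are made up for illustration.

```python
from math import comb

def win_in_exactly(p, games, wins_needed=3):
    """Probability that a team with per-game win probability p clinches in exactly `games` games."""
    losses = games - wins_needed
    # The clinching win comes last; the losses can fall anywhere in the earlier games.
    return comb(games - 1, losses) * p**wins_needed * (1 - p)**losses

def series_win_prob(p, wins_needed=3):
    """Probability of winning a first-to-`wins_needed` series."""
    return sum(win_in_exactly(p, g, wins_needed)
               for g in range(wins_needed, 2 * wins_needed))

def per_game_prob(series_prob, wins_needed=3):
    """Bisect for the per-game probability that produces the given series probability."""
    lo, hi = 0.0, 1.0
    for _ in range(60):
        mid = (lo + hi) / 2
        if series_win_prob(mid, wins_needed) < series_prob:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

# Assumption: Nationals at 40 percent for the series, so Dodgers at 60 percent.
dodgers_game = per_game_prob(0.60)
nationals_game = 1 - dodgers_game

dodgers_by_length = {g: win_in_exactly(dodgers_game, g) for g in (3, 4, 5)}
nationals_in_five = win_in_exactly(nationals_game, 5)

print(f"Implied Dodgers per-game odds: {dodgers_game:.1%}")
for games, prob in dodgers_by_length.items():
    print(f"Dodgers in {games}: {prob:.1%}")
print(f"Nationals in 5: {nationals_in_five:.1%}")
```

Under those assumptions, a 60 percent series favorite is only about a 55 percent favorite in any single game, which is why the sweep ends up as the least likely of the three ways the Dodgers could win.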
During the regular season, great teams sometimes play terrible teams, and the gaps can get so big that teams would be likely to sweep a short series. The chart below shows the distribution of FanGraphs win probabilities—which compare the projected talent of each team’s lineup and starting pitcher and adjust for home-field advantage—for the favorites in every regular-season game from 2014 to 2019 (as far back as the FanGraphs game odds go). The highest regular-season win probability on record is 84.2 percent, for the Indians on September 30, 2017, when they started Corey Kluber in Cleveland against the White Sox and Carson Fulmer. (The White Sox beat the odds and eked out a 2-1 win.)
In the postseason, though, teams’ skill levels are similar. (In that respect, MLB differs from the NBA.) No team since 2014 has surpassed the 71.3 percent win probability that the Dodgers earned in last year’s NLDS Game 2, with Clayton Kershaw on the mound at Dodger Stadium and the visiting Braves countering with Aníbal Sánchez, who had been bad for a few years before that season, which dragged down his projection. (Clutch Kershaw showed up, and the Dodgers won 3-0.) The next-highest win probability from a playoff game in that time is only 68.5 percent. As expected, the corresponding chart for postseason games over the same span shows a much more compressed range.
Because postseason matchups are so much less lopsided than the most extreme pairings during the regular season, sweeps should be harder to come by. “In real-life [postseason] baseball, the teams are more evenly matched and so a three-game sweep won’t be the most likely outcome,” says Jim Albert, a professor of statistics at Bowling Green State University who has authored several books about baseball and stats, including Curve Ball: Baseball, Statistics, and the Role of Chance in the Game and Teaching Statistics Using Baseball.
We can see how expected series lengths vary based on the mismatch between teams using this spreadsheet provided by former Baseball Prospectus writer Zachary Levine, which you can download and edit. For a sweep to be the most likely outcome, a team would have to be at least a 69.7 percent favorite to win every game in a five-game series or a 75.6 percent favorite to win every game in a seven-game series. For reference, 69.7 percent is roughly equivalent to the A’s-Royals matchup from September 16 (or the Twins–White Sox showdown a day later), and 75.6 percent is roughly equivalent to the Astros-Mariners matchup from September 7, in which Justin Verlander faced off against Yusei Kikuchi. That kind of showdown doesn’t happen in October. Only one playoff game—let alone a whole series—has cleared the former bar since 2014, and none has approached the latter.
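Those two thresholds can be reproduced without the spreadsheet. The sketch below is not Levine's workbook; it is a stand-in that assumes the favorite has the same win probability in every game and that games are independent, then searches for the point where a sweep (by either team) becomes the single most likely series length. The helper names are hypothetical.

```python
from math import comb

def length_distribution(p, wins_needed):
    """Map each possible series length to its probability, counting wins by either team."""
    q = 1 - p
    dist = {}
    for games in range(wins_needed, 2 * wins_needed):
        losses = games - wins_needed
        ways = comb(games - 1, losses)  # arrangements of the loser's wins before the clincher
        dist[games] = ways * (p**wins_needed * q**losses + q**wins_needed * p**losses)
    return dist

def sweep_threshold(wins_needed):
    """Smallest per-game favorite probability at which a sweep is the most likely length."""
    def sweep_margin(p):
        dist = length_distribution(p, wins_needed)
        sweep = dist.pop(wins_needed)
        return sweep - max(dist.values())

    lo, hi = 0.5, 0.99  # a sweep is never the top outcome at 0.5 and always is at 0.99
    for _ in range(60):
        mid = (lo + hi) / 2
        if sweep_margin(mid) < 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

best_of_five_threshold = sweep_threshold(3)
best_of_seven_threshold = sweep_threshold(4)
print(f"Best of five:  {best_of_five_threshold:.1%}")
print(f"Best of seven: {best_of_seven_threshold:.1%}")
```

Counting sweeps by either team, rather than only by the favorite, is what lands this simplified model on the 69.7 and 75.6 percent figures.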
Given the average win probability for the favored team in playoff games from 2014 to 2019 (56.3 percent), here are the likelihoods of each type of series going a given number of games. In both five-game series and seven-game series, sweeps are expected to bring up the rear.
[Table: Average Likelihood of Postseason Series Length (Best of Five)]
[Table: Average Likelihood of Postseason Series Length (Best of Seven)]
Along similar lines, these are the odds of an average favored team winning in each number of games:
[Table: Likelihood of Favorite Winning in Each Number of Games (Best of Five)]
[Table: Likelihood of Favorite Winning in Each Number of Games (Best of Seven)]
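For readers who want to run the numbers themselves, here is a rough sketch of the calculation behind both sets of odds. It assumes the 56.3 percent average applies identically and independently to every game, which ignores the home-away-home schedule, so the output is an approximation under that assumption and not necessarily a match for the original figures. The names are illustrative.

```python
from math import comb

AVG_FAVORITE = 0.563  # average favorite's per-game win probability, 2014 to 2019 playoffs

def wins_in(p, games, wins_needed):
    """Probability that a team with per-game win probability p clinches in exactly `games` games."""
    losses = games - wins_needed
    return comb(games - 1, losses) * p**wins_needed * (1 - p)**losses

length_odds = {}    # series length, whichever team wins
favorite_odds = {}  # favorite clinching in exactly that many games

for label, wins_needed in (("Best of five", 3), ("Best of seven", 4)):
    lengths = range(wins_needed, 2 * wins_needed)
    favorite_odds[wins_needed] = {
        g: wins_in(AVG_FAVORITE, g, wins_needed) for g in lengths
    }
    length_odds[wins_needed] = {
        g: favorite_odds[wins_needed][g] + wins_in(1 - AVG_FAVORITE, g, wins_needed)
        for g in lengths
    }
    print(label)
    for g in lengths:
        print(f"  {g} games: series ends {length_odds[wins_needed][g]:.1%}, "
              f"favorite wins {favorite_odds[wins_needed][g]:.1%}")
```

In this simplified model, the sweep is the least likely length in both formats, and the favorite's single most likely path is a win in four (best of five) or six (best of seven).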
You may also have noticed that in both cases, the series is slightly more likely to end one game short of the maximum than to go the distance: four games instead of five, and six instead of seven. I used to think that when a matchup seemed really close, it made the most sense to predict that it would take the longest time to settle it. If you predict that a team will win in seven games, though, you’re essentially saying either that it will be trailing in the series heading into Game 6 (which would mean that you’re predicting it will lose three of the first five games) or that it will be winning going into Game 6 but then lose Game 6, which doesn’t seem consistent with your belief that the team is superior. If you think a series is a true toss-up, it’s no more likely to go seven games than six. And if you think one team is at least a tiny bit better, that team is more likely to win in six than seven. The same concept applies to five-game sets.
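For the algebraically inclined, the six-versus-seven comparison can be written out directly. This is a sketch under the same simplifying assumption as before: the favorite wins each game independently with probability p, and q = 1 - p is the underdog's chance.

```latex
\begin{align*}
P(\text{ends in six})   &= \binom{5}{2}\left(p^{4}q^{2} + q^{4}p^{2}\right) = 10\,p^{2}q^{2}\left(p^{2}+q^{2}\right),\\
P(\text{ends in seven}) &= \binom{6}{3}\left(p^{4}q^{3} + q^{4}p^{3}\right) = 20\,p^{3}q^{3},\\
P(\text{six}) - P(\text{seven}) &= 10\,p^{2}q^{2}\left(p^{2}+q^{2}-2pq\right) = 10\,p^{2}q^{2}\,(p-q)^{2} \;\ge\; 0.
\end{align*}
```

The difference is zero only when p = q, the true toss-up, and positive otherwise. For the favorite alone, winning in six has probability 10p^4q^2 and winning in seven has probability 20p^4q^3, so six beats seven whenever q is below one-half. The best-of-five version works the same way: the gap between four games and five is 3pq(p - q)^2.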
So, if your only interest when making predictions is accuracy, avoid calling for sweeps, and avoid predicting that a series will last the maximum number of games. For most matchups, a four-game division series and a six-game championship series will be your best bets, even though five and seven, respectively, are far more fun.
Naturally, real life is messier than the theoretical model, but even modeling the real-life home-away-home schedule, rather than assigning the favorite identical odds in each game, doesn’t move the numbers much or affect the conclusions. The actual breakdown of best-of-seven series in the wild-card era isn’t far from what we’d expect: 12 sweeps, 20 five-gamers, 22 six-gamers, and 18 seven-gamers. For best-of-five series, though, we see that sweeps have a slight lead, 35-33-32. It’s possible that the model is missing something, but according to Albert, this difference from the expected number of sweeps isn’t statistically significant and doesn’t change the expectation that there will be fewer sweeps in the future (unless the Yankees keep playing the Twins).
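One way to sanity-check that conclusion is a chi-square goodness-of-fit test on the observed counts. To be clear, this is not necessarily the test Albert ran. It is a back-of-the-envelope sketch that borrows the 56.3 percent average from 2014 to 2019 as the per-game favorite probability for the whole wild-card era, which is an assumption, and treats every game as independent.

```python
from math import comb, exp

AVG_FAVORITE = 0.563  # assumption: 2014-2019 average applied to the whole wild-card era

def expected_shares(p, wins_needed):
    """Expected share of series ending at each length, counting wins by either team."""
    q = 1 - p
    shares = []
    for games in range(wins_needed, 2 * wins_needed):
        losses = games - wins_needed
        ways = comb(games - 1, losses)
        shares.append(ways * (p**wins_needed * q**losses + q**wins_needed * p**losses))
    return shares

def chi_square(observed, shares):
    """Pearson chi-square statistic for observed counts against expected shares."""
    total = sum(observed)
    return sum((o - total * s) ** 2 / (total * s) for o, s in zip(observed, shares))

best_of_five = [35, 33, 32]       # sweeps, four-gamers, five-gamers
best_of_seven = [12, 20, 22, 18]  # sweeps through seven-gamers

stat_five = chi_square(best_of_five, expected_shares(AVG_FAVORITE, 3))
stat_seven = chi_square(best_of_seven, expected_shares(AVG_FAVORITE, 4))

# With two degrees of freedom, the chi-square tail probability is exactly exp(-x / 2).
p_value_five = exp(-stat_five / 2)

print(f"Best of five:  chi-square = {stat_five:.2f}, p-value = {p_value_five:.3f}")
print(f"Best of seven: chi-square = {stat_seven:.2f} (5 percent cutoff at 3 df is 7.815)")
```

Under these assumptions, neither statistic clears the conventional 5 percent cutoff (5.991 for two degrees of freedom, 7.815 for three), which fits Albert's read that the extra best-of-five sweeps could easily be noise.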
Some of you may call me a coward for peddling my weak, wishy-washy takes about division series teams winning in four or championship series teams winning in six. I know this because my colleague Michael Baumann called me a coward as soon as he heard what story I was working on. He’s not wrong. For many people, the point of predictions—or, at least, predictions about something as insignificant as the outcome of MLB’s postseason series, especially if no cash is at stake—isn’t to be the most right over time, but to be the most memorably right in isolated instances. No one’s going to give you credit for predicting Astros or Yankees in six, even if you nail it. But if you predict that one of those teams will sweep, and you hit on that hard 17, it’ll seem like you saw something and were fearless enough to say so. No one remembers your misses. Playoff prediction fortune favors the bold.
The sweep prediction, then, is a handy way to tell who’s aiming for maximum accuracy and who’s aiming for maximum entertainment. I try to avoid predicting the postseason. But if I’m forced to forecast, I’ll stick to my no-sweeps stance. It’s a take so cold that maybe, much like an ice burn, it will come all the way back around and start to seem hot again.
If you see someone calling for a 3-0 or 4-0 result, you now know what to do. Baseball’s playoffs are unpredictable. But beware of people promising sweeps.