Long Before WAR, Nobody Knew What MLB Players Were Worth

A brief history of how baseball minds misunderstood a player’s true value before the sport largely adopted the all-inclusive stat


As MLB’s compressed regular season comes to a close this weekend, selected BBWAA members will fill out awards ballots, as a subset of credentialed writers do every year. Although this ritual seems like a sign of normality, nothing about baseball in 2020 is entirely routine. This year’s BBWAA voters are about to experience an expected but still jarring repercussion of the pandemic-imposed 60-game schedule: Wins Above Replacement, which has strongly influenced MVP voting in recent seasons, will be less revealing than usual. With more players clustered close to the top of the leaderboard, the all-encompassing sabermetric value stat simply won’t offer much help in ranking MVP contenders.

“It’ll likely mean less to me this year,” says NL MVP voter C. Trent Rosecrans of The Athletic. “I usually use it, but kind of as one of many data points, and anything within one win [in 162 games] is about a wash.” At the end of last season, five other players finished within one win of the major league leader in FanGraphs WAR, Mike Trout. As of Thursday, 21 other players were within one win of the 2020 leader, José Ramírez.

Sportsnet’s Ben Nicholson-Smith, an AL MVP voter, echoes Rosecrans’s concern. “Typically I’d lean really heavily on WAR for MVP,” he says. “If a player didn’t have at least 4-5 WAR, it would require some pretty special circumstances to appear on my ballot. And the top of my ballots have typically correlated closely with the top of the WAR leaderboards. It’s not perfect, but I feel that’s the best measure of value we have. But this year I am really hesitant to lean too heavily on what the numbers say about 50 games of defensive work or a few baserunning plays, and that makes me a little more skeptical of WAR leaderboards. I’ll look, of course, but I won’t base my decisions on such small gaps between players.”

Given the vagaries of small-sample defensive stats and this season’s abnormally large variation in the quality of competition (because of the short, rearranged schedule), WAR alone won’t be able to tell us whether Superstar A has been better than Superstar B. But it can tell us something fundamental that fans in earlier eras never knew. Based on WAR and its underlying components, we can say with some certainty that the most a player can be worth in a span of 60 games is 5-6 wins more than a replacement-level player. We can also say that this season’s standouts have been worth about 3 WAR. Any given player with roughly 3 WAR might truly have been closer to 2 or 4. But we know that no player was worth 8 or 10.

The range of possible player value is an elementary truth about the sport that we take for granted today. Along with many other baseball fans, analysts, and media members, I’ve internalized the WAR scale. In a normal season, I tend to think in terms of “3-win players,” “6-win players,” or “10-win players,” understanding that while our estimates may be off for any individual, there’s almost certainly no such thing as a 20-win player or a minus-5-win player. (The highest and lowest single-season WAR values ever, according to FanGraphs, are Babe Ruth’s 15.0 in 1923 and Jim Levey’s minus-4.0 in 1933, respectively; Baseball-Reference is slightly more liberal.)
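One rough way to translate that scale to a 60-game season is straight-line pro-rating. The sketch below is only an illustration: the helper name `prorate` is hypothetical, a 154-game schedule is assumed for 1923, and the article does not say its 60-game figures were derived this way. Under those assumptions, Ruth's record pace works out to just under 6 wins per 60 games, in the same neighborhood as the 5-6 win ceiling mentioned above, and a one-win gap over 162 games shrinks to roughly a third of a win.

```python
def prorate(value, from_games, to_games):
    """Scale a season total linearly from one schedule length to another."""
    return value * to_games / from_games

# Babe Ruth's 15.0 FanGraphs WAR in 1923, assuming a 154-game schedule.
ruth_pace_60 = prorate(15.0, 154, 60)

# A one-win gap over 162 games, expressed on a 60-game scale.
one_win_gap_60 = prorate(1.0, 162, 60)

print(round(ruth_pace_60, 1))    # 5.8
print(round(one_win_gap_60, 2))  # 0.37
```

Linear scaling ignores the extra noise in a short season, so it only places the numbers on a common scale; it says nothing about how reliable they are.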

WAR and its forebears have helped us establish the lower and upper bounds of how valuable big leaguers can be. Before those stats existed, sabermetric godfather Bill James says, “We just really did not know, you know? Until we worked things out, we just had no concept of the scale of elements of the game.”

James was one of the people who helped work things out. Although holistic measures of offensive performance had existed far earlier, it wasn’t until the 1980s that the first measures of all-around value became publicly available. In his 1980 Baseball Abstract, James introduced a system called the Value Approximation Method, which he called “potentially the most powerful analytic weapon that the game of baseball has ever had at its disposal.” Forty years later, he backtracks a bit: “It had no value in modern terms, but it was a useful step forward for me personally at the time. It helped me think more clearly about the realistic scale of things.” That method made use of a quasi-replacement level, a concept that James developed throughout the ’80s.

In 1984, Pete Palmer published Total Player Rating in The Hidden Game of Baseball. TPR, which Palmer had first formulated privately in the late 1960s, expressed value in terms of wins but was based on a baseline of average rather than replacement level. Subsequent work by James and others—including Keith Woolner, Clay Davenport, and Tom Tango—codified and refined replacement level and yielded familiar metrics such as VORP, WARP, Win Shares and, finally, WAR, which went up at FanGraphs in 2008 and Baseball-Reference in 2010.

Even after early win-value stats seeped into the nascent sabermetric discourse, they remained far from the mainstream spotlight. “Neither TPR, nor VORP, nor anything else similar had any traction with players, front offices, or sportswriters, so they were never part of the conversation about player value,” says Woolner, the former Baseball Prospectus writer who helped calculate replacement level, invented VORP, and now serves as the principal data scientist in baseball analytics for the Cleveland Indians. “To my recollection, there was no overarching framework that anyone really used to try to assess total offensive and defensive contributions, or compare position players to pitchers. Instead, writers, etc. focused on the individual arguments in favor of one player or another.”

In that environment, it was easy to endlessly, fruitlessly argue about whether this weak-hitting great glove man was better than that defensively limited slugger, and so on, because there was no way to conclusively settle the matter. Maybe barroom debates about baseball were better before the advent of argument-ending numbers, but the absence of a standardized rubric led to far-fetched and frustratingly irrefutable pronouncements. “It used to be pretty common for an announcer to say that a fast runner would go from first to third on a single 150 times a year,” James says. “Once a game. I remember an announcer who was a huge admirer of Roger Maris saying [in the 1970s] that Maris saved the Yankees 50 runs a year by preventing runners from going first to third on a single.”

WAR tells us today that Maris topped out, all told, at approximately 7 wins (or 70 runs) in his best seasons. But if it sounded plausible to a well-informed pre-WAR observer that Maris had saved 50 runs a year solely by preventing runners from taking extra bases, then how many runs might it have seemed that Maris was worth as a hitter, a fly ball catcher, and everything else? Perhaps he’d contributed 20 wins or more. And in a world where inordinate tallies were perceived to be possible, it also seemed conceivable that a single star player could dramatically affect a team’s fortunes.

In the mid-1970s, James recalls, he set out to answer the question of what one star player was worth. He was prompted both by the beginning of free agency, which had caused lofty claims about players’ potential impacts to proliferate, and by the absence and diminished performance of Royals starter Steve Busby, who had been the team’s best pitcher in the two preceding seasons but ran into arm trouble in 1976. Kansas City sports editor Chuck Woodling had written that the Royals were doomed without Busby, but James researched similar losses and discovered that it was common for teams to lose their best pitcher and keep winning without him. (Sure enough, the Royals took the AL West title in ’76 and, without an inning from Busby, won 102 games in ’77.)

Then James decided to go deeper. He found 40 or 50 examples of teams suddenly gaining or losing a superlative player from one season to the next, whether because of an injury, a precocious rookie’s debut, a one-sided trade, or some other reason (such as Willie Mays rejoining the Giants in 1954 after his military service). He then examined how many wins those teams tended to add or subtract. “The answer was, at the time, shockingly low,” he says. “Twisting the data as far as you could to make it look important, you might say that those teams dropped by 8 to 10 games when they lost a superstar, or gained 8 to 10 games when they suddenly gained a superstar. It wasn’t actually that much; 5 to 8 games, on average, was more realistic.”

James remembers reporting the results of his study in a publication called Baseball Bulletin in 1976 or 1977. “Nobody believed it was possible that the impact of a superstar was that low,” he says. “But I found that it only really makes a difference when you lose several players at the same time. If you lose three good players at the same time—which does happen—then that makes an impact. But losing or gaining one great player, teams mostly can cope.” Like everyone else, James was fumbling in the dark, but he had a dim match.

Today, the floodlight of WAR would tell us at a glance that even the best baseball player can’t transform a team from bad to good. (So would Mike Trout’s 15 career postseason plate appearances.) But decades ago, James says, “we had no alternative means of measuring that impact, by answering, ‘How many runs did this player create, compared to replacement level?’ and, ‘What is the Win Value of one run?’ I didn’t know the answers to those questions, so I could not support my primary answer with an alternative approach. It wasn’t until I could do that that I accepted myself that the impact of one player was not 20 games or 25 games or whatever we all previously thought that it was.”

Over the past few years, I’ve collected 10 examples of pre-WAR, wildly inflated estimates of what players (or other team personnel) were worth. I’ve tried to be selective, because some assertions that sound dubious actually (sort of) stand up to scrutiny. In August 2010, for instance, Brian McCann said his Atlanta teammate Brooks Conrad had “single-handedly won us five or six games.” Conrad’s 2010 WAR was barely above replacement level. But McCann was clearly talking about Conrad’s clutchness, and it’s true that Conrad’s positive win probability added was over 5 that year. (Of course, Conrad also helped lose some games for Atlanta, which McCann wasn’t counting.) Similarly, in 1978, Graig Nettles said, “Maybe I’m the only one that knows it, but I save more runs with my glove than I drive in.” Assuming Nettles wasn’t talking about RBIs, this turns out to be true, at least relative to the average third baseman: He finished with 140 fielding runs above average and 102 batting runs above average.

Below, I’ve listed the claims that did make my cut, ranked from most to least plausible.

Earl Weaver was worth 10-12 wins every year

The best baseball writer of all, Roger Angell—happy 100th, Roger—once wrote, “I always think Earl Weaver wins ten or a dozen extra games for his club every year on pure cogitation.” Weaver was a sabermetrically savvy skipper who was way ahead of his time, and he earned his Hall of Fame plaque. But if he was worth 10-12 wins per year—even compared to a replacement-level manager—his career value would be right around Babe Ruth’s. That seems excessive. For what it’s worth, attempts to discern Weaver’s value via statistical means have consistently rated him highly, but most appraisals of his annual impact have landed around two or three wins. However, Chris Jaffe, author of Evaluating Baseball’s Managers, estimated in 2010 that Weaver was worth 744 career runs—which would still only be about 40 percent of Angell’s estimate. This is one we can’t disprove, but now that we know that even the best player in baseball at any given time typically isn’t worth 10-12 wins a year, it’s hard to believe that Weaver was.
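The "about 40 percent" comparison can be checked with back-of-the-envelope arithmetic. This is a sketch under stated assumptions, not Jaffe's method: it uses the common rule of thumb of about 10 runs per win, counts Weaver's 17 seasons in the Orioles dugout (1968-82 and 1985-86, including partial years), and takes the midpoint of Angell's "ten or a dozen."

```python
RUNS_PER_WIN = 10.0      # rule-of-thumb conversion (assumption)
SEASONS_MANAGED = 17     # 1968-82 and 1985-86, counting partial seasons

career_runs = 744                # Jaffe's estimate, as quoted above
angell_midpoint = (10 + 12) / 2  # midpoint of "ten or a dozen" wins

wins_per_season = career_runs / RUNS_PER_WIN / SEASONS_MANAGED
share_of_angell = wins_per_season / angell_midpoint

print(round(wins_per_season, 1))  # 4.4
print(round(share_of_angell, 2))  # 0.4
```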

A good groundskeeper is worth a dozen wins

In 1983, Hall of Fame owner Bill Veeck told David Letterman, “A good groundskeeper is worth a dozen games,” which was more specific than his earlier observation that a groundskeeper can be “invaluable.” Roger Bossard—grandson of Emil Bossard, whom Veeck called “the Michelangelo of the groundskeepers”—put Veeck’s valuation at 10-12 wins and claimed that Veeck had said a good groundskeeper is the “10th man on the field.” Could the 10th man be more valuable than any of the other nine? In Veeck As in Wreck, the maverick made a persuasive case, admitting that his groundskeepers tailored the infield to suppress or augment bunting and accommodate each defender’s abilities, and also that they raised and sculpted the mound to enhance or neutralize each starter’s stuff. But unless those measures had huge effects, it’s tough to make this math work.

Phil Rizzuto’s defense lowered the Yankees’ team ERA by at least half a run

In Summer of ’49, David Halberstam wrote, “No one valued Rizzuto more than Yankees pitchers, off whose earned-run averages he was saving a half-run or more.” It’s not clear whose opinion this was—Halberstam’s, or Yankees pitchers’—but either way, it wasn’t true. Half a run per game translates to 77 runs in a 154-game season. Rizzuto had a good glove, but 20-ish runs was his ceiling.
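Spelled out, the arithmetic looks something like the sketch below. It treats a half-run of ERA as a half-run per game (that is, nine innings per game) and uses the rough 10-runs-per-win conversion; the function name is hypothetical, and the 20-run ceiling is the approximate figure cited above.

```python
RUNS_PER_WIN = 10.0  # rule-of-thumb conversion (assumption)

def runs_from_era_drop(era_drop, games):
    """Runs saved over a season if team ERA falls by era_drop,
    treating every game as nine innings."""
    return era_drop * games

claimed_runs = runs_from_era_drop(0.5, 154)  # Halberstam's half-run claim
rizzuto_ceiling = 20                         # approximate best-season figure

print(claimed_runs)                              # 77.0
print(round(claimed_runs / RUNS_PER_WIN, 1))     # 7.7 wins from defense alone
print(round(claimed_runs / rizzuto_ceiling, 2))  # 3.85 times his ceiling
```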

Omar Moreno prevented 50 doubles and 20 triples a year

In the early 1980s, a Pirates scout told Dollar Sign on the Muscle author Kevin Kerrane, “Andre Dawson’s a better all-around player, but Moreno saves our ballclub 50 doubles and 20 triples a year that’d go by most fielders.” According to today’s stats, Dawson was the superior fielder at that point in his career, but the more glaring problem is one of magnitude: Based on the average run values of offensive events, taking away 50 doubles and 20 triples would be worth about 75 runs, or 7.5 wins.
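For a sense of where a number like that comes from, here is a minimal sketch using approximate linear-weights run values. The specific weights are assumptions chosen for illustration (roughly 0.77 runs for a double, 1.04 for a triple, and 0.27 runs saved for each out recorded), not necessarily the values behind the estimate above. Turning a hit into an out is credited with both the hit's value and the out's value.

```python
# Approximate run values per event (illustrative assumptions).
RUN_VALUE = {"double": 0.77, "triple": 1.04}
OUT_VALUE = 0.27       # runs saved by recording an out instead
RUNS_PER_WIN = 10.0    # rule-of-thumb conversion (assumption)

def runs_saved(hits_prevented):
    """Runs saved by turning the given hits into outs."""
    return sum(
        count * (RUN_VALUE[kind] + OUT_VALUE)
        for kind, count in hits_prevented.items()
    )

moreno_claim = runs_saved({"double": 50, "triple": 20})

print(round(moreno_claim, 1))                 # 78.2
print(round(moreno_claim / RUNS_PER_WIN, 1))  # 7.8
```

With these weights the scout's claim comes to a bit under 80 runs, the same ballpark as the figure above.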

Scouts still sometimes say the darndest things and get out over their skis when talking about talented players: In 2017, an international scouting director said Luis Robert was “the best player on the planet, and that’s no exaggeration,” and just last week, a scout said that Sixto Sánchez “may be the best pitcher in the game right now.” But the Moreno mistake was more common before WAR provided a less anecdotal look at player performance.

“There’s definitely an availability bias—you remember all the great plays, but tend to forget the ones he didn’t make, or the more routine plays that should have been made,” Woolner says. “If you add up only the good things a player does, then maybe you get to 15 wins’ worth of ‘good things’ … but the full reckoning would include all the outs made on offense, plays not made on defense, etc. … I don’t think there was any thought to try to assess plays consistently and rationally.” Even if the thought existed, the data didn’t.

Ozzie Smith (and others) saved 100 runs per season on defense

In September 1982, Smith’s Cardinals manager, Whitey Herzog, said that Smith had “saved 100 runs with his defense, and that’s as good as driving in 100 runs.” Far be it from me to criticize one of the best defenders ever, but based on various fielding metrics—Total Zone, Fielding Runs Above Average, Defensive Regression Analysis—Smith never saved more than 32 runs in a season with his glove. Unless Herzog was speaking about Smith’s run-prevention powers relative to playing without any shortstop at all, 100 runs is a serious stretch.

Yet Smith isn’t the only purported hundred-run fielder. In 1989, Cleveland manager Doc Edwards, who had managed White Sox outfielder Dave Gallagher in two Triple-A seasons, said that Gallagher “probably saved our team 100 runs in those two years.” (Gallagher grades out as one of 1989’s worst defensive outfielders.) In February 1999, Phillies GM Ed Wade said Rico Brogna had saved 100 runs on defense the previous season. (Total Zone says he saved 2.) In August 2001, Phillies manager Larry Bowa said Scott Rolen saved “75-100 runs a year with his glove.” (“I’m not much into math,” Rolen responded.) And in September 2006, Ozzie Guillén said Joe Crede had saved 100 runs that year. With all of these hundred-run preventers roaming around, you have to wonder how anyone ever scores.

Iván Rodríguez saved one run per game

Forget about preventing 100 runs—how about 162? Last year, writer Phil Rogers tweeted, “An AL manager once told me [Rodríguez] saved Texas one run per game.” Pudge was great, but he wasn’t a 20-win player.

In fairness to the anonymous AL manager, Rogers’s tweet continued, “Hyperbole? Probably.” Trailblazing analyst Craig R. Wright, the first front-office employee to earn the title “sabermetrician,” began working for the Rangers in 1981. Throughout his career, he observed a tendency toward hyperbole among his more traditional colleagues. “While they would sometimes talk in an unrealistic, exaggerated form to make a point, their actual decisions appeared to be guided by a more muted and realistic judgment, but still more ‘off’ than they would likely be today guided by the right metrics,” says Wright, who adds that the most common mistakes in player evaluation at that time were misappraising fielding and overrating offensive speed.

As Tango puts it, “The best thing to do is look at actual behavior. With money on the line, how do the decision-makers actually behave? How they behave is what they believe.” It’s unlikely that anybody believed Rodríguez was worth so much. But now, no one would even suggest it.

Herb Washington was worth 10 wins without making a plate appearance

When A’s owner Charlie Finley signed track star Herb Washington to be his team’s designated pinch runner in 1974, Oakland’s press release promised that Washington would be “directly responsible for winning 10 games this year.” Instead, Washington was caught 18 times in 47 steal attempts and incurred a costly pickoff in the World Series. He never appeared at the plate or in the field.

Yet in December of ’74, Reggie Jackson said he believed that Washington had won “nine games outright” for the A’s by stealing himself into scoring position, and manager Alvin Dark said Washington was worth more than that. Elsewhere, Dark elaborated that Washington had won nine games solely by “stealing a base as a pinch runner which eventually led to the lead or winning run,” which Washington didn’t even do nine times.

Not everyone was convinced: As A’s third baseman Sal Bando said, “Yeah, but how many games did he lose?” In May 1975, the A’s released Washington, a strange thing to do to a 10-win player.

A good third-base coach is worth 16-17 wins a year

Herzog, the future Hall of Fame manager, basically credited himself with almost 20 wins when he was coaching third base for the Mets in 1966, which was particularly bold considering that those Mets won 66 games. Herzog was firmly in favor of waving runners around. “When a base runner has a chance to score, you’ve got to remember that the percentage is with him,” he said. “It’s like being a gambler—you’ll force the other side to make either a perfect play or a damaging mistake.”

Baseball Prospectus writer Russell Carleton has expressed the same idea, facetiously suggesting that third-base coaches should be replaced by sticks that point to the left and say “RUN!” Carleton notes, “If you could get someone to really, truly embrace that philosophy, they would have an advantage over their fellow third-base coaches.” However, he adds, “I disagree on the 16 or 17 wins.” Among the 20 teams of 1966, Carleton reports, Mets runners ranked fifth in send rate and 12th in success rate on opportunities to try for home.

Luis Castillo was worth 15 wins a year

In December 2005, then-Twins manager Ron Gardenhire said of his team’s newly acquired second baseman, “He’s worth 15 wins, potentially. We lost 30 one-run games last year. With Luis’s ability to get on base, steal bases, score runs, and play defense, a guy like that can make a difference in at least half those one-run games going the other way.”

Fire Joe Morgan fisked this for me. Granted, part of a manager’s job is to talk up his players, and if Castillo happened to get all of his hits in close games and make all of his outs in blowouts, he would, in a way, be worth 15 wins. But that “potentially” in Gardenhire’s statement is doing an awful lot of work. Castillo, by the way, amassed about 2 WAR for the Twins in 2006. And as we later learned, he wasn’t always an asset in one-run games.

Roy Cullenbine’s fielding cost his team 15 wins

For every player who supposedly added 15 wins to his team’s total, there’s another player who supposedly took 15 wins away. In March 1948, Tigers GM Billy Evans said of recently departed—and often unfairly denigrated—first baseman Roy Cullenbine, “Someone, I think it was [manager] Steve O’Neill, estimated we lost around 15 games last season because of Cullenbine’s play around first.” O’Neill added that the other infielders were afraid to throw to Cullenbine, a former outfielder who had never before been a full-time first baseman.

According to Total Zone, Cullenbine was actually 8 runs better than the average first baseman in 1947. Regardless, we now know that a first baseman can’t cost a team 15 games on defense unless he literally can’t catch the ball. No one would ever estimate that today’s worst defender was anywhere close to that costly. In case you were wondering, the Tigers, freed from Cullenbine, fell from 85-69 in ’47 to 78-76 in ’48. New first baseman Sam Vico, who couldn’t hit, made the same number of errors (15) that Cullenbine had the year before, and the entire Tigers team repeated its error total (155) too.

Once you understand that a player needs to be about 10 runs better to turn a team loss into a win at the seasonal level, Wright explains, “you are essentially freed from slipping into silly exaggerations about how many runs/wins this player gave you or how many runs/wins this other player cost you.” It took baseball’s best minds more than a century to reach that point. We still don’t know what five-sixths of the matter in the universe is made of, so maybe it’s not so surprising that until recently, we hadn’t pinned down what baseball players were worth.
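One common way to see where the 10-runs-per-win rule comes from is James's Pythagorean expectation, which estimates winning percentage from runs scored and allowed. The sketch below assumes the classic exponent of 2 and a hypothetical league-average team that scores and allows 750 runs over 162 games. Nothing above says Wright derives the figure this way; it is one standard illustration.

```python
def pythagorean_wins(runs_scored, runs_allowed, games=162, exponent=2.0):
    """Expected wins from runs scored and allowed (Pythagorean expectation)."""
    rs = runs_scored ** exponent
    ra = runs_allowed ** exponent
    return games * rs / (rs + ra)

baseline = pythagorean_wins(750, 750)  # hypothetical average team
plus_ten = pythagorean_wins(760, 750)  # same team with 10 extra runs

print(round(baseline, 1))             # 81.0
print(round(plus_ten - baseline, 2))  # 1.07
```

Ten extra runs buy about one extra win, which is why a claim of 100 runs saved amounts to a claim of roughly 10 wins.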

“I would have to say that I myself made errors as gross as the ones that you cited, in writing, before we really thought through the issue,” James says. “I’m not in position to mock anyone else; I said the same kinds of things.” WAR has made us smarter, but that education has come at the cost of our wildest (unfounded) dreams. Perhaps in some ways it was more fun to dwell in a world where the impossible still seemed believable. But in 2020, at least, a little more of baseball is back to being a black box.