The Past, Present, and Future of Sabermetrics
by A. Kline
Posted on 05/21/2021
Much like the sport itself, the first baseball statistics were heavily influenced by a similar popular sport across the pond: cricket. Henry Chadwick, sometimes called the “Father of Baseball” for his written contributions to the game, is credited with creating the first box score in 1859. Further cementing his legacy, Chadwick is also credited with creating the statistics now known as batting average and earned run average, arguably still the most popular statistics for hitters and pitchers, respectively.
Of course, the box score and statistics like batting average are nowhere near as comprehensive as the tools baseball statisticians use today, but they were not nothing. Imagine watching a baseball game in 1858, before the invention of the box score. You might personally remember notable plays, but the records do not. Beyond maybe a final score, there was likely little (if anything) recorded about the game as a whole, let alone individual player performances. With the box score, one could easily access troves of game data, season data, player data, and team data, making it easier to differentiate good teams from bad teams and good players from great players. That was a massive step towards where we are today.
Luck is an integral part of baseball, and always will be. Just as the wind can blow a foul ball fair, it can blow a would-be home run right into a center fielder’s glove. But over the past century, we’ve gotten much better at isolating luck from skill. Obviously we will never get 100% of the way there, but it is this never-ending struggle to differentiate skill and luck that has always driven sabermetrics forward.
If there is one major difference between sabermetricians now and sabermetricians not much more than a decade ago, it would be their attitude towards the so-called “conventional wisdom” of baseball. A famous scene in the movie adaptation of Moneyball depicts A’s GM Billy Beane in a room with a group of scouts focused on, among other things, how pretty a player’s swing looks, while Beane is rightly focused on whether said player can get on base. This is obviously an exaggerated example, but it does provide insight into the very real ways baseball players had been analyzed for decades.
If a hitter could hit the ball far, they were a good hitter. If a pitcher could throw for an entire game and earn a win, they were a good pitcher. Statistical analysis went only as deep as requiring plenty of wins to be a good pitcher, very few errors to be a good fielder, and a high average and lots of runs batted in to be a good hitter. (While home runs were still seen as important, a .280 hitter with 25 home runs would almost always be preferred over, say, a .250 hitter with 40 home runs.) Counting stats were still the standard for measuring a player’s skill, and rate stats never got more complicated than multiplying earned runs per inning pitched by nine.
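For a sense of how simple those traditional rate stats are, here is a minimal Python sketch of batting average and earned run average. The function names and the sample numbers are made up for illustration, and innings pitched is assumed to be a true decimal (6.5 innings, not the box-score shorthand 6.1 or 6.2).

```python
def batting_average(hits: int, at_bats: int) -> float:
    """Hits divided by at-bats; returns 0.0 when there are no at-bats."""
    return hits / at_bats if at_bats else 0.0


def earned_run_average(earned_runs: int, innings_pitched: float) -> float:
    """Earned runs per inning pitched, scaled to a nine-inning game."""
    if innings_pitched <= 0:
        raise ValueError("innings_pitched must be positive")
    return 9 * earned_runs / innings_pitched


# Hypothetical season lines, not real players.
print(round(batting_average(150, 500), 3))
print(round(earned_run_average(70, 210.0), 2))
```

Neither calculation needs anything beyond the counting stats already printed in a box score.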
Over time, however, the shine of this rudimentary player analysis wore off. In 1954, Branch Rickey suggested that combining on-base percentage with what he called “extra base power” would be better at analyzing and comparing power hitters than counting stats like home runs and RBI. This newfound emphasis on distinguishing between the different values of hits, combined with the continued emphasis on getting on base, paved the way for a Kansas security guard named Bill James to drag sabermetrics into the modern day more than anyone else since Chadwick. Among James’ many contributions were his Runs Created formula, which gave a better idea of how much a player contributed to their team’s scoring, and his Pythagorean win-loss percentage, which underscored the importance of runs altogether. As the importance of getting on base was drilled into the heads of hitters, the importance of preventing baserunners was drilled into the heads of pitchers, leading to the widespread adoption of Daniel Okrent’s walks plus hits per inning pitched (WHIP). Over the past two decades, a more accurate picture of a pitcher’s skills has also become possible through the rise of Voros McCracken’s defense-independent pitching statistics, and the impacts of different plays have been compared more accurately thanks to Tom Tango’s work on linear weights in formulas like weighted on-base average (wOBA).
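To make a few of these formulas concrete, here is a short Python sketch of the Pythagorean win-loss percentage, walks plus hits per inning pitched, and Runs Created. It assumes the simplest published versions: an exponent of 2 for the Pythagorean formula and the basic Runs Created form, (hits + walks) × total bases ÷ (at-bats + walks). Later refinements of both exist and are not shown. The function names and sample inputs are hypothetical.

```python
def pythagorean_win_pct(runs_scored: float, runs_allowed: float,
                        exponent: float = 2.0) -> float:
    """Expected winning percentage from runs scored and runs allowed."""
    rs = runs_scored ** exponent
    ra = runs_allowed ** exponent
    return rs / (rs + ra)


def whip(walks: int, hits: int, innings_pitched: float) -> float:
    """Walks plus hits allowed per inning pitched."""
    return (walks + hits) / innings_pitched


def runs_created_basic(hits: int, walks: int, total_bases: int,
                       at_bats: int) -> float:
    """Basic Runs Created: times on base times total bases, per opportunity."""
    return (hits + walks) * total_bases / (at_bats + walks)


# Hypothetical team and player lines, not real data.
print(round(pythagorean_win_pct(800, 700), 3))
print(round(whip(50, 150, 200.0), 2))
print(round(runs_created_basic(150, 50, 250, 500), 1))
```

The point of the Pythagorean estimate is that a team scoring 800 runs and allowing 700 is expected to win noticeably more than half its games regardless of its actual record, which is why it is often used to judge whether a team has been lucky.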
Since the mid-2010s, however, something interesting has been happening. Through the rise of tools like Statcast and Baseball Savant, values like exit velocity and pitch break have come under more scrutiny than ever, as sabermetricians continue to try to separate luck from skill by removing even more external factors from each play. In other words, modern-day baseball statisticians are not unlike old-time scouts when they ask how hard a batter can hit or how good a pitcher’s slider is. In all likelihood, by 2025, sabermetricians will be adjusting exit velocity based on game-time temperature down to the hundredth of a degree Fahrenheit, and ESPN will be putting up a hitter’s wOBA instead of their OPS when they come up to the plate.
However, the impact of sabermetrics has not been felt equally by all parts of the game. While aspects like hitting and pitching have been sabermetricized half to death at this point, aspects like fielding and baserunning, with limited exceptions, have been left behind. The extreme emphasis on individual performance has also directed attention away from how teams themselves compare to one another beyond their win-loss records. While progress in the field of sabermetrics may be asymptotic, such that it slows down the closer it gets to a 100% accurate representation of skill without ever getting there, tremendous untapped potential for analysis still exists in other facets of the game beyond hitting and pitching (and there is not an insignificant amount of analysis that can still be done in the realms of hitting and pitching). It remains to be seen what innovations in sabermetrics will be the topic of tomorrow, but it is easy to fall into the mistaken belief that we are currently as advanced as we ever will be.