Not All Runs are Created Equal, Part 3

I have wanted to revisit this post series for quite some time now, but until recently I had no idea how I would do so. After re-reading the previous two "Not All Runs are Created Equal" posts, I realized I had made a pretty glaring error in the formula for weighted run differential, one serious enough that the original formula made no real mathematical sense.

The weighted run differential formula that I created in the first post, and later used in the second post, used a metric called strength of schedule (SOS), which is simply the number of runs per game better or worse a team's average opponent is compared to league average. Since using the SOS value alone as the run weight would often be way too extreme, I decided to moderate the weights by using an exponent between 0 and 1. The formula in question is listed below.

Weighted Runs Scored (SOS) = ((SOS + 1)^n)(RS)

Weighted Runs Allowed (SOS) = ((-SOS + 1)^n)(RA)

where SOS = the team's strength of schedule, n = some value between 0 and 1 that moderates the weight and maximizes weighted differential's correlation with team skill
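To make the original weighting concrete, here is a minimal Python sketch of the formula above. The function name and the sample inputs (450 runs scored, 400 allowed, an SOS of +0.2) are made up for illustration and are not real team data.

```python
def original_weighted_runs(rs, ra, sos, n):
    """Original weights: ((SOS + 1)^n)(RS) and ((-SOS + 1)^n)(RA)."""
    w_rs = ((sos + 1) ** n) * rs
    w_ra = ((-sos + 1) ** n) * ra
    return w_rs, w_ra

# Hypothetical team: 450 RS, 400 RA, SOS of +0.2 runs per game.
# With n = 0 both weights are exactly 1, so the runs come back unchanged.
print(original_weighted_runs(450, 400, 0.2, 0))
# With n = 0.5 the weights are the square roots of 1.2 and 0.8.
print(original_weighted_runs(450, 400, 0.2, 0.5))
```

Note that the weight is built directly from a runs-per-game figure with nothing to scale it against, which is part of why the exponent was needed to tame it.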

The second post was dedicated to evaluating how effective the SOS-weighted run differential is at measuring team skill, as well as which exponent n would be best at doing so. The data showed that an exponent of 0 (meaning both weights are exactly 1) would be best. In other words, the weights did not improve run differential's correlation with skill at all. Of course, the very idea of "measuring skill" is vague and hard to pin down numerically, but even putting that aside, I was quite discouraged about weighted run differential. It never occurred to me that the issue could lie in the formula itself rather than in the famously nebulous question of just how skill should be measured.

Then a couple of weeks ago, it hit me: SOS is measured as a difference in runs per game, so why wasn't I dividing by the league's total runs per game? That way, we can determine the percentage above or below average a team's opponents are and weight runs accordingly, making for a much more mathematically sound formula. In other words, runs scored and allowed should be weighted as follows:

Weighted Runs Scored (SOS) = ((SOS/lgRunsPerGame) + 1)(RS)

Weighted Runs Allowed (SOS) = ((-SOS/lgRunsPerGame) + 1)(RA)

where lgRunsPerGame = the average number of runs scored per team per game in the league

This also has environmental adjustment baked right in, since the league average runs per game is usually a pretty good proxy for how hitter- or pitcher-friendly a league is at any given time. However, the keen-eyed among you may have noticed a problem with the formula above: it double-counts SOS. As stated earlier, strength of schedule measures the number of runs per game better or worse a given team's opponents are compared to league average. But "runs per game" here covers both the offensive and defensive sides. That means SOS is the number of runs per game a team's opposing offenses are better or worse than average plus the number of runs per game its opposing defenses are better or worse, with no distinction between the two. Applying the full SOS to both runs scored and runs allowed therefore counts it twice. A team could face offenses and defenses that are both 0.15 runs per game better than average, giving it an SOS of 0.3. Yet SOS would treat that schedule as identical to one featuring offenses 0.3 runs per game above average and defenses that are exactly average.

For now, we'll have to make the assumption that a team's SOS is always spread evenly across both categories. Obviously there are times when this is not the case, but I doubt this assumption would have a detrimental effect on the overwhelming majority of run weights. Remember, a lot of schedule luck evens out over the course of a full season anyway. With that in mind, we arrive at a new weighted run formula:

Weighted Runs Scored (SOS) = ((SOS/(2*lgRunsPerGame)) + 1)(RS)

Weighted Runs Allowed (SOS) = ((-SOS/(2*lgRunsPerGame)) + 1)(RA)
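As a quick sanity check, the halved formula can be sketched in a few lines of Python. The function name and the inputs (450 RS, 400 RA, SOS of +0.2, a league average of 4.0 runs per game) are hypothetical and chosen only to keep the arithmetic easy to follow.

```python
def sos_weighted_runs(rs, ra, sos, lg_runs_per_game):
    """Weighted RS and RA, assuming SOS splits evenly between offense and defense."""
    adjustment = sos / (2 * lg_runs_per_game)
    w_rs = (adjustment + 1) * rs    # tougher schedule -> runs scored count for more
    w_ra = (-adjustment + 1) * ra   # tougher schedule -> runs allowed count for less
    return w_rs, w_ra

# Hypothetical team facing a slightly tough schedule.
w_rs, w_ra = sos_weighted_runs(450, 400, 0.2, 4.0)
print(round(w_rs, 2), round(w_ra, 2))
```

Here the adjustment works out to 0.2 / 8 = 0.025, so runs scored are bumped up by 2.5 percent and runs allowed are trimmed by 2.5 percent, giving roughly 461 weighted runs scored and 390 weighted runs allowed.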

Given that weighted runs are just supposed to be runs in a schedule-neutral environment, we can use these values in the formula for Pythagorean win-loss percentage to create what I'll call an opponent-neutralized Pythagorean win-loss percentage (try saying that ten times fast). Since Baseball Reference found an exponent of 1.81 to be more accurate, I will use that instead of the original exponent of 2:

Opponent-Neutralized PythW-L% = (wRS)^1.81/((wRS)^1.81 + (wRA)^1.81)

where wRS = weighted runs scored, wRA = weighted runs allowed
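A minimal Python sketch of that calculation follows. The helper names are hypothetical, the weighted run totals are made-up inputs, and rounding the winning percentage to a whole-number record is my assumption about one reasonable way to turn a percentage into wins and losses.

```python
def pythagorean_pct(w_rs, w_ra, exponent=1.81):
    """Pythagorean win-loss percentage from weighted runs scored and allowed."""
    return w_rs ** exponent / (w_rs ** exponent + w_ra ** exponent)

def projected_record(pct, games):
    """Convert a winning percentage into a whole-number (wins, losses) record."""
    wins = round(pct * games)
    return wins, games - wins

# Hypothetical weighted totals: 461.25 wRS and 390 wRA over 98 games.
pct = pythagorean_pct(461.25, 390)
print(round(pct, 3), projected_record(pct, 98))
```

A team with equal weighted runs scored and allowed comes out at exactly .500 regardless of the exponent, which is a handy check that the formula is wired up correctly.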

For those who are curious, I decided to calculate the opponent-neutralized Pythagorean records for all teams as of July 26. The results, as well as each team's win-loss record and unaltered Pythagorean record, are below.

Team Opponent-Neutralized Pythagorean Records (2022 season through July 26)

Team	Win-Loss	PythW-L	OppAdjPythW-L
Yankees	66-32	69-29	70-28
Dodgers	64-32	66-30	66-33
Astros	64-34	61-37	62-36
Mets	60-37	58-39	58-39
Blue Jays	54-43	55-42	57-40
Mariners	53-45	51-46	54-44
Braves	59-40	57-42	54-45
Rays	52-45	51-46	53-44
Phillies	50-47	53-44	53-44
Cardinals	51-47	55-43	53-45
Twins	52-45	53-44	52-45
Orioles	49-48	48-49	51-47
Padres	55-44	52-47	51-48
Red Sox	49-49	48-50	51-48
Rangers	43-53	49-47	49-47
Giants	48-49	52-46	50-47
Guardians	49-47	49-47	48-48
White Sox	49-48	47-50	49-48
Angels	49-47	49-47	48-49
Brewers	54-44	52-46	48-50
Marlins	46-51	46-51	45-52
D-backs	44-53	46-51	45-52
Cubs	40-57	42-55	41-56
Rockies	44-54	43-55	41-57
Reds	37-59	39-57	39-57
Royals	39-58	38-59	39-58
Tigers	39-59	35-63	36-62
Nationals	34-65	35-64	35-64
Pirates	40-58	34-64	32-66

(Win-loss, run, and schedule data: Baseball-Reference.com)

Opponent-neutralized (or opponent-adjusted like the table says, really either name works) Pythagorean win-loss record is much like, though not identical to, a metric called Simple Rating System (SRS) on Baseball Reference. A team's SRS is simply its run differential per game plus its SOS, meaning a value of 0 would be average. Given how similar these two metrics are, I cross-checked the table above with the SRS leaderboard, and the match was not a perfect one. However, this could be because I could not extract any more nuance out of SOS than "runs better or worse per game" (i.e. how much of that difference is from opposing pitchers vs. hitters), and because Baseball Reference only publishes SOS figures to a single decimal place. If these issues were both resolved, it's quite possible the leaderboards for opponent-neutralized Pythagorean record and SRS would consistently match.
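For comparison, the SRS relationship described above (run differential per game plus SOS) is simple enough to sketch in Python. The function name and inputs are hypothetical, and this reflects only the definition given in this post, not Baseball Reference's actual computation.

```python
def simple_rating(rs, ra, games, sos):
    """SRS as described here: run differential per game plus strength of schedule."""
    return (rs - ra) / games + sos

# Hypothetical team: +50 run differential over 98 games, SOS of +0.2.
print(round(simple_rating(450, 400, 98, 0.2), 2))
```

The result is expressed in runs per game relative to an average of 0, whereas the Pythagorean version expresses the same basic idea as a win-loss record.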

In that case, what's the point of ranking teams by a stat based on run differential with schedule difficulty accounted for when such a stat already exists? Well, a couple of reasons. One is that weighted runs allow for more nuance than SRS alone. For example, offenses and defenses can be compared in an opponent-neutral environment using weighted runs scored and allowed, which the all-encompassing SRS cannot do. The other is that opponent-neutralized Pythagorean record, based on weighted run differential, is simply more digestible than the alternative. Every baseball fan knows what a win-loss record looks like, and where most good and bad teams fall in that column, making it a very friendly and familiar benchmark to use. In the minds of fans, an average team should be .500, not 0.

Then again, the SOS version of this formula is not ideal. As stated earlier, it combines opponent skill into one number when two would be preferable. Ideally, we would use the average number of runs allowed per game by a team's opponents as well as their average number of runs scored per game. In other words, the ideal weighted runs formula should look like this:

Weighted Runs Scored = (lgRunsPerGame/oppRunsAllowedPerGame)(RS)

Weighted Runs Allowed = (lgRunsPerGame/oppRunsScoredPerGame)(RA)

where oppRunsAllowedPerGame = the average number of runs allowed per game by a team's opponents, oppRunsScoredPerGame = the same but for runs scored
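To see what the two-number version buys us, here is a minimal Python sketch that revisits the earlier example of two schedules that both carry an SOS of 0.3. The function name, the 4.0 league average, and the team totals are all hypothetical.

```python
def ideal_weighted_runs(rs, ra, lg_rpg, opp_ra_pg, opp_rs_pg):
    """Weight RS by opposing run prevention and RA by opposing offense."""
    w_rs = (lg_rpg / opp_ra_pg) * rs   # stingier opponents -> RS count for more
    w_ra = (lg_rpg / opp_rs_pg) * ra   # stronger opposing offenses -> RA count for less
    return w_rs, w_ra

LG_RPG = 4.0  # hypothetical league average runs per team per game

# Schedule A: opposing offenses and defenses are each 0.15 runs better than average.
team_a = ideal_weighted_runs(450, 400, LG_RPG, opp_ra_pg=3.85, opp_rs_pg=4.15)
# Schedule B: opposing offenses are 0.30 runs better, defenses exactly average.
team_b = ideal_weighted_runs(450, 400, LG_RPG, opp_ra_pg=4.00, opp_rs_pg=4.30)

print([round(x, 1) for x in team_a])
print([round(x, 1) for x in team_b])
```

Both schedules would receive identical weights under the SOS version, but here schedule A boosts runs scored while schedule B leaves them untouched and discounts runs allowed more heavily.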

Both of the above opponent run figures would be park-adjusted. The good news is that, as it turns out, one of these metrics already exists. Baseball Reference's pitcher value leaderboards include a category called "RA9opp," which measures a pitcher's average opponent's scoring rate per nine innings adjusted to a league-average context. Baseball Reference also tracks it for teams, not just pitchers, which makes it useful here. The downside is that a nine-inning span is not always the same as a game, but we'll have to make do with what we've got.

Of course, both of the opponent run values can be found through rote computation. But that will be for a later time. For now, though, I'm glad I revisited this old topic and got to clear some things up. This is the first time in Baseball Analytica history that three posts have come out in three weeks, and plenty more content is on the way. So stay tuned.