I'm going to take on a subject that may not interest all of you, but it's something that's been bothering me lately. A lot, actually. So I'm taking the time to compile my thoughts on wWAR and why I dislike it so much.
What is wWAR?
wWAR stands for weighted WAR, and it's a creation of the rejuvenated Beyond the Boxscore SBNation sabermetric site. (If you aren't visiting that site on a regular basis, you should be. They have the most aesthetically pleasing infographics I've seen for both sabermetric and non-sabermetric topics.) wWAR comes from Adam Darowski (adarowski), an effort to better categorize and identify Hall of Fame players.
The basic premise is that players who had crazy awesome years should get extra credit for those crazy awesome years. Any player who generates WAR over 3.0 gets the amount of WAR over 3.0 double counted -- this is called Wins Above Excellence (WAE). Any player who generates WAR over 6.0 gets that double counted as well -- this is called Wins Above MVP (WAM). wWAR is no more complicated than WAR + WAE + WAM.
Example: If a player has an 8.9 WAR season (Jim Edmonds circa 2005):
- WAE = WAR - 3.0 = 8.9 - 3.0 = 5.9
- WAM = WAR - 6.0 = 8.9 - 6.0 = 2.9
- wWAR = WAR + WAE + WAM = 8.9 + 5.9 + 2.9 = 17.7
Essentially, wWAR double counts every win between 3.0 and 6.0 WAR and triple counts everything above 6.0 WAR. You'd total these numbers across a player's entire career and use that, rather than straight WAR, as a Hall of Fame comparison.
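The arithmetic above is simple enough to sketch in a few lines. This is a minimal illustration of the formula as described in this post, not Darowski's actual code:

```python
def wwar(war):
    """Single-season wWAR per the definitions above: WAR, plus the
    portion above 3.0 (WAE), plus the portion above 6.0 (WAM)."""
    wae = max(war - 3.0, 0.0)  # Wins Above Excellence
    wam = max(war - 6.0, 0.0)  # Wins Above MVP
    return war + wae + wam

# The Jim Edmonds example: 8.9 + 5.9 + 2.9
print(round(wwar(8.9), 1))  # 17.7
```

Note how a season under 3.0 WAR passes through untouched, which is exactly the discontinuity the rest of this post complains about.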
Criticisms of wWAR
wWAR Ignores the Real World
The beautiful thing about WAR is that its goal is to accurately reflect the world of baseball. There's a run-to-win conversion based on empirical data. wOBA is a set of linear weights based on empirical data. FIP is (arguably) the best reflection of what a pitcher actually contributed in a season. The components of WAR are the best current practices for measuring play on the field. It's an attempt to quantify what our eyes see.
wWAR says that, at some point in the season, when a player breaches 3.0 WAR, we should consider his contributions more valuable. Doubly so, in fact. wWAR exacerbates that claim by making every contribution beyond 6.0 WAR worth triple. Why would you do that? Has a home run really become three times as valuable in the WAR equation? This doesn't make sense to me at all, as it deviates from capturing the real contributions of players to the best of our abilities.
Peak Performance Is Unproven
One of the arguments that gets trotted out for the HoF each year is whether a player had a good peak. Voters talk about how that's important. In fact, Adam makes note of that himself:
While career value is nice (and I'm a compiler sympathizer, if that's a thing), voting trends tend to favor the guy with the "peak". Voters want a guy who was the best at his position for a certain period of time. Quiet consistency is boring. They want Ryne Sandberg (62.1 WAR) and not Lou Whitaker (69.7 WAR).
If we know anything, it's that voters for the Hall of Fame talk a lot. But are they holding true to this axiom? Adam's anecdotal remark aside, I've never seen a comprehensive analysis showing that players with better peaks are more likely to get into the Hall of Fame than players with comparable overall value but worse peaks. So we're making, it seems to me, a leap of faith that voting patterns are intrinsically consistent with an unverified argument.
Let's set that aside for a moment and accept it as a premise. Does wWAR capture peak? If teams were really trying to capture peak performance, they'd try to match projected aging curves with player performance and get a group of players that peaked together. But wWAR captures ANY SEASON over 6.0 for triple value. Whether a player is 21, 27 or 40, they get credit for being over 6.0 WAR. An inconsistent production curve (relative to age) would be less valuable to a team if they were really trying to capture the peak years.
WAE Produces Some Mediocrity
Let's jump straight to a money quote:
I wonder if 3.0 is too low of a baseline, though. I mean, 2.0 WAR is generally considered league average. Shouldn't "excellence" start somewhere around 4.0 wins?
There's no real answer that follows the question, but it's a critical one. What does "excellence" mean? (We'll get to the WAM threshold in a moment.) In 2004, Chris Carpenter pitched 182 innings with a 3.85 FIP. By WAE's measure, that counts as an "excellent" season (using FanGraphs WAR). In 2008, Yadier Molina had 4.4 defensive runs and a .323 wOBA -- below the league average of .328 -- over 485 plate appearances, but that also reaches the "excellent" threshold.
WAM Further Illustrates the Arbitrary Nature of the wWAR Components
Those don't scream "excellent" to me. More importantly, though, the way we're defining excellence is by grabbing numbers out of the air. WAM only makes this worse:
so the average WAR for all MVP winners throughout the history of the National League is 7.62. The average for all MVP winners in American League history also happens to be 7.62. My quick math tells me the average of the two of those is 7.62.
As that list shows, there are plenty of MVP winners who don't reach 7.62 WAR. Still, the fact that Willie Stargell won an MVP award with a 2.3 WAR season doesn't mean I should use that as my baseline either. So what baseline should I use for finding MVP-type seasons? I hesitate to use the average because, by construction, a large share of the players who actually won the MVP award didn't even reach that level.
So, let's go back to my original baseline of 6.0 WAR. Of the 181 players who have won MVP awards in MLB history, 133 of them achieved a WAR of 6.0 or above. That's 73%. That, to me, sounds like a pretty good baseline.
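The 73% figure is just counting. A toy version of that calculation, with made-up WAR values standing in for the real MVP list (the post's actual numbers are 133 of 181 winners at or above 6.0), looks like this:

```python
# Hypothetical MVP-winner WAR values, for illustration only.
mvp_wars = [7.6, 5.1, 9.2, 6.4, 2.3, 8.0, 6.0, 4.9]

def share_clearing(wars, baseline):
    """Fraction of MVP seasons at or above a candidate WAM baseline."""
    return sum(1 for w in wars if w >= baseline) / len(wars)

print(share_clearing(mvp_wars, 6.0))  # 0.625 -- 5 of these 8 seasons clear 6.0
```

The criticism that follows isn't about the counting; it's about picking the baseline first and justifying it by whatever percentage falls out.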
The first issue to attack here is that, for years, MVP voting has been criticized as a poor proxy for the actual top performers in a league. Every awards season brings hundreds of blog posts about how player X got screwed or player Y should have been left off a voter's ballot. So we're starting from a rather unscientific ranking of players to find a threshold for what an "MVP" is.
To make matters worse, the threshold that seems obvious (average MVP WAR) is casually discarded. Instead, we want 73% of MVP award winners to clear the bar. Why 73%? Frankly, it's because we all like round numbers and that's what 6.0 generated. That's a terrible definition of a threshold, in my opinion.
Alright Robot, What's Your Solution?
I don't have a concrete answer to that question. I applaud Adam for his ingenuity and legwork in compiling a stat to better tackle the Hall of Fame worthiness of players. I do have a couple of questions/suggestions that may lead smarter people to an improved answer.
Is Career WAR an Insufficient Tool?
Stay away from a historical approach to the Hall of Fame. Trying to justify current voting practices or what new voting SHOULD look like based on what historical voting ACTUALLY looked like is only useful if you think the previous voters got it right in the past. If we're talking about what voting should look like, there's got to be a way to come up with that independent of what it has looked like.
But the main point I want to make is this: I don't quite understand the inadequacy of career WAR for identifying players who belong in the HoF. If we're looking for the players who contributed the most on the field to their teams, how is that not captured by career WAR? Someone needs to make a better case against career WAR before I can buy into the need for wWAR (in any form it manifests).
Be Less Arbitrary
On some level, arbitrariness with regard to the Hall of Fame is impossible to avoid. But the process that created wWAR above smacks of whimsical, casual arbitrariness.
Personally, I'd rather see a rigorous debate around percentages. So, you think that 1% of the baseball population should comprise the Hall of Fame. Generate a cutoff point based on that and apply it going forward. Maybe it's 60 WAR. Maybe it's a different stat entirely.
Alternatively, find a better way to set the WAE and WAM thresholds above. Define excellence as the top 25% of the league in terms of WAR. You could use a moving average each year, or apply it across eras if you want a static threshold. Define WAM as the 20th-best player WAR in a given year (current MVP ballots list 10 NL and 10 AL players) and determine your threshold that way.
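As a sketch of what "methodology first" could look like, here's one way to derive both thresholds from a season's WAR values. The data is synthetic, and both cutoff choices (75th percentile for excellence, 20th-best for MVP-type) are just the suggestions from the paragraph above, not anything Darowski endorses:

```python
import statistics

def wae_threshold(season_wars):
    """'Excellence' as the top 25% of the league:
    the 75th percentile of that season's WAR values."""
    return statistics.quantiles(season_wars, n=4)[2]

def wam_threshold(season_wars, ballot_slots=20):
    """'MVP-type' as the 20th-best WAR in a season,
    mirroring the 10 NL + 10 AL ballot slots."""
    return sorted(season_wars, reverse=True)[ballot_slots - 1]

# Synthetic season: 100 players with WAR values 1.0 through 10.9
wars = [i / 10 for i in range(10, 110)]
print(wae_threshold(wars), wam_threshold(wars))
```

The thresholds would then float with the run environment of each era instead of being pinned to round numbers.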
Are these still arbitrary? Yes. But the point would be to determine the methodology before the results rather than tailoring the methodology to generate the results you want.
Make a Better Case for Peak Performance
I think there's a really fascinating study to be done around peak performance. First of all, what is it? When evaluating aging curves, what is the average performance in "peak" years relative to the full career? Is it 25% better? 30%? And which years constitute the peak? Can they be distributed, or must they be clumped?
There's also a logical case to be made for why we should care about peak performance. If a player peaks from age 23-28, should that be valued the same as a peak that falls during the years aging curves predict? Are teams really valuing peak performance more? (The linear pay rate of free agents relative to performance would argue that they're not.) If teams aren't valuing that performance more, should Hall of Fame voters?
You're the Least Helpful Robot Ever.
I know. It's a gift. Even after typing up those questions and suggestions, I realize this post tilts far more toward the former than the latter. In reality, I've spent the last 90 minutes picking apart someone else's work. To be clear, I think the notion and impetus behind the work is interesting, but I'm not convinced we've fleshed out the need for it yet. And if that argument can be cogently made, the process needs some touching up, in my estimation.