Why do we use WAR?
No, I'm not going to do that stupid "What is WAR good for?!" crap and link to that goddamn video. WAR is good for evaluating players and it has nothing to do with stupid hippies. Sheesh.
Being serious now, WAR is a total value stat for a player. It attempts to measure how many wins a given player adds to a league average team compared with a "replacement level player". Wins, in this metric, are simply a constant division of runs (10 runs = 1 win, 20 runs = 2 wins, etc.), and no run counts for more than any other. That fits, since WAR is a context neutral metric (runs in the 9th inning are just as valuable as runs in the 1st inning). The "replacement level" part of WAR is simply the theoretical value of a typical player you could find on the waiver wire or pull up from your minor league system (think Joe Thurston). More on replacement level later.
WAR is simply the best stat you can use because it takes into account most quantifiable aspects of hitting/pitching and converts them into the unit we all care about: wins. It allows us to compare players as different as Brendan Ryan and Adam Dunn and judge them solely on their value.
WAR is actually a pretty simple stat to calculate if you have access to the right inputs. The construction of WAR basically goes like this:
offense + defense + position + replacement level
I'll explain each of those things in detail.
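As a rough sketch, that skeleton fits in a tiny Python function. The function name and arguments are my own hypothetical choices. It assumes offense, defense and position are already in runs relative to average, and that replacement is the runs separating an average player from replacement level. It then uses the 10-runs-per-win rule of thumb from above to turn the total into wins.

```python
def war_from_components(offense, defense, position, replacement, runs_per_win=10.0):
    """Add up the four run components and convert the total to wins.

    offense, defense and position are runs relative to an average player;
    replacement is the runs separating an average player from replacement level.
    runs_per_win defaults to the rough 10-runs-per-win rule of thumb.
    """
    total_runs = offense + defense + position + replacement
    return total_runs / runs_per_win


# Made-up inputs, just to show the arithmetic.
print(war_from_components(offense=30, defense=5, position=-5, replacement=20))  # 5.0
```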
My definition of offense includes everything a player does to help his team score runs. There are many aspects of offense, of course, the most important being hitting (or walking, you get the point). You can measure hitting however you want, really. The most common way is Linear Weights (don't ask me why that name was chosen), which, in its simplest form, measures how many runs a player would add to an average team in a context neutral setting.
For example, a single, on average, leads to about .77 runs. However, that's simply the average number. A single with nobody on is far less valuable than a single with the bases loaded, and a single in front of a pitcher is far less valuable than a single in front of Albert Pujols. Since hitters have absolutely no control over who is on base or waiting on deck when they hit their singles, we just assign them the league average run value. You do that for each thing a player does and add it all together. That gives you that player's Linear Weights.
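In code, "assign each event its league average run value and add it all up" is just a weighted sum. This is a minimal sketch, and the function and dictionary names are hypothetical. Apart from the ~.77 runs for a single quoted above, any run values you plug in have to come from a real linear weights table for the league and season you care about.

```python
def linear_weights_runs(event_counts, run_values):
    """Context neutral run total: each event is credited with its
    league average run value, no matter who was on base or on deck.

    event_counts: dict mapping event name -> how many times the player did it
    run_values:   dict mapping event name -> league average runs for that event
    """
    return sum(count * run_values[event] for event, count in event_counts.items())


# Only the single's .77 comes from the text; other events need real weights.
print(round(linear_weights_runs({"single": 100}, {"single": 0.77}), 1))  # 77.0
```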
Another way to look at hitting value would be to include context. One way to do that is to look at WPA (win probability added), which measures the change in win expectancy after a given event. For example, if Rasmus is up in the bottom of the 10th in a tie game with nobody on and nobody out, the Cardinals have roughly a 63% chance of winning the game. If Rasmus then hits a walkoff bomb, the Cardinals have a 100% chance of winning the game. Rasmus' WPA for that play is then 37% of a win, or .37 wins. Do that for every play, sum the results by player, and you get each player's cumulative WPA.
The biggest problem with WPA, in terms of valuing players, is that it gives a player full credit (or blame) for the context around him, which is almost completely out of his control. Whether you use WPA, Linear Weights or some other metric to measure offense is a matter of preference. I will say that Linear Weights is the most predictive and all-encompassing. Linear Weights can be found on FanGraphs in rate form as wOBA (which is simply linear weights relative to outs, per plate appearance, scaled to OBP). Unadjusted Linear Weights in counting stat form can be found on FanGraphs as wRAA in the Advanced section.
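The WPA bookkeeping is just as simple. Each play is credited with the change in the batting team's win expectancy, and a player's total is the sum over his plays. Below is a minimal sketch with hypothetical function names. The win expectancies are assumed to come from some win expectancy table; here it's just the rough 63% from the example.

```python
def play_wpa(win_exp_before, win_exp_after):
    """Win probability added on one play, from the batting team's point of view."""
    return win_exp_after - win_exp_before


def cumulative_wpa(plays):
    """Sum WPA over a player's plays, given as (before, after) win expectancy pairs."""
    return sum(play_wpa(before, after) for before, after in plays)


# Rasmus' walkoff homer: roughly a 63% chance to win before, 100% after.
print(round(play_wpa(0.63, 1.0), 2))  # 0.37
```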
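Going from the rate (wOBA) back to the counting stat (wRAA) is one line of arithmetic. As far as I know, FanGraphs' formula is wRAA = ((wOBA - league wOBA) / wOBA scale) × PA. The league wOBA and the wOBA scale change every season, so the values below are placeholders, not real league figures.

```python
def wraa(woba, league_woba, woba_scale, pa):
    """Runs above average implied by a wOBA over `pa` plate appearances.

    Dividing by woba_scale undoes the scaling to OBP, turning the wOBA gap
    back into runs per plate appearance.
    """
    return (woba - league_woba) / woba_scale * pa


# Placeholder league values, chosen only to make the arithmetic easy to follow.
print(round(wraa(woba=0.400, league_woba=0.330, woba_scale=1.2, pa=600), 1))  # 35.0
```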
The next part of offense is the adjustments you choose to make. In theory, one could adjust for quality of pitchers, ballpark, opposing defense, etc. None of these adjustments is going to be perfect, but they are necessary to try to put everyone on an even playing field. I won't go into the technical details. The point is that they are *not* perfect and are simply an approximation of how a player's stats would change under average circumstances. There are also a lot of ways to handle them. Still, they are generally a net gain in understanding a player's value, and you should use them whenever you can.
The next step towards measuring offense is some form of baserunning metric. Linear Weights generally includes stolen bases and caught stealings; however, you might also want to add some measure of taking extra bases or what have you. Baseball Prospectus has a great stat called EqBRR, which attempts to measure such things. If you feel that baserunning should be a part of WAR, then EqBRR would be a great place to start.
As I've said, offense can be whatever you want it to be. However, it should be expressed in runs (or wins) above average and include some sort of park adjustment at the very least.
While offense is how a player helps his team score, defense is how many runs a player helps his team save. Again, the values should be expressed as runs above average.
Defense is a little trickier to measure than offense because we don't have those nice little bins (singles, doubles, triples, homers, walks, etc.) that are unambiguously defined. Defensive valuation involves a lot of subjective perception. There are quite a few defensive metrics I can think of, and they all attempt to measure how many more (or fewer) runs a player saved his team than an average defender at his position would have. To name a few... UZR, PMR, +/-, Range, BZM, F**K, FRAA and others.
Most of these sort each batted ball into a bin based on its estimated velocity, location vector and other things. They then estimate the league average out rate for balls in each bin, compare that to what the fielder actually did, and convert the difference to runs. So say that Colby Rasmus caught 10 balls in 15 chances in bin 7 in '09 (shallow line drive, right center, etc.; numbers pulled out of ass). The league average is 7 outs in 15 chances, so Rasmus is +3. Three plays is equivalent to about 2.4 runs. Do that for all bins, sum the results, and that gives you Rasmus' somethingZR.
What must be stressed about these stats is that they are not an exact representation of how valuable a defender actually was. They are simply estimates, and they can show fairly large discrepancies based solely on the source of the batted ball data or differences in methodology. A great example is Andruw Jones, who, according to UZR, is either the best defender in the history of the game or about average, depending on whether BIS or STATS provides the batted ball data.
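Here is a minimal sketch of that bin-by-bin bookkeeping. The class and function names are hypothetical, and real systems (UZR and friends) add many refinements this ignores. The 0.8 runs per play is simply what the text's "3 plays ≈ 2.4 runs" implies; a real system would use a different run value for each bin.

```python
from dataclasses import dataclass


@dataclass
class BinRecord:
    """One fielder's results on one bin of batted balls."""
    chances: int            # balls hit into this bin while he was in the field
    outs_made: int          # how many of them he turned into outs
    league_out_rate: float  # league average out rate for this bin
    runs_per_play: float    # runs saved by turning one of these balls into an out


def zone_runs(bins):
    """Sum (actual outs - league-expected outs) * runs per play over all bins."""
    total = 0.0
    for b in bins:
        expected_outs = b.chances * b.league_out_rate
        total += (b.outs_made - expected_outs) * b.runs_per_play
    return total


# The made-up Rasmus example: 10 outs in 15 chances vs. a 7-in-15 league rate,
# at 0.8 runs per play (3 plays ~ 2.4 runs).
bin7 = BinRecord(chances=15, outs_made=10, league_out_rate=7 / 15, runs_per_play=0.8)
print(round(zone_runs([bin7]), 1))  # 2.4
```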
The fact that defensive metrics contain a lot of error leaves room for subjective opinions to have value. If UZR says Pujols was an average defender last year, but nearly every scout and fan thinks he was excellent, it's likely he was better than UZR gives him credit for.
Another example is Franklin Gutierrez last year, who was roughly +27 runs according to UZR. That's so ridiculously good that there is probably some error in that measurement, and for whatever reason UZR overrated him last year. It's more likely that he was really a +15 or +20 defender than a +27 one.
Defense can be tricky, but I'd suggest that some combination of defensive stats, scouting information and regression to the mean can give you a pretty solid estimate of a player's defensive value in a given year.
The positional adjustment is a very big part of measuring a player's value, and unfortunately one that's sometimes disregarded. A player who can be an average defender at shortstop is much more valuable than a player who can be an average defender at first base, simply because the former is much, much harder to find. Therefore, we use positional adjustments to put them on a level playing field.
Positional adjustments can be calculated a number of ways. A nice, easy way is to use an offensive baseline. Simply look at a 10 year average (or something like it) for every position, find the value of the average hitter at each position in runs or wins (Linear Weights works best here, but you could swing it with WPA if you want), and use that as your positional adjustment. For example, say that from 1995-2005, the average shortstop was 7 runs below average per 600 plate appearances and the average first baseman was 10 runs above average per 600 plate appearances (numbers pulled out of ass). You would add a prorated +7 runs to the offensive value of each shortstop, and a prorated -10 runs to the offensive value of each first baseman.
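The regression to the mean part can be sketched as shrinking an observed value toward the average. The reliability number below is purely an assumption for illustration, not a measured property of UZR. With it, a +27 shrinks to about +16, in line with the Gutierrez guess above.

```python
def regress_to_mean(observed, reliability, population_mean=0.0):
    """Shrink an observed value toward the population mean.

    reliability is the share of the observed deviation you trust, from 0
    (ignore the measurement) to 1 (take it at face value). For runs above
    average, the population mean is 0 by definition.
    """
    if not 0.0 <= reliability <= 1.0:
        raise ValueError("reliability must be between 0 and 1")
    return population_mean + reliability * (observed - population_mean)


# Assumed reliability of 0.6 for one season of a defensive metric.
print(round(regress_to_mean(27, 0.6), 1))  # 16.2
```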
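Both steps fit in a few lines: first compute the baseline for a position, then prorate the adjustment. The function names are hypothetical, PA-weighting the baseline is my own assumption, and the -7/+10 baselines are the made-up numbers from the example.

```python
def position_baseline(players):
    """PA-weighted runs above average per 600 PA for the hitters at one position.

    players: iterable of (runs_above_average, plate_appearances) pairs,
    e.g. every shortstop-season in your 10 year window.
    """
    players = list(players)
    total_runs = sum(runs for runs, _ in players)
    total_pa = sum(pa for _, pa in players)
    return total_runs / total_pa * 600


def offensive_positional_adjustment(baseline_per_600, pa):
    """Prorated adjustment that cancels out the average offense at a position."""
    return -baseline_per_600 * pa / 600


# Made-up baselines from the example above.
print(offensive_positional_adjustment(-7.0, 600))  # 7.0 for a full-time shortstop
print(offensive_positional_adjustment(10.0, 300))  # -5.0 for a half-time first baseman
```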
Another, and probably better, way to handle positional adjustments is to use defensive value. This better captures the fact that players are put at a certain position for their defensive ability and not their offensive ability; however, it is also harder to measure. There is a tradeoff. One way to look at defensive positional adjustments is by looking at how the average player's UZR or whatever changes if they change positions. This is actually a pretty solid method; however, it most likely contains some measurement error (although not a whole lot given the sample size) and some selection bias. The selection bias is key, as players will usually only switch positions for specific reasons that could systematically bias the results of the change in defensive value. Furthermore, it's harder to measure catcher value with defensive metrics.
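The position switcher method can be sketched like this. Everything here is an assumption for illustration: the data layout, the per-150-game scaling, and the choice to weight each player by the smaller of his two samples (one reasonable option, not necessarily what anyone publishing these numbers does). Note that it does nothing about the selection bias described above.

```python
def switcher_adjustment(switchers):
    """Average change in defensive runs (per 150 games) for players who played
    both position A and position B.

    switchers: iterable of (runs_per_150_at_a, games_at_a, runs_per_150_at_b, games_at_b).
    A positive result means players rate better at B than at A, i.e. B is the
    easier position by about that many runs per 150 games.
    """
    weighted_sum = 0.0
    total_weight = 0.0
    for runs_a, games_a, runs_b, games_b in switchers:
        weight = min(games_a, games_b)  # the thinner sample limits how much we trust the pair
        weighted_sum += weight * (runs_b - runs_a)
        total_weight += weight
    if total_weight == 0:
        raise ValueError("need at least one player with games at both positions")
    return weighted_sum / total_weight
```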
For those reasons, it seems best to combine offensive positional adjustments with defensive ones, as well as some common sense. A good article on positional adjustments can be found here:
Replacement level is simply the value of the player you'd expect to step in if a given player went down. So say you are the Cardinals and Troy Glaus, David Freese and Joe Mather all go down for the season. You bring up your very own 29 year old rookie to play 3rd every day. Surprise, surprise, he sucks. According to FanGraphs' estimates, he was 9.8 runs below average with the bat, 0.4 runs below average with the glove and 1.6 runs above average due to his positional value, all in 307 plate appearances. That comes out to 8.6 runs below average, or -16.8 runs per 600 plate appearances. If you do that calculation for all "replacement players", you get the expected value of a replacement level player.
So say you find that the average replacement level player is 20 runs below average per 600 plate appearances. You add +20 runs, prorated by plate appearances, to each player to get his value above replacement. That way, a replacement level player is worth exactly 0 runs.
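The arithmetic for the rookie, plus pooling many such players into a replacement level, looks like this. The function names are hypothetical. The three component figures are the FanGraphs numbers quoted above, and PA-weighting the pool is my own assumption about how you'd combine players.

```python
def runs_per_600(total_runs, pa):
    """Scale a run total to a 600 plate appearance rate."""
    return total_runs / pa * 600


def replacement_level(pool):
    """PA-weighted runs per 600 PA across (total_runs, pa) replacement players."""
    pool = list(pool)
    return sum(runs for runs, _ in pool) / sum(pa for _, pa in pool) * 600


batting, fielding, positional = -9.8, -0.4, 1.6
rookie_runs = batting + fielding + positional
print(round(rookie_runs, 1), round(runs_per_600(rookie_runs, 307), 1))  # -8.6 -16.8
```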
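As a sketch, here is that shift with hypothetical function names. The default is the 20 runs per 600 plate appearances from the example.

```python
def replacement_runs(pa, replacement_runs_per_600=20.0):
    """Runs separating replacement level from average, prorated to playing time."""
    return replacement_runs_per_600 * pa / 600


def runs_above_replacement(runs_above_average, pa, replacement_runs_per_600=20.0):
    """Shift a runs above average figure so that replacement level sits at 0."""
    return runs_above_average + replacement_runs(pa, replacement_runs_per_600)


print(runs_above_replacement(-20.0, 600))  # 0.0: a replacement level player
print(round(replacement_runs(700), 1))     # 23.3
```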
Like with positional adjustments, the replacement level adjustments are simply estimates and obviously vary by team and league. However, the concept is solid. Here are some more good articles on the matter:
Runs to wins
Since all of the units in the previously described parts of WAR are in runs, we need to convert them to wins to get a better estimate of each player's true value. The way to do this is simply to look at how many runs generally equal a win.
On average, 10 runs = roughly 1 win. In other words, if you were to look at how many more wins each team got as a function of its run differential, it would average out to about one extra win per 10 extra runs, so WAR = Runs/10. Of course this isn't always the case. For a team that, say, never makes an out, 10 extra runs would be pretty meaningless to its win total. For a team that never scores any runs, 10 runs would be huge. However, since hitters can't control their own run environment, we only consider the average situation in WAR.
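If you want to check the 10 runs per win figure yourself, one simple approach is a least-squares fit of team wins on run differential across many team-seasons. This is my choice here, not the only method. The slope is wins per run, and its reciprocal is runs per win. This is a minimal sketch, and you would have to supply real team data.

```python
def runs_per_win(team_seasons):
    """Estimate runs per win from (run_differential, wins) pairs.

    Fits wins = a + b * run_differential by ordinary least squares and
    returns 1 / b, the number of extra runs it takes to add one win.
    """
    team_seasons = list(team_seasons)
    n = len(team_seasons)
    if n < 2:
        raise ValueError("need at least two team-seasons")
    mean_rd = sum(rd for rd, _ in team_seasons) / n
    mean_w = sum(w for _, w in team_seasons) / n
    cov = sum((rd - mean_rd) * (w - mean_w) for rd, w in team_seasons)
    var = sum((rd - mean_rd) ** 2 for rd, _ in team_seasons)
    if var == 0 or cov == 0:
        raise ValueError("wins must change with run differential to estimate a slope")
    slope = cov / var  # extra wins per extra run
    return 1.0 / slope


def runs_to_wins(runs, runs_per_win=10.0):
    """Convert a context neutral run value to wins."""
    return runs / runs_per_win
```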
For pitchers, it's a little different as they can control their own run environments. We'll get into that some other time.
Implementations of WAR
So we have this awesome concept of an uber-stat that takes into account nearly every aspect of a player's game in a simple and functional way. Great, now we need someone to calculate it for every player, by season. As you can imagine, this becomes a big chore. The values of the individual components change over time and are hard to calculate in themselves. Furthermore, there are some legitimate disagreements on how best to calculate WAR. As far as I know, there are only two sources of publicly available WAR on the interwebz: FanGraphs and Baseball Projection. I'll go through each of them, showing the differences for hitters and what they are lacking (and, of course, what they do well).
FanGraphs (David Appelman)
FanGraphs uses Linear Weights for offense. I'm not exactly sure how these are calculated, but they should be pretty robust. They make no adjustments for league, umpires, or quality of pitchers faced; however, they do park adjust offense using Patriot's 5 year regressed park factors. Linear Weights are my personal favorite run metric for offense, so I have no problems there. I would like to see an adjustment for quality of opposing pitchers at the very least; however, that's very difficult to implement and might not make a huge difference, so I understand why they don't do it. The park factors used are very solid, though I think they would be better if they were at least split by batter handedness. Parks don't affect all hitters uniformly.
FanGraphs also doesn't include baserunning value (aside from SB/CS). While this isn't a huge flaw, it does have an effect on the value of players.
FanGraphs uses single season UZR for defense, with no extra adjustments (although UZR is already park adjusted). While this is certainly not a bad way to do it, I would be happier if they weighted in other measurements of defense (including the Fans Scouting Report) to try to neutralize measurement error. UZR is good, but not good enough to take at face value. UZR also doesn't include catcher defense, so guys like Yadier Molina will be severely underrated by FanGraphs WAR.
For positional adjustments, FanGraphs uses Tom Tango's positional adjustments, shown here. Dave Cameron also has a good explanation of positional adjustments in that article. I have no problem with those, at least none that I can think to improve upon myself.
For replacement level, FanGraphs uses 20 runs per 600 plate appearances. Other analysts use different values, but they are generally in the same neighborhood. Again, I have no qualms with the replacement level adjustment at FanGraphs.
So for replacement and positional adjustments, FanGraphs does a pretty good job. For hitting and fielding, however, it has some noticeable flaws. That isn't meant to disparage the stat (it's still a very good metric), simply to show that FanGraphs' implementation is not God. Also, read Dave Cameron's articles on FanGraphs WAR:
Baseball Projection (Sean Smith)
For offense, Baseball Projection (BP from here on; not to be confused with Baseball Prospectus) uses custom team adjusted Linear Weights. That is, the value of each event is tailored to each team's run environment, so that the Linear Weights of a team's players add up to the team's total runs scored. This is a plus for value purposes; however, it also gives the hitter credit for things that are out of his control. Whether you'd rather use these Linear Weights or the ones on FanGraphs is a matter of preference. BP also takes into account runs from grounding into double plays, reaching on errors, and baserunning. The baserunning adjustments are just estimates, and in later years are really just guesses; however, they should be fine to use. I'm not exactly sure how Sean handles park, league and pitcher adjustments, but I'm pretty sure he includes all three in some way, shape or form. I'll wait for confirmation on that.
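Sean's exact procedure isn't described here, so the following is only a crude stand-in. It's a hypothetical proportional rescaling of each hitter's absolute runs created, so that a team's hitters sum exactly to the runs the team actually scored. It illustrates the property described above (and why it hands hitters credit for their team context), not Baseball Projection's actual method.

```python
def scale_to_team_runs(player_runs, team_runs_scored):
    """Rescale each hitter's (absolute) runs created so the team's hitters sum
    exactly to the runs the team actually scored.

    player_runs: dict mapping player -> estimated runs created (not above average).
    """
    estimated_total = sum(player_runs.values())
    if estimated_total <= 0:
        raise ValueError("estimated team total must be positive")
    factor = team_runs_scored / estimated_total
    return {player: runs * factor for player, runs in player_runs.items()}


# Hypothetical two-man "team" that outscored its estimate by 10%.
print(scale_to_team_runs({"A": 60.0, "B": 40.0}, 110.0))
```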
For defense, BP uses Total Zone rating and a defensive estimate for catchers based off of wild pitches, stolen bases, etc. Again, there is the problem of using single season Zone Ratings, but perhaps that is just my own little pet peeve. UZR is better than Total Zone due to the better quality of batted ball data. For seasons before 1953, an adjusted range factor is used.
For positional adjustments, Sean calculates them separately by decade. You can see how he does that here.
For replacement level adjustments, Sean uses his own values, which I think are generated from his CHONE projections. Again, I'll wait for confirmation.
Sean describes his system briefly here.
While FanGraphs and Baseball Projection each have great metrics, neither is perfect, obviously. If you're looking to assess past value, I think it's best to customize how you calculate each component so that you get what you are looking for. I'll go through an example using Pujols in '09.
Last year, Pujols had 69.7 runs of unadjusted Linear Weights (including SB) in 700 plate appearances, per FanGraphs. If you use Patriot's park factors, linked above, that translates to 72.1 runs above average. In a perfect world, I would use lefty/righty park adjustments and adjust for the quality of pitchers faced, but those numbers are much harder to come by.
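I don't know the exact steps FanGraphs takes with Patriot's factors, so treat this as a generic sketch of one common form of park adjustment. A run park factor (1.00 = neutral, assumed here to already blend home and road games) shifts the league's runs per plate appearance, and the hitter is credited with the difference over his playing time. The park factor and league runs per PA are inputs you would look up. The numbers below are placeholders, not Busch Stadium's actual factor.

```python
def park_adjust(runs_above_average, pa, park_factor, league_runs_per_pa):
    """Credit (or debit) a hitter for the run environment of his park.

    park_factor < 1.00 means the park suppresses scoring, so the hitter gains
    runs; > 1.00 means it inflates scoring, so he loses some.
    """
    return runs_above_average + (1.0 - park_factor) * league_runs_per_pa * pa


# Placeholder inputs: a slightly pitcher-friendly park and 0.12 league runs per PA.
print(round(park_adjust(50.0, 600, 0.98, 0.12), 2))  # 51.44
```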
For baserunning, I'll use EqBRR from Baseball Prospectus. Pujols was -.62 runs last year when you take out SB runs (because Linear Weights already includes them), so it's really just negligible in that case.
For defense, I like to use a combination of UZR, Total Zone and the Fans Scouting Report (FSR). UZR has Pujols at +1.3 runs last year, Total Zone has him at +12 runs, and the FSR has him as the best first baseman in baseball last year. FSR converted to runs...
...has Pujols at +14.5. If you do this: (.4*UZR + .3*TZ + .3*FSR, my completely subjective weighting system), you get Pujols at +7.27 runs on defense last year.
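The blend itself is just a weighted average. In this minimal sketch, the source keys are my own labels, and "FSR" means the Fans Scouting Report converted to runs. The weights are the subjective ones above. Dividing by the weight total means the weights don't have to sum to exactly 1.

```python
def blend_defense(estimates, weights):
    """Weighted average of defensive run estimates from several sources.

    estimates and weights are dicts keyed by source name; every weighted
    source must have an estimate.
    """
    total_weight = sum(weights.values())
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive number")
    return sum(weights[src] * estimates[src] for src in weights) / total_weight


MY_WEIGHTS = {"UZR": 0.4, "TZ": 0.3, "FSR": 0.3}  # completely subjective
```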
For the positional adjustment, I'll just use Tom Tango's figures found on FanGraphs, which for first base come to -12.5 runs per 700 plate appearances. For the replacement level adjustment I'll also use FanGraphs' number, 20 runs per 600 plate appearances, which prorates to 23.3 runs in 700.
Add it all together and divide by 10 to get WAR. That gives us 9.0 WAR for Big Dog last year. It's worth noting that FanGraphs has Pujols at 8.4 WAR and Baseball Projection has him at 9.2 WAR, so the result depends on how you calculate each piece.
The biggest differences come for catchers or guys who get a lot of value from non-stolen-base baserunning. For instance, FanGraphs has Chone Figgins at 6.1 WAR and Baseball Projection has him at 6.9 WAR.
I hope I've stressed that A) WAR is the best stat for evaluating players, and B) it is very complicated, and there is no single agreed-upon way to calculate it. Therefore, while you should always use it when comparing players and contracts, don't assume that any one published WAR figure equals a player's actual value.
If you are walking down the street and someone says, "Hey, did you know that Pujols was worth 9.2 WAR last year?", make sure to ask what park factors he used, whether he used UZR or Total Zone, and how he calculated Linear Weights. To translate that into Baseball Blog nerd speak: if someone writes that Ben Zobrist was worth more than Joe Mauer last year, point out that it's only FanGraphs' estimate of their respective values, and that you should dig deeper into the numbers to calculate your own WAR.
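For the record, here's that sum spelled out. It uses the component values from the steps above and the 10 runs per win conversion; the dictionary labels are just descriptive.

```python
# Pujols, 2009, component values in runs from the walkthrough above.
components = {
    "offense (park adjusted Linear Weights)": 72.1,
    "baserunning (EqBRR without SB runs)": -0.62,
    "defense (weighted UZR/TZ/FSR)": 7.27,
    "position (1B)": -12.5,
    "replacement (20 per 600 PA, prorated to 700 PA)": 23.3,
}

total_runs = sum(components.values())
war = total_runs / 10  # 10 runs per win
print(round(total_runs, 2), round(war, 1))  # 89.55 9.0
```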
Here's some more reading on the subject of player valuation:
That should last you a couple of months, enjoy.
PS. Fuck Brad Penny. Also, I apologize for typos.