So the NL Cy Young award recipient was announced the other day, and surprisingly, it didn't go to the guy with the most wins, or the best ERA for that matter. Instead, it went to Tim Lincecum, who led the league in FanGraphs WAR by a pretty big margin.
Naturally, given the two candidates that he narrowly beat, this sparked a lot of debate about whether or not FanGraphs' WAR was the best way to go. From what I gathered, a lot of people thought that timing should be taken into account, and in the case of Carpenter, "pitching to your defense". There were even some rogue mentions of wins and pitching to the score.
Unfortunately, I don't know the right stat to figure out the Cy Young award winner. That's because each person has a different interpretation of what it means to be the Cy Young. What I can do is lay out all of the major stats so that you can make the most informed decision about how you want the Cy Young to be decided.
FIP
As you may or may not have heard, FIP has gained a lot of popularity around the blogs as a way to evaluate pitchers. The reason is likely its simplicity, which makes the stat easy to calculate and relatively void of "noise" (I'll expand on that later). FIP breaks down each plate appearance into 4 possible outcomes: a walk, a strikeout, a home run and a ball in play. FIP assumes that pitchers have 100% control over the first three (99% of the time, a walk, strikeout or home run ends the play right there, with no fielder involved), and that they have 0% control over what happens once a ball is put in play.
The league average hit rate for balls in play is .300, and FIP assumes that will be the case for all pitchers. That is the controversy with FIP. It's clear that some pitchers have more control over batting average on balls in play (BABIP) than others. Guys who get a bunch of ground balls and pop ups will have a lower BABIP. However, BABIP is influenced by luck and defense much more than actual skill, so a lot of people like to ignore it when evaluating pitchers.
FIP also takes timing out of the equation, because it uses aggregate stats. If one pitcher does much better with runners on base than another, but his overall stats are the same, FIP will rate the two equally. If I could describe FIP in one sentence, it would be... how many runs per 9 innings a pitcher would have given up given neutral timing, and assuming an average distribution of balls in play. That doesn't mean it's better or worse than any other stat. It is what it is. If you think that BABIP and timing are just luck (which they really are for the most part) and shouldn't be credited towards the pitcher, this is the stat for you.
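For anyone who wants to see the arithmetic, here is a rough sketch of the commonly cited form of FIP. The weights (13, 3 and -2) are the standard published ones, but the constant of 3.20 is an assumption on my part (it gets recalculated each season so that league FIP lines up with league ERA), some versions also count hit batters with the walks, and the stat line in the usage is made up.

```python
def fip(hr, bb, k, ip, constant=3.20):
    """Fielding Independent Pitching from aggregate season totals.

    constant is an assumed league scaling term; the real value
    changes a little from year to year.
    """
    return (13 * hr + 3 * bb - 2 * k) / ip + constant


# Hypothetical stat line, for illustration only.
example_fip = fip(hr=10, bb=60, k=250, ip=225.0)
print(round(example_fip, 2))
```

Notice that the function only ever sees season totals. That is the "neutral timing" part: two pitchers with the same totals get the same FIP no matter when the walks and home runs happened.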
xFIP
This stat is the same thing as FIP, except it substitutes .11*FB for HR. The reason for this? Well, it's been shown that the rate of fly balls that go for home runs (HR/FB ratio) is largely out of a pitcher's control. Like BABIP, there is some skill involved in HR/FB ratio; however, it is dwarfed by the amount of luck involved. So xFIP strips away all the skill and luck, and simply assumes that the pitcher has 0% control over HR/FB ratio.
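A minimal sketch of that substitution, assuming the commonly cited FIP weights (13, 3 and -2) and an assumed scaling constant of 3.20. Only the .11 multiplier comes from the description above; the stat line is made up.

```python
LEAGUE_HR_PER_FB = 0.11  # the .11 multiplier described above


def xfip(fb, bb, k, ip, constant=3.20):
    """FIP with actual home runs replaced by expected home runs.

    constant is an assumed league scaling term, not an official value.
    """
    expected_hr = LEAGUE_HR_PER_FB * fb
    return (13 * expected_hr + 3 * bb - 2 * k) / ip + constant


# Hypothetical stat line, for illustration only.
example_xfip = xfip(fb=200, bb=60, k=250, ip=225.0)
print(round(example_xfip, 2))
```

The function never looks at actual home runs, so two pitchers with the same fly balls, walks and strikeouts get the same xFIP even if one of them gave up twice as many homers.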
tRA
This stat is like FIP on speed. In addition to considering walks, strikeouts and home runs, it also considers the quality of balls put in play by using the 4 batted ball classifications: line drives, ground balls, outfield fly balls and pop ups. On the surface, that seems like a good thing because it gives more credit to pitchers for their ability to control BABIP. However, it also adds a lot more noise due to the subjective nature of batted ball classifications.
Unlike strikeouts, walks and home runs, batted ball stats can vary based on the source of the data. There are 3 main sources of batted ball data: BIS, STATS and Gameday. The first two are available for pay, while Gameday is freely available to everyone. Each of those sources gets batted ball data from stringers, who are typically paid about 10 dollars per game to watch each batted ball and classify it. BIS and STATS used to have their stringers record the data at the ballparks; however, I believe that they all do it from the TV now.
Anyway, you can see how that would be a problem. A soft liner dropping just in front of Colby Rasmus could easily be classified as a fly ball or a line drive depending on the source. Given the huge difference in terms of value between a fly ball and a line drive in tRA, the batted ball classifications can have a big impact.
In fact, tRA is available on two websites, StatCorner and FanGraphs, that use different batted ball sources (Gameday and BIS respectively), and there are large discrepancies in the tRA numbers for the same pitcher. For example, here are the tRA's by StatCorner and FanGraphs for each of the 5 major Cy Young candidates in the NL this year:
Pitcher | StatCorner tRA | FanGraphs tRA |
Tim Lincecum | 2.52 | 2.83 |
Javier Vazquez | 3.19 | 3.67 |
Dan Haren | 3.25 | 4.12 |
Chris Carpenter | 2.77 | 3.02 |
A.D.A.M. | 3.47 | 3.56 |
As you can see, the difference in tRA is huge for some pitchers just based on the source of the batted ball data. So while tRA adds a lot more useful information about a pitcher, it also adds a lot of noise that can obfuscate the pitcher's actual performance.
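To put numbers on those gaps, here is a quick sketch that subtracts the StatCorner figure from the FanGraphs figure for each pitcher. The values are copied straight from the table; the variable names are just mine.

```python
# (StatCorner tRA, FanGraphs tRA), copied from the table above.
tra = {
    "Tim Lincecum": (2.52, 2.83),
    "Javier Vazquez": (3.19, 3.67),
    "Dan Haren": (3.25, 4.12),
    "Chris Carpenter": (2.77, 3.02),
    "A.D.A.M.": (3.47, 3.56),
}

# Gap between the two sources for the same pitcher, in runs per 9 innings.
gaps = {name: round(fg - sc, 2) for name, (sc, fg) in tra.items()}

# Print the pitchers from largest gap to smallest.
for name, gap in sorted(gaps.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name}: {gap:+.2f}")
```

Every gap is positive, meaning FanGraphs has each of these five higher than StatCorner does, and the gap ranges from under a tenth of a run to almost a full run.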
ERA
I think we all know about this one. This assumes that the pitcher has 100% control over everything that happens (except for errors). While this is the final result of how effective the pitcher has been, it also includes way too many things that are out of a pitcher's control, like defense, HR/FB ratio, BABIP and timing. So while FIP, xFIP and tRA perhaps take out too many factors that the pitcher has some control over, ERA takes out way too few.
WAR
WAR is simply a way to combine production with innings pitched to get an estimate of how many wins a player contributes to an average team over a replacement level player. When I say 'production', I mean the estimate of how good the pitcher was when he pitched. This can be estimated using any of the 4 metrics I outlined above: tRA, FIP, xFIP or ERA, which should be adjusted for park, and if you want, quality of batters faced. WAR turns that number into an expected W% using a PythagenPat run estimator, subtracts the expected W% of a replacement level pitcher from it, and multiplies the result by innings pitched divided by 9.
You can see more about how WAR is calculated here:
http://www.insidethebook.com/ee/index.php/site/comments/how_to_calculate_war/
For those who care, the closed form equation for figuring out WAR is:
WAR =((((B/A)^((A+B)^0.275))/(((B/A)^((A+B)^0.275))+1))-0.39)*C/9
Where:
A = the pitcher's run or earned run average using one of the estimates above
B = the league run or earned run average
C = innings pitched
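If the wall of parentheses is hard to read, here is the same equation broken into steps. This is just a sketch of the closed form above, using the same A, B and C; the function and argument names are mine, and it skips the park and opponent adjustments, which you would apply to A before plugging it in.

```python
def pitcher_war(a, b, c, replacement_wpct=0.39):
    """Closed-form WAR from the equation above.

    a: pitcher's run (or earned run) average, from tRA, FIP, xFIP or ERA
    b: league run (or earned run) average on the same scale
    c: innings pitched
    """
    exponent = (a + b) ** 0.275          # PythagenPat exponent
    ratio = (b / a) ** exponent
    expected_wpct = ratio / (ratio + 1)  # expected W% with this pitcher
    return (expected_wpct - replacement_wpct) * c / 9


# A league average pitcher (a == b) comes out at a .500 expected W%,
# so over 180 innings he is worth (0.50 - 0.39) * 180 / 9 wins.
print(round(pitcher_war(4.50, 4.50, 180), 2))
```

The lower A is relative to B, the higher the expected W%, and the innings term at the end is what rewards a guy for doing it over a full season instead of 50 innings.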
So when you hear people say WAR, don't assume that they are talking about the FanGraphs version of it. WAR can be calculated many different ways, using different park factors, run estimators, defensive adjustments and quality of opponent adjustments. In my opinion, a well-calibrated WAR calculation is the best possible way to combine production and playing time to get an estimate of a player's value above his potential replacement. The only thing it doesn't take into account is "pitching to the score". If you believe that certain pitchers have control over when they give up their runs, and pitch to the game context, you might not like WAR so much.
WPA
WPA has a pretty simple definition. It measures total improvement in the odds of winning a game for each player. By that I mean, it measures how each event changes an average team's odds of winning the game, and sums up the results for each player. So for example, based off of empirical data, you can see that with 2 outs in the bottom of the 9th, with a runner on first base and the home team down by a run, an average team will win roughly 8.7% of the time.
Albert Pujols' big badass self walks to the plate and promptly hits a 484 foot home run. Now, in the bottom of the 9th with 2 outs and the home team up by a run, the home team surprisingly wins the game 100% of the time. So Pujols' WPA will be .913, or he improved the odds of his team winning the game by 91.3%.
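The bookkeeping behind that number is just a subtraction per event, summed over everything a player does. A minimal sketch, using the two win probabilities from the example; the function name and the list-of-pairs layout are my own choices, not how any particular site stores it.

```python
def wpa(events):
    """Sum the change in win probability over a player's events.

    events: list of (win_prob_before, win_prob_after) pairs, each
    from the point of view of the player's own team.
    """
    return sum(after - before for before, after in events)


# The walk-off home run: 8.7% before the swing, 100% after it.
pujols_wpa = wpa([(0.087, 1.0)])
print(round(pujols_wpa, 3))
```

A strikeout in that same spot would have gone into the list as (0.087, 0.0), which is why the same player can rack up negative WPA just as quickly.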
The concept of WPA sounds great: it rewards each event based on the value it has given the context of the game. Even if you don't believe that situational hitting is a skill, you have to admit that it has value. However, there are numerous problems with WPA when it comes to evaluating performance:
1) It doesn't include adjustments for park
2) It gives full credit to pitchers for the contributions of their defenders
3) It doesn't give credit to pitchers for innings pitched
That last point is often overlooked when talking about WPA. WPA is the change in win probability for an average team. So if you assume that each player is taking the spot of a replacement level player, you have to include that adjustment in WPA. So a guy who puts up a 5.00 WPA in 50 innings isn't necessarily more valuable than a guy who puts up a 2.00 WPA in 200 innings, because his replacement in those 150 innings will have a negative WPA.
Wins
Wins are like WPA, except they also give the pitcher full credit for how many runs his team scores, how long his manager lets him stay in the game and how well the relievers that pitch after him perform. Really, there is nothing that wins add to the discussion, and they bring in a whole bunch of factors that obscure the pitcher's actual performance.
Summing it up
Hopefully, I've presented the pros and cons of each of these stats in an easy to understand way. If you've followed along, you'll notice that each of these stats has a certain give and take to it. I can't stress this enough. The simplest stats like FIP and xFIP don't tell you as much about a pitcher's performance as stats like tRA and WPA. However, they are also a lot cleaner, because they leave out most of the stuff that is out of the pitcher's control. The more information you add, the more noise and obfuscating data you add along with it.
It's up to you to choose the best possible combination based on your preferences of how we should value a pitcher's performance. What you can't do is pick who you want to win the award first, and then choose the stat - that's just being a dick.