Consistency Factor
Hey everyone, I'm a regular poster at the Mets SB Nation site, Amazin Avenue. I put this little case study together and thought you guys might be interested, since there's some pretty good NL Central info in here in addition to the Oliver Perez stuff. Enjoy.

In honor of Oliver Perez week, I figured I would try to wrap up this case study on consistency. For anybody who missed it, a few weeks back I took a shot in this post at quantifying start-to-start consistency for pitchers, or what I'm calling Consistency Factor (CONF). It was enlightening, but the results just didn't seem quite right. So I've made some tweaks, mainly using Bill James' Gamescore instead of WPA to evaluate each game started.
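For anyone who wants to play along at home, here's a rough sketch of how a single start gets its Gamescore, written from my understanding of James' original formula: start at 50, +1 per out recorded, +2 per inning completed after the 4th, +1 per strikeout, -2 per hit, -4 per earned run, -2 per unearned run, -1 per walk. The function name and argument layout are hypothetical, and this isn't necessarily the exact calculation behind the numbers in this post.

```python
def game_score(outs, strikeouts, hits, earned_runs, unearned_runs, walks):
    """Bill James-style Gamescore for one start (hypothetical helper)."""
    full_innings = outs // 3
    score = 50
    score += outs                           # +1 per out recorded
    score += 2 * max(0, full_innings - 4)   # +2 per full inning after the 4th
    score += strikeouts                     # +1 per strikeout
    score -= 2 * hits                       # -2 per hit allowed
    score -= 4 * earned_runs                # -4 per earned run
    score -= 2 * unearned_runs              # -2 per unearned run
    score -= walks                          # -1 per walk
    return score


# A 9-inning, 3-hit shutout with 10 K and 1 BB
print(game_score(outs=27, strikeouts=10, hits=3, earned_runs=0, unearned_runs=0, walks=1))  # 90
```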
A brief refresher for those who missed the original post or forgot how it worked: I evaluated each of a pitcher's starts individually (using Gamescore) and then took the Standard Deviation of those 30-something starts. The higher that #, the wider the range a pitcher regularly pitches within, and thus the lower his consistency. So lower = better. However, this # does NOT measure how effective a pitcher is. A pitcher can have the worst CONF and still be very good; all it means is that he doesn't pitch consistently from start to start.
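Here's a minimal Python sketch of the CONF calculation as described: one pitcher's per-start Gamescores in, one Standard Deviation out. Whether the original numbers used the sample or the population SD isn't stated, so the sample version (`statistics.stdev`) is an assumption here, and both pitchers are invented. They share the same average start of 50, which is the point: CONF says nothing about how good a pitcher is.

```python
import statistics


def consistency_factor(game_scores):
    """CONF: standard deviation of a pitcher's per-start Gamescores (lower = more consistent)."""
    if len(game_scores) < 2:
        raise ValueError("need at least two starts to measure spread")
    return statistics.stdev(game_scores)  # sample SD (n - 1); an assumption


# Invented pitchers with the same average start (50) but very different spreads
steady = [50, 52, 48, 51, 49]
wild = [80, 20, 65, 35, 50]
print(round(consistency_factor(steady), 2))  # 1.58
print(round(consistency_factor(wild), 2))    # 23.72
```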
I compiled the CONF #'s for every qualifying NL starter, both for 2008 alone and for the last 3 seasons combined. Before we dive into all of those #'s, though, here's a visual breakdown of the idea using, who else, Oliver Perez:
So as you can see, each point represents a start measured in Gamescore (high = good, low = bad), and the gray area extends 1 Standard Deviation in both directions from Oliver's Mean (which happened to be a Gamescore of 51.65). In English: the blue points are his individual starts, the gray area is the average range he usually pitches within, and the red baseline is the level of his average start in 2008.
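The gray band is just Mean ± 1 SD. Here's a sketch of that band plus a list of the starts that land outside it, using an invented game log rather than Perez's actual starts; the helper names are hypothetical, and the sample SD is the same assumption as above.

```python
import statistics


def average_range(game_scores):
    """Return the (low, high) edges of the gray band: Mean minus/plus 1 SD."""
    mean = statistics.mean(game_scores)
    sd = statistics.stdev(game_scores)  # sample SD, as assumed for CONF
    return mean - sd, mean + sd


def starts_outside(game_scores):
    """List the starts that fall outside the gray band."""
    low, high = average_range(game_scores)
    return [gs for gs in game_scores if gs < low or gs > high]


wild = [80, 20, 65, 35, 50]  # invented Gamescores
print(average_range(wild))   # roughly (26.28, 73.72)
print(starts_outside(wild))  # [80, 20]
```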
Now let's take a look at another pitcher for some perspective. How about Tim Lincecum:

First of all, Lincecum's Mean (his average start) is obviously higher (Gamescore: 62.06); the guy did win the Cy Young. (For reference, a Gamescore of about 50 represents Replacement Level.) More importantly for us, Lincecum's starts are packed much more densely within his average range than Oliver's: only a handful fall outside the gray area, whereas many more of Perez's starts do. As a result, Lincecum's Average Range is narrower, which means less variation and higher consistency.
The most important figure we can draw from these graphs is the Standard Deviation that defines the gray Average Range. That figure is the Consistency Factor: 13.18 for Lincecum and 17.08 for Perez, against a league average of around 16.
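Plugging the 2008 figures quoted above into Mean ± CONF gives each pitcher's gray band. This only restates the arithmetic from those published numbers; the helper name is hypothetical, and 16 is the rough league average mentioned in the post.

```python
LEAGUE_AVG_CONF = 16  # "around 16" for the NL, per the post


def band_from_summary(mean, conf):
    """Gray-band edges from a published Mean and CONF, rounded to 2 places."""
    return round(mean - conf, 2), round(mean + conf, 2)


pitchers_2008 = {"Lincecum": (62.06, 13.18), "Perez": (51.65, 17.08)}
for name, (mean, conf) in pitchers_2008.items():
    low, high = band_from_summary(mean, conf)
    side = "below" if conf < LEAGUE_AVG_CONF else "above"
    print(f"{name}: {low} to {high} (width {2 * conf:.2f}), {side} league-average CONF")
```

Lincecum's band runs from 48.88 to 75.24 (width 26.36), while Perez's runs from 34.57 to 68.73 (width 34.16).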

So now that we can visualize where these #'s come from, let's look at the results. I grouped all NL starters who pitched at least half a season in '08 by team, so let's go through them by division, East first of course:
Well, there are a few interesting things we can see here:
- We already knew that Johan was awesome; now we know he's consistently awesome
- Apparently Maine & Redding are very consistent as well, although Perez, Livan & Pelfrey are not
- However, Oliver Perez: not that inconsistent...at least not as much as many of us would think
- Brett Myers on the other hand...ugh
- However, Cole's about average, and guys like Moyer & Blanton are really only valuable because of their consistency
- And damn, we all know Josh Johnson is going to be great, but he's already pretty damn good (though to be fair, this was based on only half a season of starts)
- Another surprise, Daniel Cabrera wasn't bad at all in '08
Onto the NL Central:
- The Cubs have a remarkably consistent staff...
- ...except for Carlos Zambrano who is officially THE least consistent starter in the NL
- Surprisingly consistent seasons from the 2 young Reds rookies, especially Volquez; not so much from their 2 vets, though
- As usual Paul Maholm is quietly very good
- And a shockingly consistent season by Todd Wellemeyer and that came in only his first season in the rotation
And finally the NL West:
- 2 more young guys with surprisingly consistent seasons in LA, not so much for Hiroki Kuroda
- What's the only difference between the Giants rotation and a roulette wheel? Tim Lincecum
- SI product Jason Marquis, very nice
- Surprisingly high from Webb, and even higher from Garland, who is known for his consistency
- And a perfect example of what I said earlier about consistency not always equaling performance: Kevin Correia, the NL's most consistent crappy pitcher (Gamescore Mean: 43.63...ugh). Looking at these #'s, it's amazing he is still in a Major League rotation today
And last but not least, here are the rankings of the most and least consistent starters, for 2008 and over the last 3 years:
- Redding on the Top 5 (3 Year) list? I didn't expect that
- And Jon Garland on the Bottom 5 (3 Year) list; I bet people would have had him pegged as super consistent, not the other way around...
- Another supposed "Mr. Consistency", Derek Lowe finds himself on the Bottom 5 (3 Year) list
- And yes, Oliver would have been on that list too, but I gave him a little home cooking, if you will, by excluding that bizarre stretch in '06; basically, I'm only evaluating his time with the Mets
So I hope this has been enlightening. At the very least, I figured it would break up the interminable dross we've witnessed in the FanPosts recently. Just as last time, I apologize for the high-level mathematics and such, but that's the breaks.
I set out to disprove the myth of Oliver Perez as "Mr. Inconsistency," and I think I've done that. That title definitely goes to Carlos Zambrano. Perez is on the higher end, but even if I included all of the #'s, he's not as bad as the media portrays. I think a lot of that perception comes from how inconsistent he can be within a game, inning to inning, rather than game to game, which is definitely no myth. That would be another interesting case study, but that's a story for another day. As for what this means for his overall performance, not much, because he's obviously been terrible this year anyway. But at least he's been consistently terrible...
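Here's a rough sketch of how Top 5 / Bottom 5 lists like these could be built: drop pitchers below a minimum number of starts, compute CONF for everyone else, and sort. Two assumptions worth flagging: the 16-start cutoff is a hypothetical stand-in for "at least half a season," and the 3-year CONF here pools every start from the three seasons into one list before taking the SD, which may not be exactly how the totals in this post were computed. All pitcher names and data are invented.

```python
import statistics

MIN_STARTS = 16  # hypothetical stand-in for "at least half a season"


def pooled_starts(seasons):
    """Combine several seasons of per-start Gamescores into one list."""
    return [gs for season in seasons for gs in season]


def rank_by_conf(starts_by_pitcher, min_starts=MIN_STARTS):
    """Return (pitcher, CONF) pairs for qualifiers, most consistent first."""
    conf = {
        name: statistics.stdev(scores)
        for name, scores in starts_by_pitcher.items()
        if len(scores) >= min_starts
    }
    return sorted(conf.items(), key=lambda item: item[1])


# Invented data: two qualifiers and one pitcher with too few starts
three_years = {
    "Pitcher A": pooled_starts([[50, 52] * 5, [48, 51] * 5, [49, 53] * 5]),
    "Pitcher B": pooled_starts([[20, 80] * 5, [35, 65] * 5, [10, 90] * 5]),
    "Pitcher C": [55, 60, 45],
}
ranking = rank_by_conf(three_years)
top_5 = ranking[:5]            # lowest CONF = most consistent
bottom_5 = ranking[::-1][:5]   # highest CONF = least consistent
print([name for name, _ in top_5])  # ['Pitcher A', 'Pitcher B']
```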
Comments
This is excellent, Rec'd
However, I’m not sure that Game Score is the best metric to use, mainly because it isn’t fielding independent and thus doesn’t really measure the “true” performance of a pitcher. A few years ago, Kevin Harlow modified Bill James’ Game Score to be fielding independent and to support neutral win probability. The formula is listed at the bottom of the link, and it is actually simpler than the one used for Game Score. If your results are all in a spreadsheet, then you could substitute Harlow’s version of Game Score; if not, then this is still a very good measure.
St. Louis Cardinals... defying win expectancy since 2008
by vivaelpujols on May 3, 2009 1:56 PM EDT
thanks
and you’re not the first person to mention that DIPS-based stat to me. it’s definitely very interesting. i’m going to play around with it and may use it for my upcoming AL version.
by Rob Castellano on May 5, 2009 3:36 PM EDT
is there any evidence
that Harlow’s mod is better than James’ version?
- "I went at it and didn’t slow down, so it kind of bounced off me." -Lil' Dunc
Is there any evidence of how good James' version is?
Future Redbirds - tracking Cardinal prospects for Cardinal Nation
Read his article and thought process
and it should become clear to you.
by vivaelpujols on May 6, 2009 11:45 AM EDT
oh, I know what he's doing
I just fail to see why it does a better job of measuring what Game Score is designed to measure.
- "I went at it and didn’t slow down, so it kind of bounced off me." -Lil' Dunc
Well, Harlow's seems less arbitrary than Bill James' Game Score
by vivaelpujols on May 9, 2009 12:21 AM EDT
Awesome Read
Some interesting numbers there too.
by from First to Third on May 5, 2009 4:42 PM EDT
Wait a minute here...
how did Joe Morgan’s fanpost get into rec territory?
And this was really a great post “rob”
* sarcasm might be involved in this comment
quality and consistency and time
I’ll restate the obvious, which the OP already stated: consistency does not equal quality (e.g., Jason Marquis). It would also be interesting to break out the time domain a little further; for example, Lincecum’s monthly splits, pre/post All-Star break, or a 5-start rolling average.
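The rolling-average idea is easy to sketch: slide a 5-start window along the season and take the SD inside each window, so you can see stretches where a pitcher was steady and stretches where he wasn't. The window size, helper name, and game log are illustrative assumptions.

```python
import statistics
from collections import deque


def rolling_conf(game_scores, window=5):
    """SD of Gamescore over each run of `window` consecutive starts."""
    if window < 2:
        raise ValueError("window must be at least 2")
    result = []
    recent = deque(maxlen=window)  # oldest start drops off automatically
    for gs in game_scores:
        recent.append(gs)
        if len(recent) == window:
            result.append(statistics.stdev(list(recent)))
    return result


# Invented log: five identical starts, then one gem
print(rolling_conf([50, 50, 50, 50, 50, 80]))  # first window 0.0, second about 13.42
```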
SD
Interesting and insightful. I would suggest one change, though: you are using standard deviation, and one limitation of that is that the standard deviation correlates with the mean. This means that the better pitchers (with higher average game scores) will inherently have a higher absolute standard deviation, just because the numbers are bigger. You might consider trying the coefficient of variation instead, which scales the SD by the mean. It is simply calculated as:
CV = SD/Ave
or
CV = 100*(SD/Ave) if you wish to express it as a percentage
If you do so, I am sure Zambrano would no longer be the most inconsistent. As a Cardinals fan, I don’t know how I feel about that; it’s always nice to see a Cub at the top of the ‘worst of’ list...
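Here's the suggested coefficient of variation applied to the two 2008 figures quoted in the post. One caveat: CV is most meaningful when the mean sits comfortably above zero, and Gamescore can technically go negative, so this hypothetical helper refuses non-positive means.

```python
def coefficient_of_variation(sd, mean, as_percent=False):
    """CV = SD / Ave, optionally scaled by 100 to read as a percentage."""
    if mean <= 0:
        raise ValueError("CV is only meaningful here for a positive mean")
    cv = sd / mean
    return 100 * cv if as_percent else cv


print(round(coefficient_of_variation(13.18, 62.06, as_percent=True), 1))  # Lincecum: 21.2
print(round(coefficient_of_variation(17.08, 51.65, as_percent=True), 1))  # Perez: 33.1
```

For these two, the gap actually widens under CV, since Perez's lower mean and higher SD both push his CV up.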
Marquis' 3 Year Average
That’s the Bi-Polar Betty we know & love!
Don't argue with stupid people. They will drag you down to their level and beat you with experience. - anon.
by construction, aren't 66.7% of the games within the gray zone, no matter what?
Now, the gray zone can be a lot smaller, and the outliers can be farther outside of the gray zone, but…
They say that it's never too late, but you don't get any younger...
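On the 66.7% question: the share of starts inside the gray zone isn't actually fixed by construction. It comes out around 68% only when the Gamescores are roughly bell-shaped, and other shapes give other shares. A quick check with two invented game logs (the helper name is hypothetical):

```python
import statistics


def share_within_one_sd(game_scores):
    """Fraction of starts inside Mean +/- 1 sample SD (edges inclusive)."""
    mean = statistics.mean(game_scores)
    sd = statistics.stdev(game_scores)
    inside = sum(1 for gs in game_scores if mean - sd <= gs <= mean + sd)
    return inside / len(game_scores)


all_or_nothing = [20, 80] * 5     # every start sits inside the band
mostly_flat = [50] * 8 + [0, 100]  # two starts fall outside
print(share_within_one_sd(all_or_nothing))  # 1.0
print(share_within_one_sd(mostly_flat))     # 0.8
```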
that would seem to be the case
by vivaelpujols on May 10, 2009 1:22 AM EDT
yes...
… but this study is about the size of the gray zone, not where the starts fall within it. No need to call it a Consistency Factor, really; it’s just a one-standard-deviation confidence interval. In fact, we’d learn just as much by looking at the standard deviations directly, and it would be easier to compare.
i’d be interested in how much leverage the outliers have here. in relatively small samples, high-leverage outliers can skew things quite a bit.
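To get a feel for the outlier-leverage point, here's a sketch that recomputes CONF with the single most extreme start removed. The game log is invented and deliberately exaggerated (nine identical starts and one disaster), and dropping one start is just a crude sensitivity check, not a standard correction.

```python
import statistics


def conf_with_and_without_extreme(game_scores):
    """Return (CONF with every start, CONF with the start farthest from the Mean removed)."""
    if len(game_scores) < 3:
        raise ValueError("need at least three starts")
    mean = statistics.mean(game_scores)
    extreme = max(game_scores, key=lambda gs: abs(gs - mean))
    trimmed = list(game_scores)
    trimmed.remove(extreme)  # removes one occurrence only
    return statistics.stdev(game_scores), statistics.stdev(trimmed)


log = [50] * 9 + [0]  # invented: nine identical starts and one blowup
full, trimmed = conf_with_and_without_extreme(log)
print(round(full, 2), trimmed)  # 15.81 0.0
```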
