Stats to use in VEB: BATTLEDOME!

After the brilliant thread about ecks posted by larry, and another great NL MVP diary posted at Red Reporter, I thought it'd be nice to have a stats deathmatch.

Baseball is probably the most misunderstood sport of all time. Just walk on over to the PD forums and take a gander, go to the local sports bar, or listen to the office water cooler talk (assuming it's not football season), and you can get a good gauge of just how far off the general bandwagoner is.

Some comments here and there, but mostly the thread discussing replacement level, got me thinking there should be an official VEB glossary of preferred stats to use when discussing players, trades, etc. Hopefully here we can discuss and come to conclusions on which stats to use, so you don't look silly and so we can ride our sabermetric high horse all over John Q. Edmondsjersey.

TEAM
First, when discussing overall team quality, most statheads like to use a team's Pythagorean winning percentage. Basically, this is just a simple formula based on the runs a team scores versus the runs it allows. "There is no explanation for the correlation between the formula and actual winning percentage in theory, rather the correlation has just been shown to work empirically." I personally like using this stat to see what a team needs to improve on and how it has progressed or regressed over the years.
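Here's a minimal Python sketch of the formula. It takes the exponent as a parameter, so you can plug in the traditional value of 2 or the 1.83 that comes up in the comments; the 800/700 run totals are made-up numbers, not any real team.

```python
def pythagorean_win_pct(runs_scored: float, runs_allowed: float,
                        exponent: float = 2.0) -> float:
    """Estimate winning percentage from runs scored and runs allowed."""
    rs = runs_scored ** exponent
    ra = runs_allowed ** exponent
    return rs / (rs + ra)


def expected_wins(runs_scored: float, runs_allowed: float,
                  games: int = 162, exponent: float = 2.0) -> float:
    """Turn the Pythagorean percentage into expected wins for a season."""
    return games * pythagorean_win_pct(runs_scored, runs_allowed, exponent)


# Hypothetical team: 800 runs scored, 700 runs allowed.
print(round(pythagorean_win_pct(800, 700), 3))        # exponent 2
print(round(pythagorean_win_pct(800, 700, 1.83), 3))  # exponent 1.83
print(round(expected_wins(800, 700, exponent=1.83), 1))
```

For a team that outscores its opponents, a smaller exponent pulls the estimate slightly toward .500, so the 1.83 version comes out a bit lower than the exponent-2 version here.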

OFFENSE
Most of us know, or at least should know, that man cannot survive on batting average alone. Who would you rather have, a .287 hitter or a .278 hitter? Well, if you chose the .287 hitter, you just picked Mark Loretta over Lance Berkman (okay, that was predictable to almost everybody). If you must use average when discussing a player, it is more accepted to use the trio of slash stats, AVG/OBP/SLG. If you don't understand why, then you need to read Moneyball or BP's Baseball Between the Numbers (where most sabermetric stats are explained in great detail). Often people like to use OPS (OBP + SLG) instead of the slash stats, or, if you want to get really fancy, OPS+, which is OPS adjusted for park and league so that 100 is league average. You can find it already calculated at the famed baseball-reference.com.
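As a rough sketch, here's how the slash stats and OPS fall out of counting stats, in Python. The batting line is hypothetical. The `ops_plus` helper follows the commonly cited Baseball-Reference form, 100 * (OBP/lgOBP + SLG/lgSLG - 1), and assumes you pass in league OBP and SLG that are already park-adjusted, which is the part Baseball-Reference does for you.

```python
from dataclasses import dataclass


@dataclass
class BattingLine:
    ab: int        # at-bats
    h: int         # hits of all kinds
    doubles: int
    triples: int
    hr: int
    bb: int        # walks
    hbp: int       # hit by pitch
    sf: int        # sacrifice flies

    @property
    def total_bases(self) -> int:
        singles = self.h - self.doubles - self.triples - self.hr
        return singles + 2 * self.doubles + 3 * self.triples + 4 * self.hr

    @property
    def avg(self) -> float:
        return self.h / self.ab

    @property
    def obp(self) -> float:
        return (self.h + self.bb + self.hbp) / (self.ab + self.bb + self.hbp + self.sf)

    @property
    def slg(self) -> float:
        return self.total_bases / self.ab

    @property
    def ops(self) -> float:
        return self.obp + self.slg


def ops_plus(obp: float, slg: float, lg_obp: float, lg_slg: float) -> float:
    """OPS+ scaled so 100 is league average; lg_obp/lg_slg assumed park-adjusted."""
    return 100 * (obp / lg_obp + slg / lg_slg - 1)


line = BattingLine(ab=500, h=150, doubles=30, triples=2, hr=25, bb=60, hbp=5, sf=5)
print(f"{line.avg:.3f}/{line.obp:.3f}/{line.slg:.3f}")  # 0.300/0.377/0.518
print(round(line.ops, 3))
print(round(ops_plus(line.obp, line.slg, lg_obp=0.330, lg_slg=0.420)))  # made-up league marks
```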

VORP
One of everybody's favorites, and the one that seems to dominate, is VORP, or Value Over Replacement Player. As I mentioned, BP's book covers it in detail; there's a nine-page explanation. Here's the long and short of it:

VORP is a cumulative stat; additional calculations are required before using it to compare multiple players' potential for future contribution. At the very least it must be normalized for PA. VORPr is a rate stat, and therefore is slightly better for comparing players who have had different amounts of playing time, as long as you account for sample size.[1]
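A minimal sketch of the normalizing step, assuming you simply scale a cumulative total to a common number of plate appearances. The 600 PA default is an arbitrary choice (SleepyCA's wished-for "WARP4" in the comments uses 590), and both players are hypothetical.

```python
def per_pa(cumulative: float, plate_appearances: int, scale: int = 600) -> float:
    """Scale a cumulative stat such as VORP to a common number of PA."""
    if plate_appearances <= 0:
        raise ValueError("plate_appearances must be positive")
    return cumulative / plate_appearances * scale


# Hypothetical players: a regular and a part-timer.
regular = per_pa(40.0, 650)     # 40 VORP in 650 PA
part_timer = per_pa(25.0, 350)  # 25 VORP in 350 PA
print(round(regular, 1), round(part_timer, 1))  # 36.9 42.9
```

The part-timer looks better per PA, but 350 PA is a small sample, which is exactly the sample-size caveat that comes with rate stats.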

Replacement level = freely available talent. Basically, if your player goes down, a team can replace him from the waiver wire or the farm system.

How replacement level is calculated: it's actually based on RC/27 (runs created per 27 outs; runs created will be expanded on later), and it's recalculated yearly. Basically, BP tracks each team's regular players, then their backups, and compares the two. They've found that backups generally perform at around 80% of the regulars' level (again, this changes year to year and by position).
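To make the 80% idea concrete, here's a deliberately simplified Python sketch. It is not BP's actual VORP formula; it just assumes replacement level is 80% of the regulars' RC/27 at a position and counts the runs a player produces beyond that over the same number of outs. All numbers are hypothetical.

```python
def replacement_rc27(regular_rc27: float, fraction: float = 0.80) -> float:
    """Replacement level as a fraction of the regulars' RC/27 at a position.

    The 80% default comes from the post; it moves by year and position.
    """
    return fraction * regular_rc27


def runs_above_replacement(player_rc27: float, outs: int,
                           regular_rc27: float, fraction: float = 0.80) -> float:
    """Runs produced beyond a replacement hitter who used the same outs."""
    gap_per_27 = player_rc27 - replacement_rc27(regular_rc27, fraction)
    return gap_per_27 * outs / 27


# Hypothetical: regulars at the position average 5.0 RC/27, so replacement
# level is about 4.0; our player creates 7.0 RC/27 over 400 outs.
print(round(runs_above_replacement(7.0, 400, 5.0), 1))  # 44.4
```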

Since they use RC/27 to determine replacement level, there is a published formula (though they don't explain how they came up with it) to turn slash stats into replacement level.

VORP is OFFENSE only: it does not consider defense, but it DOES consider position played. So A-Rod's VORP isn't technically comparable to Pujols's VORP, since VORP compares A-Rod to third basemen and Pujols to first basemen.

VORP is defined that way for a reason. A player putting up Hanley Ramirez numbers at SS is more valuable to his team than a player putting up those numbers at 1B, so if you are comparing "value" (i.e., trying to determine the league MVP), it is fine and common to use VORP.[2]

RC
One of my personal favorites is RC (Runs Created). The reason is that it's the easiest way to relate a player to a team's Pythagorean W/L. You can click on the link to see the different versions, but I just use Baseball-Reference.com's already calculated values, which use the "technical version."
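For reference, here's Bill James's basic Runs Created formula, (H + BB) * TB / (AB + BB), plus an RC/27 helper, sketched in Python. This is the basic version, not the technical version Baseball-Reference uses, so the numbers won't match theirs exactly. Counting outs as AB - H is also a simplifying assumption; a fuller count adds things like caught stealing, sacrifices, and double plays. The batting line is hypothetical.

```python
def runs_created_basic(h: int, bb: int, tb: int, ab: int) -> float:
    """Basic Runs Created: (H + BB) * TB / (AB + BB)."""
    return (h + bb) * tb / (ab + bb)


def rc_per_27_outs(rc: float, outs: int) -> float:
    """Runs created per 27 outs, i.e. per nine innings' worth of outs."""
    if outs <= 0:
        raise ValueError("outs must be positive")
    return rc / outs * 27


# Hypothetical line: 150 H, 60 BB, 259 TB in 500 AB.
rc = runs_created_basic(h=150, bb=60, tb=259, ab=500)
outs = 500 - 150  # simplification: AB - H only
print(round(rc, 1), round(rc_per_27_outs(rc, outs), 1))  # 97.1 7.5
```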

Clutch Hitting
Clutch hitting is still disputed by BP because they haven't been able to prove it empirically yet. If you are a hardcore fan of clutch hitting, a lot of people like to use either WPA (Win Probability Added), which is tracked by fangraphs.com, OR Win Shares (they are calculated very differently, though they're often thought of as the same stat as WPA).

This blog isn't meant to tell you what to use, so I'm not going to expand any further on WPA/Win Shares or clutch in general. I'm just saying that if you like "clutch," these are the most accepted stats for it.

Please don't use RBIs to validate a player. I only like using RBI when discussing a single hit. BP tracks players' performance in RBI situations, but even when the RBI was created in the 1920s, fans knew that RBIs would be team-dependent. If you are in a fantasy league that uses RBIs, I like sorting players by RBI opportunities, then trying to balance who is best at driving runs in with who gets the most chances.
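If you want to do that fantasy sorting without a spreadsheet, a small Python sketch could look like this. What counts as an "opportunity" is an assumption here (say, runners on base during the player's plate appearances), and the players and numbers are hypothetical.

```python
from typing import NamedTuple


class RbiProfile(NamedTuple):
    name: str
    rbi: int
    opportunities: int  # assumption: runners on base during the player's PAs

    @property
    def conversion(self) -> float:
        """Share of opportunities turned into RBI."""
        return self.rbi / self.opportunities if self.opportunities else 0.0


def rank_by_opportunity(players: list[RbiProfile]) -> list[RbiProfile]:
    """Most opportunities first; ties broken by conversion rate."""
    return sorted(players, key=lambda p: (p.opportunities, p.conversion), reverse=True)


# Hypothetical fantasy candidates.
players = [
    RbiProfile("Player A", 90, 450),
    RbiProfile("Player B", 80, 330),
    RbiProfile("Player C", 95, 500),
]
for p in rank_by_opportunity(players):
    print(f"{p.name}: {p.opportunities} opportunities, {p.conversion:.3f} conversion")
```

Sorting by opportunities puts the most chances at the top, and the conversion column shows who is actually cashing them in, which is the balance described in the paragraph.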

The goal of this blog was to inform readers of the most commonly used stats around here, maybe eliminate some, maybe add some, and also to help myself understand them and hammer out any details. If I have any info wrong or missing, please feel free to add to it or discuss. Also, if you have a better way to data mine specific stats, PLEASE add it. A lot of times I have to copy/paste stats into an Excel spreadsheet when there is probably a better way. If this blog is on the right track, Pitching and then Defense will come next.

Summary
Team: Pythagorean W/L  
Offense: VORP, RC, slash stats, OPS, OPS+
Clutch:  WPA, Win Shares

NOT to use:  Average alone, RBI

Special thanks to
SleepyCA for [1][2]

Comments


I think this is a great diary
your Value Over Replacement Poster at VEB (VORPveb) is incremented accordingly.

I would add:

  • A paragraph about "sample size" and "regression to the mean." Geovany Soto isn't going to have a 1.100 OPS next year.
  • VORP and WARPx are cumulative stats; additional calculations are required before using them to compare multiple players' potential for future contribution. At the very least they must be normalized for PA. VORPr is a rate stat, and therefore is slightly better for comparing players who have had different amounts of playing time, as long as you account for sample size. I wish BP had a "WARP4," which would be WARP3 adjusted to 590 PA...
  • While what you said about comparing players at different positions by VORP is true, VORP is defined that way for a reason. A player putting up Hanley Ramirez numbers at SS is more valuable to his team than a player putting up those numbers at 1B, so if you are comparing "value" (i.e., trying to determine the league MVP), it is fine to use VORP.
  • The conditions surrounding statistical performance must be kept in mind when using statistics to compare players. For instance, tools like PECOTA and ZiPS are not very useful when trying to predict the future performance of a player like Ludwick or Ankiel, because they don't fit the model of typical player development that the tools are based on. Also, certain stats should not be looked at as "absolute." For example, Chris Duncan is not an .830 OPS player; his true talent level is probably in the low .900s. His actual performance this season was hindered by a freak injury that isn't likely to recur. This is one of the reasons you still have to actually watch the games in order to make good decisions about players.
  • Keep other external factors like park effects and platoon splits in mind: Aaron Rowand won't be nearly as good if you take him out of Philly, Flores wouldn't be a good closer, etc.

by SleepyCA on Oct 17, 2007 3:17 PM EDT
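Picking up the first bullet: one simple way to illustrate regression to the mean (an assumption here, not the method PECOTA or ZiPS actually uses) is to blend a player's observed rate with the league rate, as if you'd added a fixed number of league-average plate appearances to his record. The 300 PA "ballast" and the .330 league OBP are placeholders chosen for illustration.

```python
def regress_to_mean(observed_rate: float, sample_size: int,
                    league_rate: float, ballast: int = 300) -> float:
    """Blend an observed rate with the league rate.

    Works as if `ballast` league-average PA were added to the player's record,
    so small samples get pulled toward the league rate much harder than big ones.
    """
    total = observed_rate * sample_size + league_rate * ballast
    return total / (sample_size + ballast)


# Hypothetical: a .450 OBP, league OBP .330.
print(round(regress_to_mean(0.450, 60, 0.330), 3))   # 0.35 (hot 60 PA)
print(round(regress_to_mean(0.450, 600, 0.330), 3))  # 0.41 (full season)
```

The same .450 OBP gets pulled much harder toward the league rate in 60 PA than in 600 PA, which is why a small-sample hot streak shouldn't be expected to repeat at full strength.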

WARP4
aka "Ludicrous Speed"
Everywhere is within walking distance if you have the time.

by Solanus on Oct 17, 2007 5:07 PM EDT

excellent feedback
I personally don't want to write the regression-to-the-mean and sample size parts; I understand the concepts, but they're not my forte.

Whoever can write it do so and I'll just add it with credit.  I don't really want this to be an "all me" blog, so I have no issues with taking anything out or adding anything if it's correct to do so or more coherent etc.  

I def. need to add in the disclaimers about rate stats.  

Do most people use WARPx? I've found that most people don't like the way BP rates fielding (FRAR and FRAA), so a strong offensive metric plus a lame fielding metric kinda kills the stat, no?

"How depressing is it being you? Would you equate it to being a lifelong Cubs fan?"

by rocKStark5 on Oct 17, 2007 6:24 PM EDT

Great idea
VEB always does an excellent job using these types of statistics to argue or make a point. Some readers might not fully understand them, and this diary would be very useful to those fans. Maybe also add why certain stats are not very useful in determining a player's value or projecting his abilities.

by qwikimport on Oct 17, 2007 4:55 PM EDT

Like that idea
Most overrated stats vs. most underrated stats

by ICbirdfan on Oct 17, 2007 5:14 PM EDT

Question
In the link for Pythagorean win %, the linked article states that

"
Pythagorean winning percentage is an estimate of a team's winning percentage given their runs scored and runs allowed. Developed by Bill James, it can tell you when teams were a bit lucky or unlucky. It is calculated by

$$\frac{(\text{Runs Scored})^{1.83}}{(\text{Runs Scored})^{1.83} + (\text{Runs Allowed})^{1.83}}$$

The traditional formula uses an exponent of two, but this has proven to be a little more accurate.

"

My question is this: what does the author mean by "a little more accurate"? Accurate implies there is a right answer. What is the Pythagorean win% being compared to in order to be deemed more or less accurate?

by cdb on Oct 17, 2007 5:36 PM EDT

clarify
To clarify what I am trying to say (not so clear when I re-read it): if the Pythagorean formula's measure of accuracy is actual wins, why not just use actual win %?

by cdb on Oct 17, 2007 5:40 PM EDT

Accurate
Pythagorean records are meant to eliminate small-sample variation due to luck and uncontrollable factors rather than actual skill level. The 1.83 exponent was found to best approximate actual win-loss records over the history of the game.

Actual win percentage is less preferred than Pythagorean for some purposes, since 162 games isn't a large enough set of contests to measure how good a team really is. (Although it's a hell of a lot better than 16.)

by liam on Oct 17, 2007 5:45 PM EDT
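On cdb's question of how the 1.83 gets picked: the idea is to choose the exponent that makes the formula's estimates closest to teams' actual records. Here's a Python sketch of that as a simple grid search minimizing squared error. The "teams" are synthetic, generated from an exponent of 1.83, so the search should just recover that value; with real data you would feed in actual team-seasons instead.

```python
def pyth(rs: float, ra: float, exponent: float) -> float:
    """Pythagorean winning percentage for a given exponent."""
    return rs ** exponent / (rs ** exponent + ra ** exponent)


def best_exponent(teams: list[tuple[float, float, float]],
                  lo: float = 1.5, hi: float = 2.5, step: float = 0.01) -> float:
    """Grid-search the exponent with the smallest squared error.

    Each team is (runs_scored, runs_allowed, actual_win_pct).
    """
    best, best_err = lo, float("inf")
    for i in range(int(round((hi - lo) / step)) + 1):
        x = lo + i * step
        err = sum((pyth(rs, ra, x) - wpct) ** 2 for rs, ra, wpct in teams)
        if err < best_err:
            best, best_err = x, err
    return best


# Synthetic "teams" built with an exponent of 1.83, so the search should find it.
runs = [(800, 700), (650, 750), (720, 680), (900, 650)]
teams = [(rs, ra, pyth(rs, ra, 1.83)) for rs, ra in runs]
print(round(best_exponent(teams), 2))  # 1.83
```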

I believe
there should be some statistic which measures how "consistent" a player is, as well as one which compares how much worse all hitters are than Gary Sheffield.

-Joe Morgan

Let me get this straight...Rowand over Pujols??? Really, Tony?

by cardzfan24 on Oct 17, 2007 6:26 PM EDT

that's a good point
I still think that if Sheffield can get healthy in the next day or so, the Tigers can come back and beat the Indians.

by mattybobo on Oct 17, 2007 7:32 PM EDT

Consistency is measurable
I know that Joe Morgan talks a lot of gibberish, and often will use that word to mean 'good,' but you can even look at a player's consistency statistically. A first, simple way of doing it would be to take the standard deviation of his monthly OPS figures, for example.

by Valatan on Oct 17, 2007 8:58 PM EDT
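Valatan's suggestion is easy to compute. Here's a minimal Python sketch using the standard library's `statistics.stdev`; the two sets of monthly OPS values are made up, and both average .860, so the only difference is how much they bounce around. One caveat: this unweighted version treats every month the same even if the player's PA totals varied a lot from month to month.

```python
import statistics


def consistency(monthly_ops: list[float]) -> float:
    """Sample standard deviation of monthly OPS; lower means steadier."""
    if len(monthly_ops) < 2:
        raise ValueError("need at least two months of data")
    return statistics.stdev(monthly_ops)


# Made-up April-September lines; both average .860.
steady = [0.850, 0.870, 0.860, 0.840, 0.880, 0.860]
streaky = [1.050, 0.650, 0.980, 0.700, 1.000, 0.780]
print(round(consistency(steady), 3))   # 0.014
print(round(consistency(streaky), 3))  # 0.171
```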

My head just exploded
-Joe Morgan
Let me get this straight...Rowand over Pujols??? Really, Tony?

by cardzfan24 on Oct 19, 2007 12:21 PM EDT

This is a great post
Whenever intense statistical discussions take place, it's easy for me to zone out and skip ahead in the post. I try to keep all the acronyms straight, but always end up forgetting everything except AVG/OBP/SLG.

I would love to see an expanded version of this as a permanent link for reference in the future. Perhaps adding a list of acronyms would help, too.

Thanks rockStark!

by effin fisk on Oct 17, 2007 6:54 PM EDT

I also liked this post a lot.
Cleared up a couple things about VORP for me.

Maybe an explanation of EqA, anybody? Don't know much about it.

by mattybobo on Oct 17, 2007 7:35 PM EDT

I was gonna put in EqA
but nobody seems to use it, and I don't know why.

Like VORP, it's purely offensive, but it's a rate statistic.

It combines hitting for average, power, drawing walks, HBPs, and stolen base ability, and it's adjusted for park and for league/year difficulty.

It's all converted from Equivalent Runs per out into a number that reads like a batting average, i.e., a .260 EqA is an average hitter, .300 is good, and .340 is great.

What I always do is check Wikipedia and Google. Most stathead sites have glossary pages, but they really only have definitions. I wanted this page to be a "most used" or "most accepted" type of thing. I hate coming across a stat and not knowing if it's already outdated, disproved, or otherwise useless.

"How depressing is it being you? Would you equate it to being a lifelong Cubs fan?"

by rocKStark5 on Oct 17, 2007 8:09 PM EDT

You can't disprove a stat
you can say that it doesn't really measure what it's meant to measure, but that's something different. The argument that 'average is useless' isn't really valid; it's more an argument that either a) hitting for a high average isn't as reproducible a skill as other things are, or b) hitting for a high average isn't that valuable a skill, and hence average doesn't measure offensive prowess as well as other things might.

But none of that disproves batting average, in the end.  It is what it is, and it is certainly a measure of something, and certainly a hitter with a .400 average is more valuable to his team than a batter with a .200 average.  

Now, the main thing is, sabermetric-type stats are great for a lot of things, and they make comparing things a lot easier, but they also have a lot of drawbacks. A lot of the advanced metrics depend on hidden statistical modeling* that the creator doesn't directly explain. This can be a problem, particularly for things like park factors, which have a LOT of statistical noise in them, IMO. So, when you say something like "VORP is a better measure of offense than OPS+, which in turn is better than unadjusted OPS," there is an implied value judgment there that depends on what you mean by 'better' and 'offense.'

Some statistics make some things clearer and make other things more obscure.  Having a big toolbox is important, and while saying 'VORP is the end all of measuring offense' might be a slightly more accurate statement than 'a player with a high BA is always best,' it is still better to look at a player through as wide an array of lenses as possible.  

Sorry for the long rant.

*statistical modeling is the practice of assuming an adjustable relationship between two things, and then using a large data set to adjust the relationship to something that you think mirrors a true relationship--an example is taking a bunch of points that you expect to be a straight line, and drawing what you think to be the best straight line through them.

by Valatan on Oct 17, 2007 9:14 PM EDT
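The footnote's straight-line example is ordinary least squares. As a small Python sketch, here's the closed-form fit; the points are made-up values scattered around y = 2x + 1. The "adjustable relationship" is y = slope * x + intercept, and the data set is what adjusts those two parameters.

```python
def fit_line(xs: list[float], ys: list[float]) -> tuple[float, float]:
    """Ordinary least squares: slope and intercept of the best straight line."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length lists with at least two points")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("xs must not all be the same")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# Made-up points that roughly follow y = 2x + 1.
xs = [0, 1, 2, 3, 4]
ys = [1.1, 2.9, 5.2, 6.8, 9.0]
slope, intercept = fit_line(xs, ys)
print(round(slope, 2), round(intercept, 2))  # 1.97 1.06
```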

.200 vs .400
I feel that a .400 hitter is not necessarily more valuable than a .200 hitter. Take the extremes (for purposes of comparison, think Russell Branyan and Aaron Miles; I know Miles doesn't hit .400, but I wanted a slap singles hitter): if the .200 hitter hits lots of HRs and hardly ever hits into double plays, and the .400 hitter gets only singles with a very rare XBH, which is really more valuable? That, to me, is why batting average can be misleading. I do agree that for the most part what you said will hold true, but I would take a .240/.300/.600 hitter over a .320/.330/.400 hitter most days of the week.

by StLHugo on Oct 18, 2007 8:44 AM EDT

This was especially helpful Val
I would like to echo some of the above posters who said that it would be nice to have a permanent expanded post on these definitions.  But I would want to have Val's points mentioned or even expanded upon under some rubric because, for someone like me, who is just starting to learn about all of these things, what Val is explaining helps to contextualize the need for "a big toolbox" and gives some guidance on how to evaluate different stats.  Perhaps there could be a way for knowledgeable people to present varying perspectives for newcomers or "non-statheads" to consider.

by nycardfan on Oct 18, 2007 10:39 AM EDT

Ditto
I think that adding this page or one like it as a permalink under a "glossary" type heading would be very helpful. Allow users to add input, and the original author or a moderator can clean it up and add it in. We have so many statheads here that toss out things like FRAA, WARP, OPS+, FIP, etc. that some people just don't understand. Every time I see a stat I don't know, I look at the glossary on Baseball-Reference, and sometimes Baseball Prospectus as well, just to get a handle on how that stat is measured.

by StLHugo on Oct 18, 2007 10:59 AM EDT

I like
the reference page at Sons of Sam Horn.

This isn't directed at you, NY, or anyone in particular, but I will say that I haven't met a saber geek yet that doesn't take the time to explain something when someone asks. The retarded straw men and knee-jerk defense mechanism of mocking something you don't understand gets old, but open-mindedness and a willingness to just ask is awesome. At least in my opinion.

by plh903 on Oct 19, 2007 3:48 AM EDT

What I have not heard much about
are nuanced discussions that relate to Val's point: "Now, the main thing is, sabrmetric type stats are great for a lot of things, and it makes comparing things a lot easier, but they also have a lot of drawbacks. A lot of the advanced metrics will depend on hidden statistical modeling that the creator doesn't directly explain. This can be a problem..." I find such hidden methodological influences interesting.

It's not just a matter of asking or explaining.  It's learning more about varying perspectives and approaches of knowledgeable people, and how they perceive statistical modeling to be influencing conclusions.  I don't read VEB everyday, so maybe I've missed these kinds of discussions.  Nevertheless, I thought what Val said about that in relationship to needing "a big tool box" to be particularly helpful.

by nycardfan on Oct 19, 2007 8:35 AM EDT

I don't think they
do have drawbacks, at least as far as the stats are concerned. As far as the black box issue, my take is a couple posts down.

What are the drawbacks? Most of them do what they say they do. If not they get their ass handed to them by Tangotiger or someone.

A lot of people are more interested in proving that "the game isn't played on paper" (like anyone fucking claims that) and why stats shouldn't be trusted than in figuring out what they are useful for. One of these things is easier than the other, and this extends to lots more than just baseball stats.

by plh903 on Oct 19, 2007 6:21 PM EDT

I should have added that I agree that most
people who use this site are happy to explain definitions if asked.  My above comment wasn't meant to deny that.  I was just addressing something else when talking about Val's post.

by nycardfan on Oct 19, 2007 10:15 AM EDT

And one of the big problems is that
A lot of the sites WON'T tell you what their methodological models are.  

This is the 'black box' problem that you hear people referring to: BP takes a player's raw stats (AB, H, HR, 2B, etc.; they usually are pretty good about telling you what the inputs are, at least), puts them into one end of a 'black box,' and gets VORP and FRAA and whatnot out. We're not allowed to look inside the black box. This is why it is important to see if everyone's system and the traditional methods agree on a guy.

None of this is to say that what they're doing isn't valid; more than anything, I'm just showing my bias as a physical scientist. To me, VORP is kinda the equivalent of a service telling me what I can use a Bessel function for, and allowing me to look up values on a case-by-case basis, but not telling me any of the theory behind it, or how to calculate its value for an arbitrary set of parameters.

(which, by the way, is actually how scientists did business back in the day--go to a secondhand book store, and you can find books that do nothing but list all of the values of sin x).

by Valatan on Oct 19, 2007 11:58 AM EDT

btw,
this isn't to say that these things aren't valuable or useful, or that they shouldn't be trusted. Just that the attitude you see, for example, over at FireJoeMorgan, of advanced statistics being incontrovertible truth makes me squeamish, as they do have caveats.

by Valatan on Oct 19, 2007 12:03 PM EDT

FJM is basically
schtick. Funny one-liners aren't made of "this is just a best-guess, there are certain things that aren't accounted for in X stat, and projections can not achieve perfection due to expected variance" or whatever.

I do agree with most of what you've said here though. The problem is generally in the utilization and not the stat itself.

by plh903 on Oct 19, 2007 6:10 PM EDT

Aside from the proprietary
stuff. For one, I'm comfortable using EqA if it maps to runs, regardless of whether BP is just a big circle jerk to their brand or not. However, aside from them, I can't think of much whose methodology isn't willingly shared. And as pointed out, we do pretty much know how all their stats are formulated.

If PECOTA shows me a good coefficient and Nate talks about his methods in detail here and there, and smart people agree that it's a good system, then I'll treat it like a good projection system. If FRAA is demonstrably deficient then I'll ignore it. Those are the only two things I think we don't know the formulas for.

by plh903 on Oct 19, 2007 6:14 PM EDT

What you are saying makes perfect sense
and helps me understand why I sometimes feel a skeptical unease with respect to certain arguments. I want to see in the "black box" to get a closer look at the parameters that may be influencing different (and sometimes competing) claims.  I'm used to doing this in my own areas of study or in teaching, but this obviously is not an area I'm that familiar with.  

As I said above, I found your "big toolbox" image to be particularly helpful.  It gives me a better idea of what to be aware of as I gradually learn about all these new things.  Well, anyway, thanks for helping along a newbie.

by nycardfan on Oct 19, 2007 1:11 PM EDT

the VORP formula
and the theories behind it have been published. Only their PECOTA projections have not been and will not be. Like I already said, an entire chapter is dedicated to it in their book.
"How depressing is it being you? Would you equate it to being a lifelong Cubs fan?"

by rocKStark5 on Oct 19, 2007 1:26 PM EDT

Val
The context you provided here is very helpful.  Especially when posters start firing shots like "hollow BA" (what?!?), it helps to know what point they are trying to argue.  

by cardsgirl95 on Oct 18, 2007 11:36 AM EDT


The Internet's #1 St. Louis Cardinals blog.
