
Runs Created by Decade: An Analysis in Variance

In the discussion over Albert Pujols's status as a Hall of Famer, a statistical debate arose over how we should compare players from bygone decades in terms of OPS+.  Valentin pointed out that a comparison to the average player (which is what OPS+ does) might be biased if the variance in OPS differed significantly from one era to another.  IOW, Valentin suggested that it would matter if, say, a player who posted a 130+ in the 1950's was an extreme outlier while a player who posted a 130+ in the 1990's was relatively common.

Well, this presents an empirical question: is there a substantial difference in the variance in offensive production across eras?  To help answer this question, I have examined Runs Created for MLB from 1950 to the present.  Specifically, I've looked at Runs Created in each decade from 1950 to 2005 (latest year available) to see if there has been a substantial difference in the variation in runs created over those decades.

I employed the batting dataset provided in Sean Lahman's Baseball Database.  I excluded all players with fewer than 150 at bats in a season.  The following data include the mean, median, variance, and standard deviation for Runs Created.  I've used the following runs created formula:

rc_X1 = (H + BB - CS + HBP - GIDP);
rc_X2 = TB + (.26 * (BB - IBB + HBP));
rc_X3 = (.52 * (SH + SF + SB));
rc_X4 = AB + BB + HBP + SH + SF;

rc = rc_X1 * (rc_X2 + rc_X3) / rc_X4;
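
For anyone who wants to check the arithmetic, here is a minimal Python sketch of the same formula. The function name, the descriptive variable names, and the zero-denominator guard are my own assumptions and are not part of the original calculation; the sample stat line in the usage is invented purely for illustration.

```python
def runs_created(H, BB, CS, HBP, GIDP, TB, IBB, SH, SF, SB, AB):
    """Runs created for one player-season, mirroring rc_X1..rc_X4 above."""
    on_base = H + BB - CS + HBP - GIDP           # rc_X1
    advancement = TB + 0.26 * (BB - IBB + HBP)   # rc_X2
    small_ball = 0.52 * (SH + SF + SB)           # rc_X3
    opportunities = AB + BB + HBP + SH + SF      # rc_X4
    if opportunities == 0:
        # Assumption: treat a season with no opportunities as zero runs created.
        return 0.0
    return on_base * (advancement + small_ball) / opportunities


if __name__ == "__main__":
    # Invented stat line, for illustration only.
    rc = runs_created(H=150, BB=60, CS=2, HBP=3, GIDP=10, TB=250,
                      IBB=5, SH=1, SF=4, SB=5, AB=500)
    print(round(rc, 2))
```

Keyword arguments are used in the call so that the eleven counting stats can't be silently passed in the wrong order.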

I've also included a quantile breakdown for runs created in each of the decades:

The Fifties (1950 - 1959)

N         1813
Mean      55.9209845
Std Dev   31.5128001
Median    49.45367
Variance  993.056569

Quantile        Estimate
100% Max       187.62018
99%            148.14395
95%            114.88000
90%            100.00556
75% Q3          76.41856
50% Median      49.45367
25% Q1          30.34681
10%             20.05333
5%              16.39892
1%              11.37026
0% Min           8.57665

The Sixties (1960 - 1969)

N         2289
Mean      52.29305
Std Dev   29.53063
Median    47.23479
Variance  872.05787
Range     169.64884

Quantile        Estimate
100% Max       177.66250
99%            132.01471
95%            107.21445
90%             94.02014
75% Q3          70.89245
50% Median      47.23479
25% Q1          27.71607
10%             18.39132
5%              14.82995
1%              10.43723
0% Min           8.01366

The Seventies (1970 - 1979)

N         2918
Mean      52.75782
Std Dev   28.66373
Median    47.93912
Variance  821.60919
Range     154.24352

Quantile        Estimate
100% Max       160.26340
99%            129.96813
95%            103.82451
90%             92.60296
75% Q3          72.36598
50% Median      47.93912
25% Q1          28.57147
10%             19.13247
5%              15.67842
1%              10.78744
0% Min           6.01988

The Eighties (1980 - 1989)

N         3162
Mean      51.70840
Std Dev   28.05438
Median    46.96388
Variance  787.04827
Range     147.99317

Quantile        Estimate
100% Max       153.60612
99%            126.75796
95%            105.90848
90%             90.73343
75% Q3          70.23333
50% Median      46.96388
25% Q1          28.43210
10%             19.38759
5%              16.01681
1%              11.61000
0% Min           5.61295

The Nineties (1990 - 1999)

N         3395
Mean      56.06562
Std Dev   31.74681
Median    49.65593
Variance  1008
Range     186.91802

Quantile        Estimate
100% Max       193.33921
99%            146.74168
95%            115.88431
90%            101.28195
75% Q3          76.41611
50% Median      49.65593
25% Q1          30.25041
10%             20.69598
5%              16.86006
1%              12.17636
0% Min           6.42118

The Oughts (2000 - 2005)

N         2639
Mean      59.70025
Std Dev   34.02325
Median    53.48486
Variance  1158
Range     223.95212

Quantile        Estimate
100% Max       230.40970
99%            154.90486
95%            124.12382
90%            108.89175
75% Q3          80.00087
50% Median      53.48486
25% Q1          31.56901
10%             21.79130
5%              18.15000
1%              12.58929
0% Min           6.45758
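
If you'd like to reproduce summaries like the ones above, the following is a rough Python sketch using only the standard library. It is not the code that generated these tables. The `(year, at_bats, runs_created)` row layout and the function name are hypothetical, and I'm assuming sample (n - 1) variance and standard deviation; the quantile method behind the tables isn't stated, so quantiles are left out.

```python
import statistics
from collections import defaultdict


def summarize_by_decade(rows, min_ab=150):
    """Summarize runs created per decade.

    rows: iterable of (year, at_bats, runs_created) tuples (hypothetical layout).
    """
    by_decade = defaultdict(list)
    for year, at_bats, rc in rows:
        if at_bats < min_ab:
            # Exclude player-seasons with fewer than 150 at bats.
            continue
        by_decade[year // 10 * 10].append(rc)

    summary = {}
    for decade, values in sorted(by_decade.items()):
        if len(values) < 2:
            # Sample variance is undefined for fewer than two observations.
            continue
        summary[decade] = {
            "n": len(values),
            "mean": statistics.mean(values),
            "median": statistics.median(values),
            "std_dev": statistics.stdev(values),      # assumption: n - 1 denominator
            "variance": statistics.variance(values),  # assumption: n - 1 denominator
            "range": max(values) - min(values),
        }
    return summary
```

Grouping on `year // 10 * 10` puts 2000 through 2005 in a partial "2000" decade, which matches how the Oughts are treated here.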

============
OK.  So what do all those numbers mean?  Well, for one, it is apparent that there have been some shifts in the variance in runs created over time.  The maximum RC has increased in some eras relative to others (Max RC in the Oughts: 230.41, Max RC in the 1980's: 153.61).  This is consistent with the notion that the runs created distribution has expanded over the last few decades.

However, there doesn't seem to be a substantial difference in the standard deviations from decade to decade.  IOW, while we've seen an expansion of the distribution at the tails, the spread in the bulk of the distribution remains roughly the same.  The standard deviation dipped from the 1950's through the 1980's and has risen slightly since:

1950's 31.51 runs
1960's 29.53 runs
1970's 28.66 runs
1980's 28.05 runs
1990's 31.75 runs
2000's 34.02 runs

So the largest gap in standard deviation between any two decades is about 6 runs created in a season (34.02 in the 2000's vs. 28.05 in the 1980's).  So even if we weighted our OPS+ calculations to account for this difference in SD, I doubt we'd get much of a difference in our results.
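
One way to poke at this claim is to express a given RC total in standard deviations above each decade's mean. The sketch below is only an illustration, not an adjusted OPS+: the means and standard deviations are copied (rounded) from the summaries in this post, while the dictionary layout and the function names are my own.

```python
# (mean, standard deviation) of runs created per decade, rounded from this post.
DECADE_STATS = {
    "1950s": (55.92, 31.51),
    "1960s": (52.29, 29.53),
    "1970s": (52.76, 28.66),
    "1980s": (51.71, 28.05),
    "1990s": (56.07, 31.75),
    "2000s": (59.70, 34.02),
}


def z_score(rc, decade):
    """Standard deviations above (or below) the decade mean for a given RC."""
    mean, std_dev = DECADE_STATS[decade]
    return (rc - mean) / std_dev


def sd_spread():
    """Gap between the largest and smallest decade standard deviation."""
    sds = [std_dev for _, std_dev in DECADE_STATS.values()]
    return max(sds) - min(sds)


if __name__ == "__main__":
    print(f"SD spread across decades: {sd_spread():.2f} runs")
    for decade in DECADE_STATS:
        print(decade, round(z_score(100, decade), 2))
```

Note that a z-score shifts with the decade mean as well as the SD, so any differences you see between decades reflect both effects, not the spread alone.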

I invite comment on my baseline assumptions, the results, and the conclusions I've drawn.  D.GOOCH
                             
