So, inspired by a discussion that we had on the site regarding Skip Schumaker in yesterday's thread, I decided to do a little bit of investigative work on a theory that I've been thinking about for a few days. One useful tool for evaluating pitchers in a slump has been batting average on balls in play (BABIP), which is given by the formula (H-HR)/(AB-HR-K). What this tells you is how hitters do once home runs and strikeouts are taken out of the equation. It has been shown that pitchers have very little control over the BABIP they allow. A pitcher who allows a low BABIP is essentially getting very lucky. A pitcher who allows a high BABIP is getting very unlucky. The expectation, then, is that these pitchers are the ones most ripe to revert to the mean.
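For anyone who wants to check the arithmetic, here is a minimal sketch of the formula above in Python. The function name and the sample stat line are made up for illustration and are not any real player's numbers. Note that the commonly cited definition of BABIP also adds sacrifice flies to the denominator; this sketch sticks to the simpler version used in this post.

```python
def babip(hits, home_runs, at_bats, strikeouts):
    """BABIP as defined in this post: (H - HR) / (AB - HR - K)."""
    balls_in_play = at_bats - home_runs - strikeouts
    if balls_in_play <= 0:
        raise ValueError("no balls in play, BABIP is undefined")
    return (hits - home_runs) / balls_in_play


# Hypothetical stat line, invented purely for illustration.
season = {"hits": 150, "home_runs": 20, "at_bats": 500, "strikeouts": 80}

avg = season["hits"] / season["at_bats"]   # plain batting average: 150/500
bip = babip(**season)                      # (150-20)/(500-20-80) = 130/400

print(f"AVG   = {avg:.3f}")
print(f"BABIP = {bip:.3f}")
print(f"BABIP - AVG = {bip - avg:+.3f}")
```

The last line prints the same "BABIP minus AVG" differential that the rest of the post compares across players.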
Though it is somewhat counterintuitive (it certainly was to Voros McCracken, who discovered this fact), it actually makes some sense: once the opposing player puts a bat on the ball, the pitcher's involvement in the play is over. Therefore, one would not expect the pitcher to be able to influence the outcome after contact, and so his BABIP should not carry over from year to year. The question, however, is whether or not this trend would apply to hitters as well. In the game thread, in particular, it was applied to Skip Schumaker, with the argument that his 2007 BABIP was unsustainable, and that he was due to fall back to Earth.
The logical reason for a pitcher's BABIP being random, however, doesn't seem to jibe with a hitter's BABIP being random. A hitter's involvement in the play is not over after the bat is put on the ball; the guy still has to get to first base. Therefore, you would expect the hitter to have some control over whether or not he actually reaches base. In particular, you would expect a fast player to be more able to reach first base before the opposing defense successfully makes a play. My counter-hypothesis, therefore, was that a high BABIP correlates with a fast player. Using stolen bases as a lazy proxy for speed, I took the seven leaders in post-WWII stolen bases, and looked at their career numbers to eliminate any small sample size issues. The result was the following table:
So far, so good. Every one of these guys had a BABIP that beat his batting average. Additionally, the better the rate at which they stole bases, the more their BABIP beat their batting average by, on average. Vince Coleman, with the highest SB rate, had a BABIP that beat his batting average by 50 points. High school statistics tell me that I would expect 40% of the variation in the differential between BABIP and AVG to be explainable by the rate at which a player steals bases. Looks like we might have something.
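For the curious, a "percent of variation explained" figure like that 40% is typically the R-squared of a simple straight-line fit, which is just the square of the correlation coefficient between the two columns. Here is a rough sketch of that calculation, assuming that is the statistic in question. The function name and the numbers are placeholders invented for illustration; they are not the data from the table above, and "SB rate" here is just whatever per-opportunity rate you choose to feed in.

```python
from math import sqrt


def r_squared(xs, ys):
    """Share of the variation in ys explained by a straight-line fit on xs.

    For a one-variable linear fit this equals the squared Pearson correlation.
    """
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length lists with at least 2 points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("one of the columns has no variation")
    r = sxy / sqrt(sxx * syy)
    return r * r


# Placeholder numbers, NOT the players from this post.
sb_rate = [0.05, 0.10, 0.15, 0.20, 0.25]           # hypothetical steal rates
babip_minus_avg = [0.020, 0.015, 0.035, 0.030, 0.050]  # hypothetical differentials

print(f"R^2 = {r_squared(sb_rate, babip_minus_avg):.2f}")
```

A value near 1 would mean steal rate lines up almost perfectly with the differential; a value near 0 would mean it tells you next to nothing.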
Not so fast, perhaps. I expanded the list to include a wide array of Hall of Famers, and a few other notable players. The table generated there is shown below.
All of a sudden, my theory looks a little less convincing. Jim Thome, the slowest player on the list (in terms of career SB, at least), also had a BABIP that outperformed his AVG by the greatest amount. Willie Mays, hardly a slowpoke, had a career BABIP that actually underperformed his career average. The amount of variation in the BABIP differential that is explainable by stolen base rate went from 40% to 25%.
Some of this is clearly explainable by the fact that stolen bases are probably not all that great a proxy for actual speed, as being a successful base stealer is as dependent on reading the defense and leading off well as it is on actually being able to run quickly. There is also the problem that some of these players (Griffey, I'm looking at you) started out much faster than they ended up, so career numbers probably mix together seasons in which the player's style of play was pretty significantly different.
So, where does that leave things? The very fastest players seem able to sustain a higher BABIP than their AVG would indicate. For slower players, however, it seems much more doubtful that they'd be able to do so. If I have some time later, perhaps it will be worth looking at some single-season statistics and seeing if a clearer pattern emerges. Or perhaps there is better data on player speed out there. Regardless, it's something to think about while waiting for the playoffs to begin.