If you ask me for my favorite statistic in baseball, it is batting average on balls in play (BABIP). It is not the stat that best represents a player's value. It's certainly not the most complicated statistic to emerge from the modernized understanding of baseball. But BABIP is underappreciated for the heavy lifting it does and the more advanced principles it encompasses.
First of all, for the uninitiated, what is BABIP? As I mentioned above, it stands for Batting Average on Balls In Play. The general premise is that BABIP ignores any plate appearance that isn't a ball in play. It doesn't care about home runs or strikeouts or walks. It's simply focused on when a hitter makes contact and the ball stays in fair territory on the baseball field. A high BABIP is a good thing because that means more of the balls that were put in play are resulting in hits. A low BABIP is a bad thing for exactly the opposite reason; it means a player is making a lot of outs.
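For anyone who wants to see the arithmetic, here is a minimal sketch of the calculation. It assumes the commonly cited formula, (H - HR) / (AB - K - HR + SF), which counts sacrifice flies as balls in play; the function name and the sample stat line are made up for illustration.

```python
def babip(hits, home_runs, at_bats, strikeouts, sac_flies=0):
    """Batting average on balls in play: (H - HR) / (AB - K - HR + SF).

    Home runs and strikeouts are removed because the ball never reaches
    the fielders; walks are already excluded from at-bats.
    """
    balls_in_play = at_bats - strikeouts - home_runs + sac_flies
    if balls_in_play <= 0:
        raise ValueError("no balls in play, BABIP is undefined")
    return (hits - home_runs) / balls_in_play


# Hypothetical season line, not a real player's:
# 150 H, 20 HR, 500 AB, 100 K, 5 SF -> 130 hits on 385 balls in play.
example = babip(hits=150, home_runs=20, at_bats=500, strikeouts=100, sac_flies=5)
print(f"{example:.3f}")
```

Notice that the 20 home runs come out of both the numerator and the denominator, which is why a slugger's BABIP can sit well below what his raw batting average might suggest.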
BABIP is on a scale similar to batting average but with a slightly higher threshold for success. A BABIP under .300 is usually considered poor. A BABIP over .330 is considered "good". I use vague qualifiers about the good and bad level of BABIP because, and we're about to dive into why it is a wonderful stat, BABIP is a statistic that often points to a player's luck more so than a player's skill.
An average baseball player is going to have a BABIP that resides between .300 and .330. Anything beyond that is most likely a sign of bad luck or good luck. It takes multiple seasons for BABIP to "stabilize" as a statistic. After about two seasons' worth of plate appearances, the signal in the statistic will outweigh the noise to a high enough degree that you can make some more compelling arguments about what a player's true talent level is. Up until that point though, BABIP is a great barometer of luck and has to be heavily regressed back to the league-average BABIP.
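To make "heavily regressed" concrete, here is a rough sketch of one common way to do it: pad the observed results with a block of league-average balls in play. Both defaults are assumptions of mine for illustration. The league BABIP of .300 is just the low end of the range mentioned above, and the stabilizer of 800 balls in play is a placeholder, not a measured stabilization point.

```python
def regress_babip(hits_in_play, balls_in_play, league_babip=0.300, stabilizer=800):
    """Shrink an observed BABIP toward the league average.

    `stabilizer` is a hypothetical count of league-average balls in play
    added to the sample; the bigger it is, the harder the pull to the mean.
    """
    if stabilizer <= 0:
        raise ValueError("stabilizer must be positive")
    if not 0 <= hits_in_play <= balls_in_play:
        raise ValueError("need 0 <= hits_in_play <= balls_in_play")
    padded_hits = hits_in_play + stabilizer * league_babip
    padded_balls = balls_in_play + stabilizer
    return padded_hits / padded_balls


# Two made-up hitters with the same observed .378 BABIP:
small_sample = regress_babip(31, 82)    # a few weeks of balls in play
large_sample = regress_babip(310, 820)  # ten times the evidence
print(f"{small_sample:.3f} vs {large_sample:.3f}")
```

The small sample gets pulled almost all the way back to the league average, while the large one keeps more of its observed rate. That is the whole idea behind waiting for the stat to stabilize before crediting a player with a skill.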
There are two components of "luck" in BABIP to keep in mind. Each type of batted ball (grounder, liner, flyball) has an empirical probability of turning into a hit. Line drives, for instance, turn into hits a very high percentage of the time. Less so for groundballs and flyballs. So if a player is hitting for a high BABIP, they may be getting fluke hits: groundballs that find holes, flyballs that are bloop singles. Alternatively, if a player has a high BABIP and is also hitting a lot of line drives, they may have a completely expected BABIP given that batted ball profile. In other words, while BABIP normally resides between .300 and .330 for most hitters, it is dependent on what types of balls they put into play.
But that's still a component of luck! In 2008, Ryan Ludwick had a .342 batting average on balls in play. Seems lucky, right? But he was also hitting 26.3% of his balls in play as line drives, so we'd expect his batting average on balls in play to be high. Herein lies the trick. His BABIP was lucky not because of bloop hits but because he had an unsustainable batted ball profile. Very few players can hit 26% of the balls they put in play as line drives on a season-after-season basis. Indeed, Ludwick had shown no unique ability in that regard to date and, in 2009, he'd come back down to earth with a line drive rate just under 20%.
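The batted-ball reasoning can be sketched as a weighted average: each batted-ball type's share of balls in play times its chance of becoming a hit. Everything numeric here is an assumption. The hit rates are illustrative placeholders, not measured league values, and the two profiles are hypothetical (only the 26% and 20% line drive shares echo the Ludwick example; the groundball and flyball splits are invented).

```python
def expected_babip(profile, hit_rates):
    """Weighted average of per-type hit rates for a batted-ball profile.

    `profile` maps batted-ball type -> share (or count) of balls in play.
    `hit_rates` maps the same types -> probability of a hit.
    """
    total = sum(profile.values())
    if total <= 0:
        raise ValueError("profile must contain some balls in play")
    return sum(share / total * hit_rates[kind] for kind, share in profile.items())


# Placeholder hit rates, chosen only so that liners >> grounders > flyballs.
HIT_RATES = {"line_drive": 0.70, "ground_ball": 0.24, "fly_ball": 0.14}

# Hypothetical profiles (percent of balls in play).
hot_year = {"line_drive": 26, "ground_ball": 42, "fly_ball": 32}
normal_year = {"line_drive": 20, "ground_ball": 45, "fly_ball": 35}

print(f"{expected_babip(hot_year, HIT_RATES):.3f}")
print(f"{expected_babip(normal_year, HIT_RATES):.3f}")
```

With these made-up inputs, a six-point swing in line drive rate moves the expected BABIP by about thirty points, which is the sense in which a batted ball profile itself can be the lucky part.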
BABIP as a stat is fascinating to me because it relies so much on player-specific interpretations (what does their batted ball profile look like?) while simultaneously being very consistent in its range of "non-lucky" or sustainable BABIP rates.
To get player-specific for a moment, BABIP is, essentially, the reason that Jon Jay is a good offensive player. Jay, over 1300 PAs now, has maintained a .348 BABIP. That elevated success rate when he puts balls in play is why he hits for a high average and can hold his own at the plate despite middling walk rates and power. Jon Jay's true talent BABIP is, from the evidence available to us, higher than the norm. Ditto for David Freese. It's possible that those two players make harder contact or hit the ball with a better departure angle or have some other identifiable skill, but that is largely unknown to us via public data. (Hit F/X may have some clues in this regard but, alas, it is proprietary.) So the only thing we can definitely say is that Jon Jay and David Freese have surpassed the threshold for BABIP to stabilize and there is, somewhere in there, an aspect of their game that strongly appears to be a skill in converting balls in play into hits.
Now that I'm done building up two players, let's talk about the real sucker punch here: Pete Kozma. After yesterday's game, during which I saw Pete Kozma rack up some awfully lucky-looking hits, his seasonal line stands at .306/.353/.565. More to the point, his BABIP is .378. Kozma is hitting an unusually high rate of line drives (27%), but it's safe to say that the default assumption on his performance should be (1) he's getting lucky on BABIP and (2) this is not a sustainable line drive rate and, consequently, not a sustainable BABIP.
There is nothing in Pete Kozma's history to suggest that he's capable of hitting for a high BABIP. Quite the opposite in fact. It's possible that he's changed his swing mechanics and is thus having more success. (Lance Lynn did pick up 3 miles an hour on his fastball in AAA after all.) More likely, however, Pete Kozma has just been lucky as hell this year at the plate.
BABIP is, in principle, a relatively easy stat to understand. Calculating it isn't hard. It has a nice, easy-to-remember range of normal values. On a player-specific basis, however, it is incredibly nuanced, requiring concessions to batted ball profile, opposing defense and other variables. It's a stat that should be regressed heavily in small sample sizes but also has players that have shown a true skill in being outliers.
Today, I'm writing a love letter to BABIP, because it lets me mentally process a season in which Pete Kozma has a .900+ OPS without having to assume that the apocalypse is nigh. Thank you, BABIP. Thank you.