One of the beautiful aspects of baseball is its sheer unpredictability compared to other sports. One of the biggest sources of that variance is Batting Average on Balls in Play, or BABIP. BABIP is highly variable simply because some hard-hit balls become outs and some softly hit balls become hits. While balls down the line and in the gap are more likely to fall in for hits, analysts have not found that hitters have the skill to follow Willie Keeler's advice to "Hit it where they ain't". Mostly, the hitter is just trying to make good contact. Thus, a player can have a really high BABIP one year, take the same approach the next year, and have a low BABIP. The league-average BABIP each year is generally right around .300, much higher than batting average itself because it doesn't include strikeouts.
So in any given year, a hitter's performance is rarely separable from his BABIP. A high BABIP can buoy a profile that is weak in other departments, but if the hitter doesn't have the skill to maintain a high BABIP, his performance will be a mirage, and when his BABIP drops he will look much worse. Conversely, a hitter can be doing really well at limiting K's, drawing walks, and hitting for power but have his stats held down by a low BABIP, making him look worse than he really is.
Projecting BABIP is a difficult process, but analysts have developed formulas that are more predictive of future BABIP than BABIP itself. Using the formulas mentioned here, I decided to find each Cardinal hitter's xBABIP under two different formulas and compare them to his actual BABIP so far.
Andrew Perpetua's xBABIP calculation
This xBABIP calculator is based entirely on Statcast data on the velocity and launch angle of each ball in play. The idea is that if you know the velocity and launch angle, you know roughly how long the ball will be in the air and where it will land. From there you can calculate how long it will take a defender to get there and how often, historically, those balls in play have become hits. Andrew's calculation is a black box, meaning you only get the output of the system rather than the equation itself. It may need continual tweaking because it is based on such new data, but I imagine this type of system will get us much closer to understanding BABIP once the data set feeding it is much larger.
Alex Chamberlain's xBABIP calculation
Alex's calculation is public, and is as follows:
xBABIP = .1975 - (.4383 x True IFFB%) - (0.0914 x True FB%) + (.2594 x LD%) + (.1822 x Hard%) + (.1198 x Oppo%) + (.0042 x Spd)
Ordinarily, infield fly ball percentage (IFFB%) is calculated as the number of pop-ups divided by the number of fly balls. However, a more useful number is pop-ups divided by total balls in play. That is what "True IFFB%" represents. "True FB%" is FB% with the infield fly balls taken out.
IFFB% is important to understanding BABIP because infield fly balls are almost always outs. Outfield fly balls are less likely to be hits than grounders, so for purposes of BABIP, fly balls are bad. Line drives are the most likely batted balls to become hits, and while LD% also bounces around a lot, it doesn't do so as much as BABIP. The harder a player hits his line drives and ground balls, the more likely he is to get a hit on them, so Hard% also helps inform the projection. Opposite field% (Oppo%), the percentage of batted balls that the player hits to the opposite field, plays a part because the more often a player hits to the opposite field, the less the other team is able to shift against him. Finally, Spd, or speed score, is used as a proxy for player speed, as faster players are better able to leg out infield hits.
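As a rough illustration, Alex's equation can be written as a small Python function. This is only a sketch: the function and parameter names are my own, and it assumes every percentage is entered as a fraction (21% line drives as 0.21), that FB% includes infield flies, and that Spd is entered on its usual raw scale. The input values in the example call are made up and do not belong to any player.

```python
def true_batted_ball_rates(fb_pct, iffb_pct):
    """Convert FB% and IFFB% into the "True" versions described above.

    fb_pct   -- fly balls / total batted balls, as a fraction
    iffb_pct -- pop-ups / fly balls, as a fraction
    """
    # Pop-ups per total batted ball, rather than per fly ball.
    true_iffb = iffb_pct * fb_pct
    # Fly balls with the infield flies taken out.
    true_fb = fb_pct - true_iffb
    return true_iffb, true_fb


def chamberlain_xbabip(ld_pct, fb_pct, iffb_pct, hard_pct, oppo_pct, spd):
    """xBABIP using the coefficients quoted in the equation above."""
    true_iffb, true_fb = true_batted_ball_rates(fb_pct, iffb_pct)
    return (
        0.1975
        - 0.4383 * true_iffb
        - 0.0914 * true_fb
        + 0.2594 * ld_pct
        + 0.1822 * hard_pct
        + 0.1198 * oppo_pct
        + 0.0042 * spd
    )


if __name__ == "__main__":
    # Illustrative inputs only, not a real player's line.
    example = chamberlain_xbabip(
        ld_pct=0.21, fb_pct=0.35, iffb_pct=0.10,
        hard_pct=0.32, oppo_pct=0.25, spd=4.5,
    )
    print(f"xBABIP: {example:.3f}")
```

Note that the signs match the reasoning above: pop-ups and fly balls pull the estimate down, while line drives, hard contact, opposite-field hitting, and speed push it up.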
I also wanted to include Mike Podhorzer's xBABIP calculation. However, when I tried to recreate his equation using both programming and spreadsheets, I was unable to reproduce the results of his online spreadsheet, and I was unsure where the error was coming from, so I decided to forgo that calculation. As mentioned in the link above, Mike now uses Alex's equation anyway, as he finds it easier.
With two different xBABIP calculations, one based on batted-ball averages and one based on the newfangled, more granular Statcast data, we have two ways of observing a player's BABIP skill. With that out of the way, here's how each of the 2016 Cardinals position players has fared in both BABIP calculators:
"BB xBABIP" indicates Alex's calculation, based on batted ball stats. xStats is the name of Andrew's website for publishing his data, so i used that to label his calculation. I then averaged the two, and took the difference between that average and their 2016 BABIP to find how much each player over or under performed their xBABIP. I also want to note that this is just using 2016 numbers and a more rigorous analysis would involve a weighted sample of their MLB and MiLB careers, but that would involve much more time than what I had for this article. I might do that in the future, but for today we'll use a more simple analysis. I left Eric Fryer, Greg Garcia, Ruben Tejada, and Jhonny Peralta off because of a lack of plate appearances.
Randal Grichuk is the single biggest under-performer, but both calculations are bearish on Grichuk's chances of being above average in BABIP going forward. He currently has an 83 wRC+; if you replaced his current BABIP with either xBABIP, even under the (certainly wrong) assumption that the extra hits were all singles, he'd be an above-average hitter for the year.
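For anyone who wants to try the "extra hits are all singles" adjustment, here is a minimal sketch. It relies on the standard BABIP definition, (H - HR) / (AB - K - HR + SF), and only goes as far as an adjusted batting average; it does not attempt to recompute wRC+. The function names are hypothetical and the counting stats in the example are made up, not Grichuk's actual line.

```python
def balls_in_play(ab, k, hr, sf):
    """Denominator of BABIP: at-bats that ended with a ball in play."""
    return ab - k - hr + sf


def babip(h, ab, k, hr, sf):
    """Standard BABIP: non-homer hits per ball in play."""
    return (h - hr) / balls_in_play(ab, k, hr, sf)


def extra_singles(xbabip, h, ab, k, hr, sf):
    """Hits gained (or lost, if negative) by swapping BABIP for xBABIP."""
    bip = balls_in_play(ab, k, hr, sf)
    return (xbabip - babip(h, ab, k, hr, sf)) * bip


def adjusted_average(xbabip, h, ab, k, hr, sf):
    """Batting average if every extra hit is treated as a single."""
    return (h + extra_singles(xbabip, h, ab, k, hr, sf)) / ab


if __name__ == "__main__":
    # Made-up counting stats for illustration only.
    line = dict(h=50, ab=200, k=45, hr=5, sf=0)
    print(f"BABIP: {babip(**line):.3f}")
    print(f"Extra singles at a .320 xBABIP: {extra_singles(0.320, **line):.1f}")
    print(f"Adjusted average: {adjusted_average(0.320, **line):.3f}")
```

Treating every added hit as a single understates the value of the adjustment, which is why the assumption is a conservative one.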
The biggest over-performer in this analysis is Matt Adams, owner of an unsustainable-for-anyone .365 BABIP. Alex's equation likes the batted ball profile, but Andrew's calculation is less impressed with the quality of Adams' contact when it is expressed in the more granular terms of exit velocity and launch angle. Even the more bullish calculation leads one to think Adams' BABIP is bound to regress, though.
Stephen Piscotty is the second-biggest over-performer, but the stats involved in both calculations imply that he has a real ability to produce an above-average BABIP. Both calculations see Holliday as more of a middle-of-the-road BABIP performer despite a long track record of above-average years in the category. You could also see it as an optimistic forecast, though, as his xBABIP is higher than his current BABIP by 34 points.
Aledmys Diaz has by far the largest difference between the two systems. Alex's equation sees him as below average on balls in play, due to a high IFFB% and low Oppo%, whereas Andrew's calculation sees him as elite. The average of the two ends up placing him pretty close to his 2016 BABIP. It's also nice to see that both calculations see Molina as better than he has been so far on balls in play.
Perhaps one can say from this analysis that Matheny shouldn't be moving Piscotty to center in order to get Matt Adams in the lineup. Adams isn't walking as much as Grichuk, but he still has a 12-point advantage in xBABIP and a 56-point advantage in ISO. Grichuk's 83 wRC+ also depends on improved K and BB numbers this year that might not hold up through the rest of the season. Adams would probably still be the better hitter when adjusted for xBABIP, but Grichuk would be the better base-runner. Grichuk would also be a better defender than Piscotty in center, with Piscotty being a better defender than Moss in right field (though, to a smaller degree than those two, Adams is a defensive upgrade over Moss at first).
I don't think one can conclude from this analysis that Grichuk should be playing over Adams, but it does offer more evidence than their current stats do. Whatever we project, all that matters right now is that Adams has produced, to the tune of a 146 wRC+ on the season. While that continues, Matheny will get his bat into the lineup. That's not really a bad thing, as long as Matheny adjusts his thinking if and when Adams and Grichuk regress in their respective directions. Grichuk will still get playing time, as Matheny has not been shy about resting his starters this year (with one obvious exception), so Grichuk will have opportunities to get back into the regular lineup.
While this sort of analysis stops short of justifying a roster move or lineup change, it does give us a better idea of what to expect going forward. While Grichuk will likely never post a BABIP as good as last year's .365, there's a lot of reason to believe he's much better than this year's .241. Molina has trended the wrong way after a hot start to the year, but he's more likely than not to improve his BABIP going forward, keeping his hitting at a more than acceptable level when paired with his superb defense. While Wong has been a bit unlucky on balls in play, the difference isn't enough to make up completely for his woeful 66 wRC+; a true bounce-back would involve boosting his ISO above Jon Jay/Yadi levels (.065).
With that said, with BABIP anything can happen in a season's time. We're only talking about what will happen on average, based on two systems. Results on balls in play are a big part of baseball's randomness, so even the best projections are going to miss by a pretty wide margin on some percentage of players. Still, these equations improve our understanding more than BABIP on its own does, which makes them a useful tool.