Baseball will never be the same.
This was the mantra uttered by Major League Baseball when it unveiled Statcast to the general public.
Integrating both a camera-based tracking system and Doppler radar technology, Statcast has unlocked today's golden age of information. As MLB.com's Paul Casella put it, "The better question...may very well have been what can't Statcast measure?"
Emma Baccellieri wrote a fantastic piece at Deadspin entitled "Major League Baseball's Statcast Can Break Sabermetrics," which dives into the past, present, and future of baseball information as raw Statcast data is interwoven with the sabermetrics boom of the early 21st century.
A prime example of this trend is a metric that many of you have seen referenced here at Viva El Birdos before: expected weighted on-base average (abbreviated as xwOBA).
For those unfamiliar with the concept of linear weights, wOBA assigns relative values to the outcome of every plate appearance (with the exception of those ending in an intentional walk or sacrifice bunt). We know that a home run does not bear exactly twice the value of a double, and wOBA does a good job of accurately valuing a batter's contributions. While the specific values–also known as "constants"–vary slightly from year to year based on the run environment, I have provided the numbers for the 2017 wOBA formula for context.
2017 wOBA Constants
|Event||2017 wOBA Value|
In its statistics glossary, FanGraphs provides the following example using Mike Trout:
For example, in 2013 Mike Trout had 100 unintentional walks, 9 HBP, 115 singles, 39 doubles, 9 triples, and 27 home runs. If you multiply each by its corresponding weight and then divide that number by the sum of his at bats, walks (excluding IBB), hit by pitches, and sacrifice flies, you get .423, or his wOBA for the season.
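The Trout calculation can be sketched in a few lines of Python. The event counts come from the quoted passage; the 2013 weights and Trout's 589 at bats and 8 sacrifice flies are not given there, so treat them as assumptions recalled from FanGraphs and his 2013 stat line, and check them against the source before relying on them.

```python
# Assumed 2013 wOBA weights (not stated in this article).
WOBA_WEIGHTS_2013 = {
    "uBB": 0.690,
    "HBP": 0.722,
    "1B": 0.888,
    "2B": 1.271,
    "3B": 1.616,
    "HR": 2.101,
}

# Event counts from the quoted FanGraphs example.
trout_2013_events = {"uBB": 100, "HBP": 9, "1B": 115, "2B": 39, "3B": 9, "HR": 27}

# Assumed denominator inputs (not stated in this article).
TROUT_2013_AT_BATS = 589
TROUT_2013_SAC_FLIES = 8


def woba(events, weights, at_bats, sac_flies):
    """Weighted sum of events divided by AB + uBB + SF + HBP."""
    numerator = sum(weights[name] * count for name, count in events.items())
    denominator = at_bats + events["uBB"] + sac_flies + events["HBP"]
    return numerator / denominator


trout_woba = woba(
    trout_2013_events, WOBA_WEIGHTS_2013, TROUT_2013_AT_BATS, TROUT_2013_SAC_FLIES
)
print(f"{trout_woba:.3f}")
```

Under those assumed inputs the result rounds to .423, matching the figure in the glossary example.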
While wOBA may accurately explain what happened in the past, its predictive power going forward is limited. If a player suddenly breaks out for a .400 wOBA to start the season, can we expect him to sustain that level of production? If he achieved that .400 wOBA through bloopers that luckily dropped in for hits or slow ground balls that somehow trickled through the infield, then the answer is probably no. However, if that player was crushing line drives and deep fly balls, you could more reasonably expect him to perform well in the future.
That's where xwOBA comes in.
Given a player's exit velocity (how hard he hit the ball) and launch angle (how high he hit the ball) data, we can estimate what we would expect his wOBA to be based on batted balls with similar Statcast numbers. At FanGraphs, VEB's own Craig Edwards found that you are better off predicting a player's future performance using his expected wOBA as opposed to his observed, or actual, wOBA.
So that's it? We unearthed the Holy Grail of baseball stats?
Not quite. One of the chief issues with xwOBA is that it assumes the same batted ball should produce the same results for every player. In other words, if Billy Hamilton and Yadier Molina hit identical ground balls, they would both receive the same xwOBA. In reality, though, we know that Hamilton is more likely to leg that ball out for an infield hit. Consequently, xwOBA is prone to "underrating" speedsters and "overrating" the guys who are less fleet of foot.
So that's where I began. I compiled data from every player in the Statcast era (since the beginning of 2015) who accumulated at least 400 at-bats in one season. What I was looking for was any correlation between speed and a difference between observed and expected wOBA. Fortunately for me, Statcast released a metric called sprint speed last summer that calculates how many feet a player can run in one second.
What I found wasn't all too surprising: the league-average sprint speed is 27 feet per second, and every additional foot per second above (or below) 27 allows a player to outperform (or underperform) his expected wOBA by a little over eight points. Put another way: a player with a sprint speed of 28 feet per second and an xwOBA of .300 would be expected to post a wOBA of roughly .308 once his speed is taken into account.
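As a rough illustration of the speed adjustment, here is a minimal Python sketch. The slope of 0.008 wOBA per foot per second is an assumption that rounds "a little over eight points" down; the fitted coefficient is not given in the text, and the function name is hypothetical.

```python
LEAGUE_AVG_SPRINT_SPEED = 27.0  # feet per second, as stated in the article
POINTS_PER_FOOT = 0.008  # assumption: "a little over eight points", rounded down


def speed_adjusted_xwoba(xwoba, sprint_speed):
    """Shift xwOBA up for faster-than-average runners and down for slower ones."""
    return xwoba + POINTS_PER_FOOT * (sprint_speed - LEAGUE_AVG_SPRINT_SPEED)


# The article's example: a .300 xwOBA hitter who runs 28 feet per second.
print(round(speed_adjusted_xwoba(0.300, 28.0), 3))
```

A runner exactly at the league average of 27 feet per second gets no adjustment at all, which is what makes the correction easy to layer on top of xwOBA.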
There was also one other variable I wanted to test for: park factors. The observed wOBA in games played at Coors Field–a renowned hitter's haven–in 2017 was a whopping .356 while the expected wOBA according to the quality of contact was fifty points lower at .306. Repeating the same process as detailed above, I compared players' xwOBA and wOBA to the park factors for the team they played on that season.
While the correlation wasn't quite as strong as with sprint speed (perhaps because most ballparks don't deviate too much from the league average run environment), the more extreme stadiums altered wOBA enough for me to include it in my final concoction. For every 1% that a team's park factor strays from the league average, its players' xwOBAs should be adjusted accordingly by slightly less than one wOBA point.
With the pieces of the puzzle now in place, it was time to move on to the most important step in this entire process: naming my creation.
Conventional wisdom says that I should name this metric for what it is: speed-and-park-adjusted expected weighted on-base average. But as the astute reader may be asking right now, what kind of moron invents a baseball stat and calls it sapaxwOBA?
No, I needed something that rolls off the tongue better. Something that epitomizes what one might call–ah, what's the word?–Phamtastic.
In all seriousness, I'm open to any suggestions you may have, but in the interim...meet PHAMwOBA. Why?
I kept putting this part off, assuring myself that I would come up with something by the time I reached this point in the article.
- Of all Cardinals players, Tommy Pham is the player who in 2017 benefitted the most from using PHAMwOBA instead of xwOBA.
- Pham is a noted FanGraphs reader who embraces advanced metrics.
The complete formula goes as follows:
While we're on the subject of Pham, let's use his 2017 season as an example.
After plugging in his 28.7 feet-per-second sprint speed and the Cardinals' park factor of 0.97, we get a speed adjustment of -0.014 and a park adjustment of 0.003 when rounding to the nearest point (one thousandth). So in total, we are subtracting negative 11 points from (that is, adding 11 points to) his .366 xwOBA. This gives Tommy Pham a PHAMwOBA of .377.
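The arithmetic in the Pham example can be sketched in Python as follows. This is not the exact formula: the two slopes below (0.008 per foot per second and 0.0009 per 1% of park factor) are assumptions chosen to match the descriptions "a little over eight points" and "slightly less than one wOBA point", and the function and constant names are hypothetical. The sign convention follows the example, where both adjustments are subtracted from xwOBA.

```python
LEAGUE_AVG_SPRINT_SPEED = 27.0  # feet per second, as stated in the article
NEUTRAL_PARK_FACTOR = 1.00  # league-average park
SPEED_SLOPE = 0.008  # assumed wOBA per foot per second
PARK_SLOPE = 0.0009  # assumed wOBA per 1% of park factor


def speed_adjustment(sprint_speed):
    """Negative for fast runners, so subtracting it raises their number."""
    return -SPEED_SLOPE * (sprint_speed - LEAGUE_AVG_SPRINT_SPEED)


def park_adjustment(park_factor):
    """Positive for pitcher-friendly parks, so subtracting it lowers the number."""
    return -PARK_SLOPE * (park_factor - NEUTRAL_PARK_FACTOR) * 100


def pham_woba(xwoba, sprint_speed, park_factor):
    """xwOBA minus the combined speed and park adjustments."""
    return xwoba - (speed_adjustment(sprint_speed) + park_adjustment(park_factor))


# Tommy Pham, 2017: .366 xwOBA, 28.7 ft/s sprint speed, 0.97 park factor.
print(round(speed_adjustment(28.7), 3))
print(round(park_adjustment(0.97), 3))
print(round(pham_woba(0.366, 28.7, 0.97), 3))
```

With these assumed slopes the sketch reproduces the rounded figures in the example: a speed adjustment of -0.014, a park adjustment of 0.003, and a final value of .377.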
But does it work?
When comparing a player's current-season wOBA to his current-season xwOBA and PHAMwOBA, the correlation increases by 6.2% when using the latter. (Technical note: that is a 6.2 percent relative change, not a 0.062 change in r^2 value.) When comparing current-season wOBA to the previous year's wOBA estimators, PHAMwOBA holds the advantage by 11.4%. Long story short, my research indicates that PHAMwOBA has done a better job than xwOBA of explaining what has happened in the past and (relatively) an even better job of predicting what will happen going forward.
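To make the technical note concrete, here is a small Python sketch of the difference between a relative and an absolute change. The baseline value of 0.500 is purely hypothetical; the article does not report the underlying correlation figures.

```python
def relative_change(old, new):
    """Fractional change from old to new (0.062 means 6.2 percent)."""
    return (new - old) / old


# Hypothetical baseline, used only to illustrate the distinction.
baseline = 0.500
after_relative_gain = baseline * (1 + 0.062)  # a 6.2 percent relative increase
after_absolute_gain = baseline + 0.062  # a 0.062 absolute increase

print(round(after_relative_gain, 3), round(after_absolute_gain, 3))
print(round(relative_change(baseline, after_relative_gain), 3))
```

The lower the baseline, the further apart the two readings land, which is why the distinction is worth spelling out.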
I should note that PHAMwOBA's park adjustment actually raises the xwOBA for players at hitter-friendly ballparks as a way to more accurately estimate their observed wOBA. That said, I calibrated both the park and speed adjustments so that a league-average player receives an adjustment of 0 to his xwOBA. This should allow anybody who is curious enough to substitute a batter's PHAMwOBA for his actual wOBA in the wRC+ formula to compare players on different teams or from different seasons.
Granted, there are a plethora of factors that PHAMwOBA doesn't account for, one being the quality of the opposing defense. I'm not going to pretend that my stat is perfect–or anything close to it. After all, there are professionals in this field with far more experience and access to much more information than I.
That I even have the opportunity to attempt a study like this is a testament to the era we live in.