When are relievers more predictable?

Early-season breakouts occur like clockwork, but are these pitchers any more likely to keep up their strong starts going forward?

In a recent episode of the Effectively Wild podcast, a listener emailed co-hosts Ben Lindbergh and Jeff Sullivan a question about the modern evolution of bullpen construction. Sullivan's response included the following remark about the volatility of relief pitcher performance, specifically from the end of one season to the beginning of the next:

I think it’s actually dangerous to invest in non-elite relievers before the season because you just don’t know how much of that [previous season’s production] is going to carry over and then it would make sense to add during the season when I think–I guess if I had to simplify, I think first-to-second half reliever performance is more reliable than season-to-season reliever performance when you’re talking about guys who aren’t the absolute cream of the crop.

You can listen to Jeff's answer in its entirety by skipping ahead to the 28:12 mark at the link above, but testing his hypothesis serves as the basis of this article.

When the Cardinals inked Pat Neshek to a minor league contract in February 2014, the righty was coming off a -0.2 fWAR season with the Athletics. That year in St. Louis, Neshek vaulted his way onto the All-Star team as one of the most valuable relievers in all of baseball. Seung-hwan Oh, on the other hand, suffered the opposite reversal of fortune when his sheer dominance in 2016 gave way to measly replacement-level production in the first half of 2017.

Sudden breakout relief performances like Neshek's and collapses like Oh's are commonplace in the realm of MLB bullpens. The volatility of relievers is a well-documented phenomenon, one that, admittedly, is exacerbated by inherently small sample sizes of at most 60-80 innings per season. Still, both Neshek and Oh experienced radical shifts in their performance from season to season but not from the first half to the second half of the same year. Are these two part of an overarching trend, or are they merely anecdotal outliers?

Using FanGraphs' statistical leaderboards dating back to the second half of 2012, giving us a complete five-year sample to work with, I compiled data for relievers who recorded a minimum of 20 innings in consecutive season halves (first and second halves of 2015, second half of 2016 and first half of 2017, etc.). To account for players switching teams (and, consequently, home ballparks and possibly leagues), as well as for differing run environments from one half-season to the next, I turned to the pitching metrics FIP- and xFIP- for this study. These stats simply adjust a pitcher's FIP and xFIP to the run environment he pitched in, with the league average set to 100 and a lower number being better. For example, a FIP- of 90 is 10 percent better than league average.
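
To make the setup concrete, here is a minimal sketch in Python (pandas) of how that pairing could be done. The file name and column labels (name, season, half, IP, FIPminus, xFIPminus) are placeholders I am assuming for illustration, not the actual FanGraphs export format.

```python
import pandas as pd

# Hypothetical input: one row per reliever per season half. The column
# names below are placeholders, not the actual FanGraphs export labels.
halves = pd.read_csv("reliever_half_season_splits.csv")
# assumed columns: name, season, half ("1H"/"2H"), IP, FIPminus, xFIPminus

MIN_IP = 20  # a reliever must clear this bar in BOTH halves of a pair

qualified = halves[halves["IP"] >= MIN_IP]

def pair_halves(df, start, end):
    """Match relievers who qualified in both halves of a pair.

    start/end are (season, half) tuples, e.g. (2015, "1H") and (2015, "2H")
    for a within-season pair, or (2016, "2H") and (2017, "1H") for a
    cross-season pair.
    """
    a = df[(df["season"] == start[0]) & (df["half"] == start[1])]
    b = df[(df["season"] == end[0]) & (df["half"] == end[1])]
    return a.merge(b, on="name", suffixes=("_start", "_end"))
```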

I then compared relievers' FIP- and xFIP-, respectively, in the first half of Year X to their marks in the second half of that same Year X. I juxtaposed that with the stats belonging to pitchers who met the 20-inning requirement in the second half of Year X and the first half of the following Year X+1. Returning to the aforementioned Oh case study, his xFIP- in the second half of 2016 (Year X) was 65 and ballooned to 116 in the first half of 2017 (Year X+1), a change of 51 points.
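
Continuing the sketch above, the comparison itself is just a merge and a subtraction across each pair of halves; Oh's cross-season swing falls out of the same arithmetic.

```python
# Within-season pairs (1st half of Year X -> 2nd half of Year X) versus
# cross-season pairs (2nd half of Year X -> 1st half of Year X+1).
within_2016 = pair_halves(qualified, (2016, "1H"), (2016, "2H"))
cross_2016_17 = pair_halves(qualified, (2016, "2H"), (2017, "1H"))

# Point change in park-adjusted xFIP- for every reliever in a pair.
cross_2016_17["xFIPminus_change"] = (
    cross_2016_17["xFIPminus_end"] - cross_2016_17["xFIPminus_start"]
)

# Oh's cross-season swing: an xFIP- of 65 became 116, i.e. 116 - 65 = 51.
```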

To determine whether midseason relief performance is indeed more predictable, as Jeff theorized, I borrowed a few pages from your old statistics textbook, namely the r² value and the standard deviation.

The r² value, also known as the coefficient of determination for any of my fellow nerds out there, essentially tells us how well one set of numbers predicts another. It can range from 0 to 1, with a 1 indicating a perfect relationship between, in our case, a reliever's FIP- or xFIP- in the first half of a pair and his mark in the second. Meanwhile, the standard deviation measures spread: how far each individual data point strays from the mean.

Don’t sweat it if you still aren’t entirely clear on r² values and standard deviations. Just know that a higher r² value signifies greater consistency from one half to the next, while for the standard deviation, lower means more consistent.
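
If you want to see how those two numbers could be computed, here is a rough sketch using the paired data from above. The article doesn't spell out the exact setup, so this assumes the r² comes from regressing the later half's mark on the earlier half's and that the standard deviation is taken over the half-to-half changes; treat it as one plausible reading rather than the author's actual code.

```python
import numpy as np
from scipy import stats

def consistency_summary(paired, col="xFIPminus"):
    """r² between the two halves of each pair, plus the spread of the changes.

    `paired` is a DataFrame like the ones built above, with `<col>_start`
    and `<col>_end` columns for every reliever who threw 20+ innings in
    both halves of the pair.
    """
    start = paired[f"{col}_start"].to_numpy(dtype=float)
    end = paired[f"{col}_end"].to_numpy(dtype=float)

    # Coefficient of determination: the squared correlation between a
    # reliever's mark in the earlier half and his mark in the later half.
    r_squared = stats.linregress(start, end).rvalue ** 2

    # Standard deviation of the half-to-half change (one plausible reading
    # of the figures reported in the table below).
    change_sd = np.std(end - start, ddof=1)

    return r_squared, change_sd

# e.g. consistency_summary(within_2016) vs. consistency_summary(cross_2016_17)
```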

When are relievers more predictable?

Metric                   | 1st Half Year X to 2nd Half Year X | 2nd Half Year X to 1st Half Year X+1 | Percent Change
FIP- r² value            | 0.079                              | 0.046                                | 71.7%
xFIP- r² value           | 0.217                              | 0.154                                | 40.9%
FIP- standard deviation  | 19.8                               | 21.1                                 | -6.3%
xFIP- standard deviation | 12.2                               | 13.8                                 | -12.1%
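
The "Percent Change" column appears to be the within-season figure expressed relative to the cross-season figure (the r² rows reproduce it exactly; the standard deviation rows are close, presumably differing only by rounding). A quick check on the FIP- r² row:

```python
# The within-season value expressed relative to the cross-season value,
# using the FIP- r² row of the table as an example.
within, cross = 0.079, 0.046
pct_change = (within - cross) / cross * 100
print(round(pct_change, 1))  # 71.7
```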

None of these numbers point toward overwhelming correlation (you are probably better off consulting a projection system like ZiPS or Steamer to predict future player outcomes), but the table does reaffirm our initial belief. The r² value is higher and the standard deviation is lower from the first half to the second half of Year X than from the second half of Year X to the first half of Year X+1. The bottom line: relievers are more consistent between the two halves of the same season than between the end of one season and the beginning of the next.

As to why this is, there are likely multiple factors at play. Pitchers’ physical ability and stuff can deteriorate as they age from one year to the next. Or perhaps hitters are less adept at adjusting to a pitcher in the middle of a season. Regardless of your answer to the “Why?”, the “What?” of the matter is rather apparent: the breakout reliever you are considering acquiring at the deadline is more likely to maintain his success than an offseason free agent or trade target is.