
Stats You Need to Know: Pitching

Answering a fan-question about statistics. Part 2: Pitching

Wild Card Round - St. Louis Cardinals v Los Angeles Dodgers Photo by Harry How/Getty Images

We’re back today with the second article in our “Stats You Need to Know” series. Last week, I tackled the first part in the series, going through a range of offensive statistics from slash line (BA/OBP/SLG) to weighted stats (wOBA and wRC+) and expected stats. You can find that article here.

Today, we’re moving on to the other side of the ball, where we’ll try to cover pitching.

Here are 4 types of pitching stats you need to know:

1. Counting Stats – The Value of Starter Wins

Let’s just jump right into the heart of a long-standing debate about pitcher performance: wins.

It was 1993. I remember walking down the first-floor corridor of Glendale High School having a heated debate with one of my friends about who should win the AL Cy Young. The odds-on favorite was Jack McDowell of the White Sox. McDowell had an ERA (earned run average) of 3.37 that season, and, most importantly, 22 wins in 34 starts.

His competition for the award was Randy Johnson, who had struck out 308 batters in just 255 innings and had a better ERA than McDowell. Johnson, though, only won 19 games.

I bet you can guess who won. 22 wins are better than 19 wins, so McDowell was the Cy Young winner.

Back then, it didn’t matter what other stats a pitcher had. Wins are wins. And winning pitchers win. So the pitcher with the most wins wins.

Pitcher Wins – (From MLB) – “A pitcher receives a win when he is the pitcher of record when his team takes the lead for good – with a couple rare exceptions. First, a starting pitcher must pitch at least five innings…to qualify for the win. If he does not, the official scorer awards the win to the most effective relief pitcher.”

I know there are old-school baseball fans who want to hold on to wins as a vital stat for starters, but it is time to let it go.

Simply put, a win does not tell us anything about how well a starter performed.

Starter wins tell us more about the quality of the team the pitcher played on. A worse starter on a good team can win as many games as, or more than, a better starter on a bad team. Why? Because a starter can only somewhat control the runs he allows while he is in the game. He has no control over the runs scored by the offense, given up by the defense, or allowed by the relief corps.

We would classify wins as a counting stat – a stat that simply accumulates as it is earned. Other simple, common counting stats include strikeout totals (K’s), walk totals (BB’s), intentional walks (IBB’s), HR’s allowed (HR’s), etc.

2. Rate Stats: % and /9

These kinds of counting stats are rather useless when it comes to pitchers because they can vary greatly based on the number of innings a pitcher pitches. Back in Randy Johnson’s day – the late ’90s and ’00s – pitchers were routinely throwing over 200 innings and that allowed them to rack up huge counting stat totals, like Johnson’s impressive 364 K total in 1999. Is that a lot of K’s? Absolutely. But it did take him 271 innings to earn that many K’s.

What matters more than the total count is the rate or frequency with which Johnson generated a strikeout.

Rate stats neutralize the impact of innings on counting stats by re-expressing them either as a percentage of batters faced or on a familiar scale, like per 9 innings.

K% – the number of strikeouts a pitcher generates per total batters faced. K/TBF.

K/9 – the number of strikeouts a pitcher records multiplied by 9 innings per game and divided by the number of innings pitched. K/9 = K*9 / IP.

BB% – the number of walks (not counting intentional walks) a pitcher issues per total batters faced. BB/TBF.

BB/9 – the number of walks a pitcher records multiplied by 9 innings per game and divided by the number of innings pitched. BB/9 = BB*9 / IP.

Both of these types of stats are useful for knowing the frequency or rate at which a pitcher generates a K or a walk. The same thing works with other counting stats, like HR’s.
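Here is a minimal Python sketch of the two kinds of rate stats defined above. The function names are my own, and the batters-faced total in the last line is made up purely to show the arithmetic. The Randy Johnson inputs are the rounded strikeout and inning totals quoted in this article.

```python
def per_nine(count: float, innings: float) -> float:
    """Scale a counting stat to a 9-inning rate: count * 9 / IP."""
    if innings <= 0:
        raise ValueError("innings must be positive")
    return count * 9 / innings


def per_batter(count: float, batters_faced: int) -> float:
    """Express a counting stat as a percentage of total batters faced."""
    if batters_faced <= 0:
        raise ValueError("batters_faced must be positive")
    return 100 * count / batters_faced


# Randy Johnson, using the rounded innings quoted in this article.
k9_1993 = per_nine(308, 255)  # 308 K in 255 innings
k9_1999 = per_nine(364, 271)  # 364 K in 271 innings
print(f"1993 K/9: {k9_1993:.1f}")
print(f"1999 K/9: {k9_1999:.1f}")

# Made-up totals (287 K against 1,000 batters), only to show the K% math.
print(f"K%: {per_batter(287, 1000):.1f}")
```

The same two helpers work for walks or home runs: pass the BB or HR total instead of the K total.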

I personally find /9 (per 9) stats useful for articles. If I tell you that Jack Flaherty strikes out, on average, 10 batters every 9 innings, I think that’s easier for readers to understand than saying he K’s 28.7% of the batters he faces.

It’s pretty much the same thing. But it just reads differently. I try to cite both when I can.

Fangraphs has a nice little chart that helps us understand what percentages and per 9’s are good, bad, great, or terrible:

3. Runs: ERA vs. FIP (and xERA vs. xFIP)

Speaking of rate and per 9 stats, that’s what we have with this next set: ERA and FIP.

ERA – Earned Run Average – The average number of earned runs a pitcher allows per 9 innings. (ER*9)/IP

The goal of ERA is admirable. It is a simple statistic that is trying to tell us how many of the runs a pitcher allows are his fault. It does this by removing unearned runs.

The problem with ERA is that, while its intentions are good, it doesn’t execute those intentions well. ERA assumes that every run scored that did not derive from a fielding error is the pitcher’s fault. That’s not always the case, because error rulings are made subjectively and there are pages of unwritten and arbitrarily assigned guidelines about how errors should be determined.

Here’s what I’ll say: ERA is useful in knowing what actually happened – how many non-error runs a pitcher gives up per 9 innings pitched.
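As a quick illustration of the ERA arithmetic, here is a small Python sketch. The season line is hypothetical (68 runs allowed, 8 of them unearned, over 180 innings) and the function name is my own.

```python
def era(earned_runs: int, innings: float) -> float:
    """Earned Run Average: (ER * 9) / IP."""
    if innings <= 0:
        raise ValueError("innings must be positive")
    return earned_runs * 9 / innings


# Hypothetical season: 68 runs allowed, 8 of them scored as unearned.
runs_allowed = 68
unearned_runs = 8
earned_runs = runs_allowed - unearned_runs  # only these count toward ERA

season_era = era(earned_runs, 180)
print(f"ERA: {season_era:.2f}")
```

Notice that the only judgment call in the whole calculation is which runs get labeled unearned, and that call belongs to the official scorer.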

If you want to know how many runs a pitcher deserved to give up on a 9-inning rate, that’s where we turn to FIP.

FIP starts with the well-supported assumption that pitchers don’t have all that much control over what happens with a ball that’s put into play (i.e. not a K, BB, HBP, or HR). We know this from studies of BABIP – batting average on balls in play. Given enough time and sample size, BABIP normalizes – returns to average – for both batters and pitchers. There’s just a certain percentage of balls hit into play that find gaps in the field regardless of what the pitcher does. (Most of the time)

Variance can happen in BABIP but it tends to come from elements that are outside of a pitcher’s direct control. Like the quality of the defense. Or the way the ballpark plays.

If you don’t believe me, follow this mental exercise: Who will have the higher batting average on balls that are in play and, therefore, the higher ERA?

Situation One: Adam Wainwright pitching in Busch Stadium with Nolan Arenado, Ozzie Smith, Jim Edmonds, Yadier Molina, and Kolten Wong playing defense behind him.

Situation Two: Adam Wainwright pitching in Coors Field with Matt Carpenter, Jhonny Peralta, Dexter Fowler, Eli Marrero, and Daniel Descalso playing defense behind him.

The answer is obvious. The exact same pitcher will have a wildly different ERA based on the playing environment and the quality of the defense.

ERA simply can’t account for that. FIP – Fielding Independent Pitching – tries to eliminate the impact of defense by measuring only the elements of baseball that a pitcher can control.

FIP – Fielding Independent Pitching – an estimate of a pitcher’s ability to prevent runs based on HRs, walks plus hit by pitches, and K’s, each weighted, divided by innings pitched, with a constant added to put the result on the same scale as ERA.
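For readers who want to see the math, here is a rough Python sketch of FIP using the commonly published weights: 13 for HRs, 3 for walks plus hit by pitches, and 2 for K’s. In that version the constant is added at the end so that league-average FIP lines up with league-average ERA. The constant changes every season. The 3.10 below is an assumed, typical value, and the pitcher’s line is hypothetical.

```python
def fip(hr: int, bb: int, hbp: int, k: int, innings: float,
        constant: float = 3.10) -> float:
    """Fielding Independent Pitching.

    Uses the commonly published weights (13, 3, 2). The constant is
    ADDED to put FIP on the ERA scale; 3.10 is an assumed, typical
    value, not the figure for any particular season.
    """
    if innings <= 0:
        raise ValueError("innings must be positive")
    return (13 * hr + 3 * (bb + hbp) - 2 * k) / innings + constant


# Hypothetical line: 20 HR, 50 BB, 5 HBP, 200 K in 200 innings.
pitcher_fip = fip(hr=20, bb=50, hbp=5, k=200, innings=200)
print(f"FIP: {pitcher_fip:.2f}")
```

Nothing about singles, doubles, errors, or great catches goes into the calculation, which is the whole point: only the outcomes that never touch a fielder count.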

Over time, FIP and ERA will probably end up close to one another. Where they disagree, the differences are best attributed to luck or the performance of a team’s defense and ballpark rather than the performance of the pitcher. There might be exceptions to this rule (like extreme ground ball pitchers), but generally speaking, it holds up.

Last season, the Cardinals had an elite defense and one of the most pitcher-friendly ballparks in the game. It should be no surprise that all but two Cardinals starters had an actual runs allowed – ERA – that was quite a bit lower than the runs they might have deserved pitching in a more neutral environment – FIP.

The point? As I said last week, you can’t just look at one stat. ERA is useful for some things. FIP is useful for others. The difference between those two numbers matters as it starts to get to the heart of why a pitcher under- or over-performed and if that performance will continue.

Take the Cardinals’ starters. Are they likely to continue to have an ERA lower than their FIP? Did their ballpark change? Did the defense? No. Ok, then. Expect their collective ERA to outperform their FIP next season, too.

4. Reliever stats: Saves and WPA

Starters aren’t the only type of pitcher. Relievers get to throw the ball, too, and their usage has increased significantly over the last few decades. With the rise of the relief pitcher comes the rise in relief pitcher stats.

Relievers collect the same types of counting and rate stats as starters. They generate BB’s, K’s, HR’s, runs allowed, and unearned runs. We can translate those into the same stats as above: /9 stats, % stats, ERA, and FIP.

Relievers, though, experience quite a bit more volatility in some of these stats since they are used differently than starters and throw far fewer innings. This is simple “sample size” variance. The smaller the sample – like the number of innings pitched – the more a few good or bad outings will swing a statistic.

Generally speaking, while we can use the same types of stats for a starter and a reliever – ERA and FIP – it’s smart to hold pretty loosely to these stats for a reliever.
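To put a number on that sample-size problem, here is a small Python sketch. Both pitchers are hypothetical: each starts with a 3.00 ERA and then has the exact same disaster outing of 5 earned runs in 1 inning.

```python
def era(earned_runs: int, innings: float) -> float:
    """Earned Run Average: (ER * 9) / IP."""
    return earned_runs * 9 / innings


def era_after_outing(er: int, ip: float, outing_er: int, outing_ip: float) -> float:
    """ERA once one more outing is added to the season totals."""
    return era(er + outing_er, ip + outing_ip)


# Hypothetical starter: 60 ER in 180 IP (3.00 ERA) before the bad outing.
starter_era = era_after_outing(60, 180, outing_er=5, outing_ip=1)

# Hypothetical reliever: 20 ER in 60 IP (3.00 ERA) before the bad outing.
reliever_era = era_after_outing(20, 60, outing_er=5, outing_ip=1)

print(f"Starter ERA after the blowup:  {starter_era:.2f}")
print(f"Reliever ERA after the blowup: {reliever_era:.2f}")
```

The same bad night moves the reliever’s ERA roughly three times as much as the starter’s, simply because he has a third of the innings to absorb it.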

For relievers, fans often look at leverage statistics – how well a reliever performs in clutch or critical situations. As with all things statistical, there are simple and complex ways to consider this question.

Saves – SV – A save is awarded to the relief pitcher who finishes a game his team wins, without being the winning pitcher, and who meets one of three conditions: he enters with a lead of no more than three runs and pitches at least 1 inning; he enters with the tying run on deck, at the plate, or on the bases; or he pitches at least three innings.
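The save rule is mechanical enough to write as a short check. The Python sketch below is my own simplification, with hypothetical field names. It follows the official rule as I understand it, which also credits a save for finishing a win with at least three innings of relief and requires the pitcher to record at least one out.

```python
from dataclasses import dataclass


@dataclass
class ReliefAppearance:
    """One relief outing. All field names are hypothetical."""
    finished_game: bool       # recorded the final out
    team_won: bool
    is_winning_pitcher: bool  # a pitcher cannot get the win and the save
    lead_on_entry: int        # his team's lead when he entered
    runners_on_entry: int     # baserunners when he entered
    outs_recorded: int        # 3 outs = 1 inning


def earns_save(app: ReliefAppearance) -> bool:
    """Return True if the outing meets the save conditions."""
    if not (app.finished_game and app.team_won) or app.is_winning_pitcher:
        return False
    if app.outs_recorded < 1:
        return False

    has_lead = app.lead_on_entry >= 1
    # Lead of no more than three runs, at least one full inning pitched.
    small_lead_full_inning = (
        has_lead and app.lead_on_entry <= 3 and app.outs_recorded >= 3
    )
    # Tying run on base, at the plate, or on deck: the lead is no bigger
    # than the runners on base plus the batter plus the on-deck hitter.
    tying_run_close = has_lead and app.lead_on_entry <= app.runners_on_entry + 2
    # Three or more innings of relief to finish the game.
    long_relief = app.outs_recorded >= 9

    return small_lead_full_inning or tying_run_close or long_relief


# A closer who starts the ninth with a 3-run lead and the bases empty.
closer = ReliefAppearance(True, True, False, lead_on_entry=3,
                          runners_on_entry=0, outs_recorded=3)
# A setup man who escapes a two-on jam in the eighth but does not finish.
setup_man = ReliefAppearance(False, True, False, lead_on_entry=3,
                             runners_on_entry=2, outs_recorded=3)
print(earns_save(closer))     # True
print(earns_save(setup_man))  # False
```

Nothing in the check asks how dangerous the hitters were or how much the game was actually in doubt.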

Saves are the reliever’s equivalent of wins. A save does not in any way tell you that a pitcher “saved” the game for the winning team. All it tells you is that the pitcher was on the mound when the game ended under a specific set of circumstances.

Saves are rooted in the assumption that the highest leverage moment of a game is the ninth inning. Therefore the “closer” should get credit for getting through that moment and ending the game with the win.

The problem is that’s not always true. A “closer” can earn a save by getting the 7-8-9 batters out to preserve a 4-1 win. Not very impressive.

In the inning before, a different reliever might have gotten the 4-5-6 batters out to hold that lead and might have had runners on when he did it.

Which situation had a higher impact on “saving” the win?

It was the earlier pitcher who faced a higher-leverage situation than the closer. The closer, though, will get the “save”, the All-Star appearance, and the money.

Is there a better way to measure leverage and clutchness for a reliever? Of course, there is.

WPA – win probability added – is a stat that tries to account for the actual win percentage added by a player’s performance. WPA isn’t limited to pitchers. But since reliever stats are so volatile and “saves” are so pointless, WPA is particularly useful in telling us how well a reliever performed in situations that had a significant impact on winning or losing.

WPA – Win Probability Added – captures the change in win expectancy from one plate appearance to the next and credits or debits the player based on how much their action increased their team’s odds of winning.
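Here is a bare-bones Python sketch of how WPA adds up for a pitcher. Every win expectancy below is invented for illustration. Real values come from historical win expectancy tables, which this sketch does not include.

```python
def wpa(plate_appearances: list[tuple[float, float]]) -> float:
    """Sum the change in the pitcher's team's win expectancy.

    Each tuple is (win expectancy before the plate appearance,
    win expectancy after it), from the pitcher's team's point of view.
    """
    return sum(after - before for before, after in plate_appearances)


# Invented numbers: a setup man works out of trouble against the heart
# of the order, so every out moves the needle.
setup_man = [(0.80, 0.86), (0.86, 0.91), (0.91, 0.95)]

# Invented numbers: a closer retires the bottom of the order with a
# comfortable lead, so the game was nearly won before he threw a pitch.
closer = [(0.97, 0.98), (0.98, 0.99), (0.99, 1.00)]

print(f"Setup man WPA: {wpa(setup_man):+.2f}")
print(f"Closer WPA:    {wpa(closer):+.2f}")
```

Both pitchers get three outs, but in this made-up example the setup man adds about five times as much win probability, and only the closer walks away with a save.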

This chart illustrates two things. First, it shows the volatility of relievers. Their ERAs and FIPs are all over the place. Second, it shows how little correlation there is between saves and actual high-leverage situations. Alex Reyes earned 29 saves for the Cardinals. Among the Cards relievers listed, he had the lowest WPA. Not only did he blow saves, but he also just didn’t pitch in situations that impacted the outcome of the game as often as, say, Giovanny Gallegos. Meanwhile, TJ McFarland – who was only rarely used for save chances – was able to rack up a high WPA despite a low number of innings because Mike Shildt consistently went to him when the game was on the line.

Is that predictive? Not even remotely. Look at McFarland’s K rates and FIP. He couldn’t generate Ks and his ERA and FIP are way off. Did McFarland earn the trust that Shildt showed him? Not really.

Add it up and we can evaluate McFarland. Using a variety of stats, we now know that McFarland saw a lot of high leverage situations despite a low number of innings and he relied on defense, ballpark, and a healthy dose of luck to get through those. It would be a bad idea to assume that will happen again.

Conclusions

As with hitting stats, we can’t rely on just one statistic to do everything for us. When evaluating pitchers – starters or relievers – we need a combination of rate stats and run stats, and maybe some leverage stats for good measure. All of that, though, just scratches the surface. There’s a whole group of extremely useful Statcast stats for pitchers that really put their velocity and “stuff” into context. We’ll get into those in a later article.

Where do you find these stats? Most of what you’ll need is available for free at FanGraphs.com or Baseball-Reference.com. For more complex pitching stuff, try Baseball Savant - baseballsavant.mlb.com.