Introducing a New Statistic for Strikeouts and Walks

A quantitative approach to valuing the non-contact portion of batting lines

Photo by Jennifer Stewart/Getty Images

I’m going to level with you. My role on this site is to write Cardinals analysis. It says so right there on the header. To the extent that I have a specialty, it’s statistical analysis. The point is, I spend a lot of time thinking about baseball as numbers. I have a confession to make, though. Until recently, I didn’t really have a good way to think about strikeout and walk percentages. I’m not saying I didn’t have ANY way; I understand that striking out a lot is bad and that walking a lot is good. What I didn’t have was any quantitative idea of the tradeoff between walks and strikeouts. How many articles about plate discipline have you seen recently? How about articles about someone who has started striking out a lot? Walking more? They’re very popular. The thing is, I read those articles and don’t actually know what’s good. I don’t know how to compare the two rates.

Allow me to elaborate. Let’s say you’ve got two players and you’re trying to decide who’s better. One of them strikes out 15% of the time and walks 10% of the time. Pretty good! Now, compare him with someone who strikes out 22% of the time but walks 15% of the time. It’s not obvious, right? How about someone who strikes out 5% of the time but never walks? People have attempted to come up with stats that combine the two. There’s K%-BB%. There’s BB/K ratio. You can do a lot of things with the numbers, but I just don’t have the intuition to know what’s good. A 1:2 BB/K ratio sounds bad, for example, but how different is it if those are 10%/20% rather than 5%/10%? A K-BB of 5% sounds fine, but how different is that at 20%/15% vs. 5%/0%? What follows is my attempt to come up with a more universally useful number.

First, a quick discussion of building blocks. To figure out how valuable a player’s non-contact outcomes are, we need a method for valuing all outcomes at the plate. I’m using wOBA for this. You can find a more complete definition of wOBA here, but I’ll give a quick overview of it. wOBA figures out, on average, how much each possible outcome at the plate affects how many runs you’re likely to score in an inning. Then it assigns that many runs to the outcome. This gives you an easy way to equate many different results, and if you add up all the outcomes a player has and divide by the number of plate appearances, you get wOBA. Like OPS, it’s an attempt at an all-in-one batting statistic, but it has more sensible relationships between results. It’s going to be in the background of my statistic, so I’ll reference it intermittently.
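To make the overview concrete, here is a minimal sketch of the simplified wOBA described above: weight each outcome, add everything up, and divide by plate appearances. The weights are my assumption (roughly the 2018 values; the article does not list them), the batting line is hypothetical, and published wOBA uses a slightly different denominator that excludes intentional walks, so treat this as an illustration rather than the official formula.

```python
# Assumed linear weights, roughly the 2018 values; the article does not list them.
ASSUMED_WEIGHTS = {"BB": 0.69, "HBP": 0.72, "1B": 0.88, "2B": 1.25, "3B": 1.58, "HR": 2.03}


def simple_woba(event_counts, plate_appearances, weights=ASSUMED_WEIGHTS):
    """Weighted value of all outcomes divided by plate appearances.

    Outcomes missing from the weights (strikeouts, other outs) count as 0.
    """
    if plate_appearances <= 0:
        raise ValueError("plate_appearances must be positive")
    credited = sum(weights.get(event, 0.0) * count
                   for event, count in event_counts.items())
    return credited / plate_appearances


# Hypothetical 100-PA batting line, purely for illustration.
hypothetical_line = {"BB": 10, "HBP": 1, "1B": 15, "2B": 5, "3B": 0, "HR": 3, "K": 20}
print(round(simple_woba(hypothetical_line, 100), 3))
```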

In this next section, I’m going to use Jose Martinez’s stats as an example while I walk through my methodology. The first thing I did was find two numbers: the league-average wOBA and the league-average wOBA on contact. For the record, they are .312 and .364 respectively this year through Monday’s games. Next, I took every player’s K%, BB%, and HBP%. For Martinez, these are 14.7%, 9.4%, and 0.6%. Using these, I worked out each player’s wOBA on non-contact results, as well as the percentage of plate appearances that ended without the ball being put in play. Again looking at Martinez, his plate appearances end on one of these results 24.7% of the time, and he runs a wOBA of .279 in those PAs. Set the wOBA number aside for now; it’ll be important later. The only thing that’s left is the balls in play, which I’ve been purposefully ignoring until now.
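Here is a rough sketch of that non-contact step in Python. The walk and hit-by-pitch weights are assumptions on my part (approximate 2018 values, not quoted in the article), so the result should land close to the .279 above rather than match it exactly.

```python
# Assumed linear weights (approximate 2018 values; not given in the article).
W_BB = 0.69
W_HBP = 0.72


def non_contact_line(k_rate, bb_rate, hbp_rate, w_bb=W_BB, w_hbp=W_HBP):
    """Return (share of PA ending without contact, wOBA on those PA)."""
    share = k_rate + bb_rate + hbp_rate
    if share == 0:
        return 0.0, 0.0
    # Strikeouts are worth 0, so only walks and HBP add value.
    value = w_bb * bb_rate + w_hbp * hbp_rate
    return share, value / share


# Jose Martinez's rates from the article.
share, woba_nc = non_contact_line(0.147, 0.094, 0.006)
print(f"{share:.1%} of PA end without contact, wOBA {woba_nc:.3f}")
```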

Let’s add the balls in play back in. So far, we know what percentage of PAs end without a ball in play (and by subtracting from one, the percentage that end with a ball in play). We also know how well each player does on the ones that end without contact. Finally, we know the league-average wOBA for overall batting lines. The next thing I’m going to do is a little mathematical trick. With what we have so far, we can figure out how well a player needs to hit on balls in play to end up with a league-average wOBA overall. In Martinez’s case, he has a .279 wOBA on the 24.7% of plate appearances that end without balls in play. We know that a league-average wOBA is .312. Therefore, he would need a .322 wOBA on balls in play to end up league average. The math looks like this:

24.7% * .279 + 75.3% * .322 ≈ .312
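Rearranged to solve for the unknown, the same equation can be sketched as a small function. The function and argument names are mine, not part of the article; the inputs are the Martinez numbers quoted above.

```python
def required_contact_woba(league_woba, nc_share, nc_woba):
    """wOBA needed on balls in play to reach league_woba overall.

    Solves nc_share * nc_woba + (1 - nc_share) * x = league_woba for x.
    """
    contact_share = 1.0 - nc_share
    if contact_share <= 0:
        raise ValueError("player never puts the ball in play")
    return (league_woba - nc_share * nc_woba) / contact_share


needed = required_contact_woba(0.312, 0.247, 0.279)
print(f"{needed:.3f}")
```

Because the inputs are already rounded to three decimals, the result can differ from the article's .322 in the last digit.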

Why am I looking for how well a hitter needs to do to end up with a league-average line? Well, we know what the league-average overall line is, and we know what the league average is for balls in play (.364). A player who is exactly average in every way would hit for a .364 wOBA on balls in play and end up with an average overall wOBA of .312. We know from above, however, that Jose Martinez only needs to get to a .322 wOBA on balls in play to be league average overall. Thus, he’s doing better at the non-contact part of the game than the average player. How much better? Well, he can hit 88% (.322/.364) as well as the average hitter on balls in play and still get by. I want this to be a + stat (where higher numbers are better), so let’s instead calculate it like this: 2-(.322/.364), or about 1.12. Then we’ll multiply by 100. Thus, Jose Martinez has a NOC+ of 112. He can hit 12% worse on balls in play than ML average and still be average overall. Put a different way, his non-contact management is 12% better than league average. Why NOC+? Well, it stands for NOn-Contact management. It’s also a fun word to say out loud. I’m open to other name suggestions, though, as I don’t think this one is great and having a good stat name is half the battle sometimes.
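Putting the steps together, a minimal end-to-end sketch of NOC+ might look like the following. The league figures (.312 and .364) come from the article; the walk and HBP weights are my assumed, approximate 2018 values, so the output should sit near Martinez's listed 111.5 without matching it to the decimal.

```python
def noc_plus(k_rate, bb_rate, hbp_rate,
             league_woba=0.312, league_contact_woba=0.364,
             w_bb=0.69, w_hbp=0.72):
    """NOC+ = 100 * (2 - required contact wOBA / league contact wOBA).

    w_bb and w_hbp are assumed linear weights, not values from the article.
    """
    nc_share = k_rate + bb_rate + hbp_rate
    if nc_share >= 1.0:
        raise ValueError("player never puts the ball in play")
    # wOBA value banked per PA from walks and HBP (strikeouts add nothing).
    nc_value = w_bb * bb_rate + w_hbp * hbp_rate
    # What the player must hit on balls in play to be league average overall.
    needed = (league_woba - nc_value) / (1.0 - nc_share)
    return 100.0 * (2.0 - needed / league_contact_woba)


martinez = noc_plus(0.147, 0.094, 0.006)
print(round(martinez, 1))
```

Subtracting the ratio from 2 is what flips the scale: a player who needs exactly the league-average line on balls in play scores 100, and every percent less he needs adds a point.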

So, that’s it. That’s my new stat. I’m going to list how all the Cardinals are doing in it so far this year, and then after the table I’ll have some methodology and next steps discussions. I’d absolutely love any feedback people have, but I understand that not everyone wants to dive so deeply into the nitty-gritty. Here it is: your 2018 Cardinals by NOC+. Note that Kolten Wong is helped out by his incredible ability to be hit by pitches (12 already this year), which the table’s K% and BB% columns don’t show. Also, if you’re interested in taking a look at the majors as a whole, there’s a link to every batter with 100 PA or more in a spreadsheet here. I’ll be updating it intermittently.

Cardinals Plate Discipline

| Player | K% | BB% | NOC+ |
| --- | --- | --- | --- |
| Kolten Wong | 17.5% | 7.5% | 112.4 |
| Jose Martinez | 14.7% | 9.4% | 111.5 |
| Greg Garcia | 17.8% | 11.0% | 109.0 |
| Yadier Molina | 13.6% | 5.2% | 108.4 |
| Dexter Fowler | 21.6% | 11.6% | 106.6 |
| Matt Carpenter | 25.2% | 14.3% | 103.7 |
| Marcell Ozuna | 19.4% | 6.1% | 100.9 |
| Tommy Pham | 25.2% | 11.1% | 99.2 |
| Jedd Gyorko | 24.5% | 6.9% | 95.0 |
| Paul DeJong | 29.4% | 9.4% | 93.6 |
| Harrison Bader | 28.0% | 7.7% | 91.7 |
| Yairo Munoz | 25.9% | 6.5% | 91.5 |
| Francisco Pena | 28.7% | 3.2% | 84.3 |
| Tyler O'Neill | 42.2% | 2.2% | 55.5 |

Methods, Concerns, and Next Steps

One convenient thing about benchmarking to a league-average line is that park factors on balls in play don’t matter. I’m not using any batted ball data for any player, so you can consider every player as playing in a hypothetical neutral park. In the example above, you can see that it doesn’t matter where Martinez plays. We’re just working out how well he’d need to do on a generic sample of balls in play to meet the league average, which is by definition accrued in all stadiums somewhat equally.

There are, however, still issues I’d like to work on with park factors. First, consider two hypothetical stadiums, one that allows a .500 wOBA on balls in play and one that allows a .200 wOBA on balls in play. Even if two players manage their non-contact results equally, they could play very differently in these two stadiums. As an example, a 23% K-rate and 10% BB+HBP rate works out to a 100 NOC+. So does a 40% strikeout rate and 29% walk rate. Those two players would play extremely differently at the two fields, though. You’d vastly prefer the first guy at the high-offense park and the second guy at the low-offense park. Second, parks also have strikeout and walk park factors. Some of this is behavioral: pitchers walk more batters and strike out fewer at Coors, partially because throwing in the strike zone there is dangerous. Some of it is park-related: Comerica Park in Detroit consistently rates as a good park for hitters when it comes to walks and strikeouts, and its batter’s eye probably helps out. Park factors are hard enough to work with on a full sample of all batted ball results, so narrowing them to look at these effects only will be tricky but probably worth it.
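The two-stadium example is easy to check with a quick sketch. I'm assuming a walk weight of 0.69 (an approximate 2018 value, not from the article) and treating every free pass as a walk; the .500 and .200 parks are the hypothetical ones described above.

```python
W_BB = 0.69  # assumed walk weight, not taken from the article


def overall_woba(k_rate, free_pass_rate, contact_woba, w_bb=W_BB):
    """Overall wOBA given non-contact rates and a wOBA on balls in play."""
    contact_share = 1.0 - k_rate - free_pass_rate
    return w_bb * free_pass_rate + contact_share * contact_woba


players = {
    "23% K / 10% BB": (0.23, 0.10),
    "40% K / 29% BB": (0.40, 0.29),
}

# .364 is the league-average wOBA on contact from the article.
for park_woba in (0.364, 0.500, 0.200):
    for label, (k_rate, bb_rate) in players.items():
        woba = overall_woba(k_rate, bb_rate, park_woba)
        print(f"park {park_woba:.3f}  {label}: {woba:.3f}")
```

At the league-average .364, both hitters come out at roughly .312, which is why they share a 100 NOC+. Move the wOBA on balls in play away from average and the hitter who puts more balls in play swings much harder in both directions.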

As I’ve currently built this, I’m using only this year’s league wOBA. League wOBA isn’t constant over time, though, and neither is wOBA on balls in play. To make this stat robust, I’ll need to do a lot more work on building up a historical database to pull from. It’s also not anything like real-time right now, as it’s just run in a Google spreadsheet. I plan on doing that work, but I’d like to get a better handle on park factors first.

Lastly, I want to do some work on predictive value and stabilization. I think that NOC+ has the chance to be a really useful stat, as it bundles up a pretty complex concept into one number. That said, I’d like to learn how stable this number is over time. Are players who are good non-contact managers one year generally good non-contact managers the next year? Can players remain equally good at this skill but change the frequency with which they put the ball in play? If it’s only a flash in the pan, how useful can the stat really be? I’m going to run some tests of significance and year-over-year correlation to see. There’s also the matter of whether the skill of non-contact management is correlated with other skills. The short answer is, I have no idea! It very well could be. I’m going to do some work on that as well.
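For the year-over-year piece, one plausible starting point is a plain Pearson correlation between each player's NOC+ in consecutive seasons. This is only a sketch of a test I might run, not a result; the function name is mine, and it deliberately ships with no data because I haven't built the historical database yet.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation between paired values, e.g. NOC+ in year N and N+1."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length samples with at least 2 points")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sample")
    return cov / sqrt(var_x * var_y)
```

A value near 1 across a large group of hitters would suggest NOC+ reflects a repeatable skill, while a value near 0 would point to noise.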

Oh, one last thing. The name really is not great. I was going for something that’s easy to remember, but I know it’s pretty blah. If you have anything that sounds better, absolutely let me know. I like having a + in it to convey that it’s a relative stat, but I’m very open aside from that.