I know this is going to shock some of you, but I don't like Mike Matheny. Whether you prefer the analytics or the eye test (I urge you not to click on this latter link), it's no secret that the Cardinals skipper is a subpar bullpen manager, to put it lightly. That's where the idea for this article originated. Matheny is often criticized for going with his gut and falling victim to ridiculously small sample sizes. So the question this piece will attempt to answer is: how much does Mike Matheny really trust the hot hand?
The methodology
To answer this question about the hot hand I needed two pieces of information: the game situations Matheny was inserting his pitchers into and the recent performance of his pitchers leading up to that appearance.
FanGraphs measures the leverage index (LI) of every plate appearance based on the score, inning, and base-out state. A higher LI indicates a situation that has a greater impact on both teams' odds of winning, with 1.0 serving as an "average leverage situation", above 2.0 as high leverage, and below 0.85 as low leverage. More specifically, I used gmLI, the leverage index at the moment a pitcher entered the game. (Otherwise, pitchers who got themselves into trouble would have an artificially inflated LI.)
To quantify a reliever's recent performance I chose RE24, which calculates how each game event impacts the run expectancy for that inning. Unlike a metric such as WPA, RE24 treats a leadoff double in the fifth inning the same as a leadoff double in the ninth.
From there I collected the game log data for every Cardinals reliever since Matheny was hired (minimum 20 innings). My spreadsheet held 51 individual seasons and nearly 3,000 relief appearances to examine. For each outing, I calculated the pitcher's rolling RE24 average over his previous 1, 3, 5, 7, 10, 15, and 20 games. I also excluded streaks that spread across multiple seasons, as a manager's assessment of a player generally resets at the beginning of each year.
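For readers who would rather see that rolling average as code than as a spreadsheet formula, here is a minimal sketch. The tuple layout, the function name, and the choice to skip outings that lack a full window of prior games are all my assumptions for illustration; the actual work was done in a spreadsheet.

```python
from collections import defaultdict


def rolling_prior_re24(appearances, window):
    """Average RE24 over a pitcher's previous `window` outings in the same season.

    `appearances` is a hypothetical list of (pitcher, season, re24) tuples,
    assumed to be sorted chronologically. Returns one value per appearance,
    or None when fewer than `window` prior outings exist that season.
    """
    history = defaultdict(list)  # (pitcher, season) -> RE24 values seen so far
    averages = []
    for pitcher, season, re24 in appearances:
        past = history[(pitcher, season)]  # a new season starts a fresh list
        if len(past) >= window:
            averages.append(sum(past[-window:]) / window)
        else:
            averages.append(None)  # assumption: partial windows are skipped
        past.append(re24)  # the current outing only counts for later ones
    return averages


log = [
    ("Reliever A", 2015, 1.0),
    ("Reliever A", 2015, -1.0),
    ("Reliever A", 2015, 2.0),
    ("Reliever A", 2016, 0.5),
    ("Reliever A", 2016, 0.5),
]
two_game = rolling_prior_re24(log, window=2)
one_game = rolling_prior_re24(log, window=1)
```

Keying the history on both pitcher and season is what keeps a streak from carrying over from one year to the next.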
I then compared the gmLI data with the recent RE24 averages to see how much stock Matheny placed in a reliever's recent success, or lack thereof. To measure this trust I looked at the r and r^2 values (I can already hear the groans of return readers) between the gmLI and RE24 numbers. The correlation coefficient (denoted as r) tells us the direction and strength of the relationship between the two variables (i.e. whether a higher x value trends towards a higher or a lower y value), while r^2 tells us how much of the variation in one variable is explained by the other, with a value of 1 meaning all of it and 0 meaning none.
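As a rough illustration of what those two numbers are, the sketch below computes a Pearson r and its square from two lists. The function name and the toy numbers are mine, not the actual data set, and a spreadsheet's built-in correlation function would do the same job.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Toy example: x would be the rolling RE24 average, y the gmLI of the next outing.
r_up = pearson_r([1, 2, 3], [2, 4, 6])    # y rises with x
r_down = pearson_r([1, 2, 3], [3, 2, 1])  # y falls as x rises
r_squared = r_up ** 2                     # share of variation explained
```

If Matheny leaned heavily on the hot hand, r would be clearly positive: the better the recent RE24, the higher the leverage of the next appearance.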
The results
Rolling RE24 average from past... | r | r^2 |
1 Game | 0.038 | 0.001 |
3 Games | 0.023 | 0.001 |
5 Games | 0.051 | 0.003 |
7 Games | 0.204 | 0.003 |
10 Games | 0.210 | 0.003 |
15 Games | 0.203 | 0.006 |
20 Games | 0.199 | 0.006 |
This table is mathematical lingo for "welp, there is virtually no correlation between recent success and faith from Mike Matheny." The one takeaway from this data would be that, of whatever trend exists, seven games tends to be the benchmark Matheny uses when determining who's hot and who's not. (For whatever it's worth, Tom Tango's The Book: Playing the Percentages in Baseball found a four-game hot streak to be the timeframe that held predictive power for a reliever's next appearance.)
I also sectioned off each appearance into three groups: hot streaks (the top 10% by rolling RE24 for a specific duration of time), cold streaks (the bottom 10%), and the remaining 80% in a middle column.
Rolling RE24 average from past... | Avg gmLI for top 10% | Avg gmLI for middle 80% | Avg gmLI for bottom 10% | Difference between top and bottom 10% |
1 Game | 1.140 | 1.289 | 1.180 | -0.040 |
3 Games | 1.180 | 1.293 | 1.132 | 0.048 |
5 Games | 1.215 | 1.310 | 1.050 | 0.165 |
7 Games | 1.284 | 1.306 | 1.049 | 0.235 |
10 Games | 1.194 | 1.338 | 1.025 | 0.169 |
15 Games | 1.350 | 1.358 | 1.038 | 0.312 |
20 Games | 1.349 | 1.359 | 1.132 | 0.217 |
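For anyone who wants to rebuild a table like the one above, here is a minimal sketch of the grouping step. The function name, the input layout, and the way the 10% cutoff is rounded are my assumptions; it treats a higher rolling RE24 as "hot."

```python
def avg_gmli_by_bucket(rows, tail=0.10):
    """Average gmLI for the hottest, middle, and coldest appearances.

    `rows` is a hypothetical list of (rolling_re24, gmli) pairs for one
    window length. The top and bottom `tail` share (10% by default) form
    the hot and cold groups; everything else lands in the middle.
    """
    if len(rows) < 3:
        raise ValueError("need at least three appearances to form three groups")
    ranked = sorted(rows, key=lambda row: row[0], reverse=True)  # hottest first
    k = max(1, int(len(ranked) * tail))  # assumption: cutoff rounds down, min 1
    groups = {
        "top": ranked[:k],
        "middle": ranked[k:len(ranked) - k],
        "bottom": ranked[len(ranked) - k:],
    }
    return {
        name: sum(gmli for _, gmli in group) / len(group)
        for name, group in groups.items()
    }


# Toy data: ten appearances where leverage rises steadily with recent RE24.
toy = [(i, 1.0 + i / 10) for i in range(10)]
buckets = avg_gmli_by_bucket(toy)
gap = buckets["top"] - buckets["bottom"]  # the "Difference" column
```

A positive gap means the hot group entered in bigger spots than the cold group, which is how the last column of the table reads.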
Does Matheny really try to "rebuild a pitcher's confidence" after a poor outing by placing him in a bigger spot the next time out? The "1 Game" row suggests that such a theory is plausible, but let's not discount the fact that every pitcher will have a bad game sooner or later.
Once again, a noticeable gap emerges between the hot and cold pitchers in the 5-7 game timeframe, a window that research has suggested carries some merit when it comes time to make bullpen decisions.
A stat like BMAR is likely a better indicator of a manager's tactical ability, in part because it incorporates a reliever's performance over the past calendar year. We know that BMAR loathes Matheny, but as far as knee-jerk decisions based on what occurred last night, let's give Mike some credit here.
As always thanks for reading! You can follow me on Twitter @Tyler_Opinion and check out my YouTube page for Cardinals highlight videos.
Go Cards!