

Check Your Priors: A Study of Certainty


I made an unthinking baseball comment to a friend. Then I decided to fact-check myself.


I should tell you right at the start that this isn’t an article about the Cardinals. It’s an article about baseball, to be sure. It’s an article about fandom, arguably. It’s definitely an article about how our brains sometimes see patterns that don’t exist. Consider yourself warned. This all started, oddly enough, at a Mets-Nationals game I went to last weekend. A good friend of mine, a groomsman from my wedding, has the misfortune of being a Nats fan; my wife is a Mets fan. Everything lined up to spend a glorious summer afternoon watching baseball. Even the preposterous concept of ‘Jay Bruce Bobblehead Day’ and the fact that we were forced to find a place to store the unwanted giveaway couldn’t put a damper on the day. My friend made an offhand comment as we watched Zack Wheeler give the Nats the business en route to an easy 3-0 win, and that comment really stuck with me. He mentioned quite casually that he could tell by July that it just wasn’t the Nats’ year, that they just didn’t have the stuff to win.

Now, I don’t remember exactly what I said back to him. We were more than a few beers deep at this point; Citi Field has an absolutely wonderful brewery just outside that was a key part of our trip. I feel confident that it was some generic comment about Pythagorean record, BaseRuns record, or projections. Something to the effect of ‘no, the Nats are still pretty good, they’ve just been unlucky.’ We moved on; we went to Flushing after the game and had hot pot for dinner. Life continued; everyone had a good time. The next morning, however, I was left with the nagging feeling that I hadn’t given his comment a fair shake. He told me he thought he knew, that he could tell by watching, and I just turned my brain off and spouted the generic stathead argument, the conversational equivalent of ‘do you even go to FanGraphs?’

If there’s one thing I hate in life, I guess it’s probably Nazis. If there are TWO things, though, the second one is a stubborn refusal to examine the facts, a hidebound belief in whatever you brought into the argument in the first place. This isn’t unique to baseball, but I do think it’s a particularly strong impulse in sports. Sports engage the tribal parts of our brains, and they’re heavy on authority figures who claim special knowledge of their domain, cowing anyone who argues with them into submission. It might sound like I’m referring to managers and former players here, and the fans who parrot their tired clichés about the right way to play the game. Honestly, I am talking about them. I’m also referring to sabermetricians, though. Not the good ones: Mitchel Lichtman shows his work, and Tom Tango, if anything, explains in too much depth, such is his passion for the subject. They’re trying to learn new things and are open to the possibility of being wrong. No, I’m talking about the armchair statheads, the people who reach for a few stats and use them to hammer down any argument they don’t agree with. It’s less about learning than about looking clever. Someone says a hitter is clutch? Bam, take that! You knew the game was over at such and such point? No it wasn’t, dummy, take a look at this win probability graph. Think the Cardinals have a shot at the playoffs? Feast your eyes on the playoff odds. I’m talking about that kind of thing. I had basically done that to my friend. I’d become what I can’t stand. Here he was, expressing a view he had come to after some consideration, and I swatted it down without so much as engaging with it.

I resolved to analyze the question of whether it really just isn’t the Nats’ year with open-minded curiosity and far too much data, the way I like to analyze most things. First, I had to define the problem. I started with a very broad definition. I split the standings for each of the last ten years (2008-2017) in half at July 1st, and looked up runs scored and allowed in each half as well. From there, I compared how much a team outperformed or underperformed its Pythagorean expectation in the first half to how it did in the second half. A quick definition: Pythagorean expectation is an estimate of how many games a team should be expected to win based on the runs it has scored and the runs it has allowed. You can find the exact formula here. I thought this would be a good first-pass test. If you can really tell that a team is going to have a bad year, even when its underlying statistics suggest it should be doing better, underperformance in the first half and in the second half should be correlated. This makes some logical sense: if your belief is that your team is just innately worse at winning than its run scoring and prevention would imply, it should probably continue its ineptitude. If there’s a team of destiny with a special formula that makes it win totally out of proportion to its run differential, why wouldn’t that keep on going? I can resolve this one with a graph:

That’s basically the textbook definition of random noise. The R^2 of that relationship, the share of variation in second-half outperformance that can be explained by first-half outperformance, is .0009. In other words, 99.91% of the variance in how a team performs relative to its run differential in the second half is explained by something other than the first half. I’m pretty comfortable saying there’s no correlation.
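For the concrete-minded, the Pythagorean expectation I keep referring to is only a few lines of code. A minimal sketch, using the classic exponent of 2 (modern refinements like Pythagenpat use a dynamic exponent closer to 1.83; the numbers below are illustrative, not my actual data pipeline):

```python
def pythagorean_wins(runs_scored: float, runs_allowed: float,
                     games: int, exponent: float = 2.0) -> float:
    """Expected wins from runs scored and allowed (Bill James's formula)."""
    win_pct = runs_scored ** exponent / (
        runs_scored ** exponent + runs_allowed ** exponent
    )
    return win_pct * games

# A team that scored 400 and allowed 380 runs over its first 81 games
# "should" have won a shade more than half of them.
expected = pythagorean_wins(400, 380, 81)

# First-half "outperformance" is then just actual wins minus expected wins;
# negative values mean the team won less than its runs suggest it should have.
outperformance = 42 - expected
```

The study above simply computes that outperformance number for each half-season and checks whether the first-half value predicts the second-half one.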

Maybe, I thought, I was being too broad in applying my test. No one would say the 2017 Dodgers didn’t have it, for example, even though they underperformed their Pythagorean expectation by 2.5 games in the first half. They were 55-28, for Pete’s sake. Including them in the sample might just be obscuring some real signal. The 2017 White Sox underperformed by even more, 3.5 games, but they were awful, 10 games under .500 by July 1. They probably shouldn’t be included in the sample either. With this in mind, I narrowed my focus. What about teams that were within six games of .500 and underperformed expectations by at least 1.5 wins? How did that group do relative to underlying performance in the second half? Awkwardly, they did just fine. The 16 teams that met those criteria beat expectations by an average of half a win in the second half. Finding a signal seemed increasingly unlikely, but I persisted. What about teams between four games under .500 and eight games over, but still underperforming Pythagorean expectation? Again, no dice. Those 29 teams were dead on expectations in the second half despite being an average of three games below expectations in the first half.
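The two subsample cuts above are easy to state in code. A sketch with a hypothetical record type (the field names are my own invention, and I’m reading “games of .500” as wins minus losses):

```python
from dataclasses import dataclass

@dataclass
class HalfSeason:
    team: str
    wins: int
    losses: int
    pythag_wins: float  # Pythagorean expected wins for the half

def near_500_underperformers(halves, band=6, shortfall=1.5):
    """Teams within `band` games of .500 that trail their
    Pythagorean expectation by at least `shortfall` wins."""
    return [
        h for h in halves
        if abs(h.wins - h.losses) <= band
        and h.pythag_wins - h.wins >= shortfall
    ]

def second_cut(halves):
    """The wider, asymmetric band: four games under to eight games
    over .500, still requiring Pythagorean underperformance."""
    return [
        h for h in halves
        if -4 <= h.wins - h.losses <= 8
        and h.pythag_wins > h.wins
    ]
```

Either filter hands back the subsample whose second-half outperformance then gets averaged.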

Maybe, I thought, second half relative performance isn’t what I should be looking for. Maybe I should be looking for absolute performance. This was no good either. The 16-team sample improved from 40.5 first-half wins to 42, while the 29-team sample improved from 43 to 44. At this point, I was ready to give up looking. Whatever effect felt like it was there, it clearly wasn’t showing up in the data.

I did, however, decide to try one last thing. Armed with a lovely set of data, I thought I might as well see how a team’s first-half Pythagorean record predicted its actual second-half record. After all, people use that to describe how well a team “should” be doing. It should theoretically do an okay job of guessing how well they actually end up doing. This looked more promising:

That looks way better. It seems like there’s something there for sure. More first-half expected wins leads to more second-half actual wins. I have some bad news, though. You know what else does pretty well at predicting second-half wins? First-half wins:

Well, that’s honestly quite annoying. I can’t really tell the difference. To really underscore how maddening that is, both sets of data have an R^2 of .261. It’s a complete dead heat. If you want to know how well a team might do in the second half, you could do some calculations based on how many runs they allowed and how many runs they scored, throw in some exponents and parentheses and formulas and look like a real mathematician. You could also, just as successfully, count how many games they won in the first half.
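For anyone who wants to reproduce the comparison, the R^2 values here are just the squared Pearson correlation between predictor and outcome. A self-contained sketch with made-up numbers (the real inputs would be each team’s first-half wins, first-half Pythagorean wins, and second-half wins):

```python
import math

def r_squared(xs, ys):
    """Squared Pearson correlation: the share of variance in ys
    that a linear fit on xs can explain."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return (cov / (sx * sy)) ** 2

# Hypothetical per-team numbers, first half vs. second half:
first_half_wins = [48, 41, 39, 45, 36]
pythag_wins = [46.0, 43.5, 37.0, 44.0, 38.5]
second_half_wins = [47, 40, 38, 46, 35]

# Compare how well each first-half measure "explains" second-half wins.
r2_actual = r_squared(first_half_wins, second_half_wins)
r2_pythag = r_squared(pythag_wins, second_half_wins)
```

The dead heat in the article is exactly this comparison coming out to .261 on both sides.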

One quick aside: I didn’t have the data for some of the more granular winning percentage predictors like BaseRuns or Third-Order Winning Percentage. Maybe that’s a study for another time, but I assume they’re a little better than Pythagorean expectation without being truly satisfying. FanGraphs has occasionally claimed that its projections are the best forecasters, but they aren’t easily accessible, so that will have to remain unstudied for the moment.

What happened, at the end of all these graphs? Well, my friend’s belief that the Nationals were doomed to a bad season doesn’t play out in the data. There doesn’t appear to be anything phenomenally interesting about underperforming your run differential in the first half of the season. In the course of this silly investigation, though, I found something potentially more interesting. As it turns out, Pythagorean expectation isn’t a particularly good way to think about how well a team “should” do either. It’s a nice tool, a way to feel smart and say something a little less obvious than ‘this team has won 50 games; they are the kind of team that has won 50 games so far.’ It’s also no more predictive of a team’s future performance than plain winning percentage. In the end, the most accurate predictor of the Nats’ future wasn’t their lack of the clutch gene, or the blown late leads. It was the fact that they came into July 1st at 42-40. The next time you feel the urge to throw out an impressive-sounding baseball statistic to dismiss a friend’s argument, I hope you remember me. I tried that. You might consider me a professional, even. And I was wrong. At least I got an article out of it.