In the comments to this post about the results of his/her simulator, Xeifrank recently mentioned that, due to the rules of baseball, the most likely outcomes for closely matched teams have the home team winning by a single run. In particular, even if the away team is favored, the most likely individual scores still involve the home team winning by one run, because the scores in which the away team wins are spread more evenly across margins. This is basically a result of the home team not tacking on additional runs in the 9th inning or later when they win (since the game ends as soon as they take the lead). I asked if this trend is visible in real games and not just the simulations and, on his/her suggestion, decided to take a look for myself. (Spoiler alert: the answer is yes.)
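To make the mechanism concrete, here is a toy simulation. It is only a sketch and is not Xeifrank's simulator: the per-inning run distribution is made up, the two teams are identical, and runs are assumed to score one at a time, so a walk-off win is always by exactly one run. The names `simulate_game` and `one_run_summary` are hypothetical.

```python
import random

# Made-up distribution of runs scored in one half-inning (an assumption,
# not fitted to real data).
RUNS = [0, 1, 2, 3, 4]
WEIGHTS = [0.73, 0.15, 0.07, 0.03, 0.02]


def half_inning(rng):
    """Draw the number of runs scored in one half-inning."""
    return rng.choices(RUNS, weights=WEIGHTS)[0]


def simulate_game(rng):
    """Return (away_runs, home_runs) for two identical teams."""
    away = home = 0
    inning = 1
    while True:
        away += half_inning(rng)
        if inning >= 9 and home > away:
            # Home team leads after the top half: the bottom half is not played.
            return away, home
        runs = half_inning(rng)
        if inning >= 9:
            # Walk-off rule: the game ends once the home team goes ahead.
            # Simplification: runs score one at a time, so the lead is capped at 1.
            runs = min(runs, away - home + 1)
        home += runs
        if inning >= 9 and home != away:
            return away, home
        inning += 1


def one_run_summary(n_games=20000, seed=0):
    """Return (overall home win pct, home win pct in one-run games)."""
    rng = random.Random(seed)
    home_wins = one_run = one_run_home = 0
    for _ in range(n_games):
        away, home = simulate_game(rng)
        home_wins += home > away
        if abs(home - away) == 1:
            one_run += 1
            one_run_home += home > away
    return home_wins / n_games, one_run_home / one_run


overall_pct, one_run_pct = one_run_summary()
print(f"home win pct, all games:     {overall_pct:.3f}")
print(f"home win pct, one-run games: {one_run_pct:.3f}")
```

Since the two teams are identical, the overall home winning percentage should hover around .500, while the one-run figure should come out noticeably higher. Any gap here comes purely from the end-of-game rules.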
I took as my data the scores of all of the games from 2009 (through last night, 9/24), yielding a total of 2288 games. Overall, the home team had a winning percentage of .546, in line with the average over the history of baseball of 54%. To be honest, I wasn't aware prior to this analysis that home field advantage was actually that strong in baseball (even though it is much weaker than in most sports, from what I understand). In any event, here's a heatmap of the joint histogram of final scores:
Because the home team already has the slight edge, the results aren't as surprising in this case, but the most common scores all had the home team winning by one run (4-3, 3-2, 5-4). Because the home team wins more games overall, it's a little hard to read the one-run bias we're looking for off the above plot. One way to account for this is to normalize the home team and away team victories separately. The result is a plot that shows the probability of a given score, assuming that the home team wins (upper left triangle) or that the away team wins (lower right triangle). The resulting plot is:
If there were no bias towards one-run games, the above plot would look symmetric about the diagonal. While the effect isn't particularly strong, there clearly is an asymmetry in the data. Outcomes of 4-3 and 3-2 are more likely for the home team than the away team, meaning that home teams win a higher percentage of games by these scores than away teams do. To offset this, the away team has more density further away from the diagonal. Note, for example, that 3-1, 4-1, and 5-1 victories are enriched in away team victories relative to home team victories. There also seems to be enrichment for away teams blowing out the home team (10+ runs vs 1-6 runs), but that might just be noise.
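The per-winner normalization described above can be sketched as follows. This is not the code behind the plots: the function name `normalized_score_tables` is hypothetical, and the handful of `(away_runs, home_runs)` scores is invented purely to show the calculation.

```python
from collections import Counter


def normalized_score_tables(games):
    """Split games by winner and normalize each group separately.

    games: iterable of (away_runs, home_runs) final scores.
    Returns two dicts keyed by (winner_runs, loser_runs): the probability
    of each score given a home win, and given an away win.
    """
    home_wins = Counter()
    away_wins = Counter()
    for away, home in games:
        if home > away:
            home_wins[(home, away)] += 1
        elif away > home:
            away_wins[(away, home)] += 1

    def normalize(counts):
        total = sum(counts.values())
        return {score: n / total for score, n in counts.items()} if total else {}

    return normalize(home_wins), normalize(away_wins)


# Invented example scores as (away_runs, home_runs); not real 2009 data.
sample_games = [(3, 4), (2, 3), (3, 4), (5, 1), (4, 3), (3, 1), (1, 6), (4, 1)]
given_home_win, given_away_win = normalized_score_tables(sample_games)

# Compare the same score line across the two conditional tables.
print(given_home_win.get((4, 3), 0.0))  # share of home wins that ended 4-3
print(given_away_win.get((4, 3), 0.0))  # share of away wins that ended 4-3
```

With no one-run bias, the two tables would assign roughly the same probability to each score line. The asymmetry shows up as entries like 4-3 being larger in the home-win table.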
Finally, I wanted to look at the same data but focusing on the margin of victory instead of the exact score. This plot shows the home team's winning percentage for each margin of victory, computed over all games decided by that margin (that is, not conditioned on a home team victory):
This plot really shows the dramatic difference in one-run wins. In one-run games, the home team won almost 61% of the time; over a 162-game season, that's equivalent to a record of 99-63. In contrast, the winning percentage in 2+ run games is closer to 52% (84-78). On average, obviously, this still comes out to the 54% winning percentage that home teams have overall. While I didn't demonstrate it here, this bias towards 1-run home-team victories is likely a result of the rules of baseball, and not some psychological lift that the home team gets in close games. This idea is supported by the fact that it still appears in the simulator, which obviously doesn't have any psychology or anything of the sort built in.
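For anyone who wants to redo the margin-of-victory breakdown on their own list of scores, here is a minimal sketch. The helper names `home_win_pct_by_margin` and `record_over_season` are hypothetical, and the example scores are invented; only the 61% and 52% figures come from the data above.

```python
from collections import defaultdict


def home_win_pct_by_margin(games):
    """Home winning percentage among all games decided by each margin.

    games: iterable of (away_runs, home_runs) final scores.
    """
    home_wins = defaultdict(int)
    total = defaultdict(int)
    for away, home in games:
        margin = abs(home - away)
        if margin == 0:
            continue  # skip ties (e.g. suspended games), if any are present
        total[margin] += 1
        if home > away:
            home_wins[margin] += 1
    return {m: home_wins[m] / total[m] for m in sorted(total)}


def record_over_season(pct, games=162):
    """Convert a winning percentage into a (wins, losses) record."""
    wins = round(pct * games)
    return wins, games - wins


# Invented example scores as (away_runs, home_runs); not real 2009 data.
sample_games = [(3, 4), (2, 3), (3, 4), (5, 1), (4, 3), (3, 1), (1, 6), (4, 1)]
by_margin = home_win_pct_by_margin(sample_games)
print(by_margin[1])  # home win pct in one-run games for the toy sample

print(record_over_season(0.61))  # (99, 63)
print(record_over_season(0.52))  # (84, 78)
```

The last two lines reproduce the 99-63 and 84-78 records quoted above from the rounded percentages.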
So, what's the point of all of this? As my title alludes to, one way to view this result is as a caution against selection bias. Winning percentage in one-run games is often thrown around as some measure of how "clutch" a team is. While I know I'm preaching to the choir, this is just one example of how such discrepancies in results can arise without any human element at all. Next time you hear about how well a team has performed in one-run games, I'd at least take a look at how many of those games were won on their home turf.