The clutchiness of the home team
In the comments to this post about the results of his/her simulator, Xeifrank recently mentioned that, due to the rules of baseball, the most likely outcomes for closely matched teams have the home team winning by a single run. In particular, even if the away team is favored, the most likely individual scores still involve the home team winning by one run, because the scores in which the away team wins are spread more evenly over a wider range of margins. This is basically a result of the home team not tacking on additional runs in the 9th inning or later when they win (since the game ends). I asked if this trend is visible in real games and not just the simulations and, on his/her suggestion, decided to take a look for myself. (Spoiler alert: the answer is yes.)
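To make the mechanism concrete, here is a minimal toy simulation. It is not Xeifrank's simulator: the per-half-inning run distribution is made up, both teams are given identical strength, and a walk-off is assumed to end the game the moment the home team goes ahead by one (real walk-off home runs can score more). Under those assumptions, the only asymmetry left is the rule that the home team stops batting once it has won.

```python
import random
from collections import Counter

# Hypothetical per-half-inning run distribution (not fitted to real data).
RUNS = [0, 1, 2, 3, 4]
WEIGHTS = [0.73, 0.15, 0.07, 0.03, 0.02]


def half_inning(rng):
    """Draw the runs scored in one half-inning."""
    return rng.choices(RUNS, weights=WEIGHTS)[0]


def simulate_game(rng):
    """Play one game between identical teams; return (away_runs, home_runs)."""
    away = home = 0
    inning = 1
    while True:
        away += half_inning(rng)
        # From the 9th on, the home team does not bat if it already leads.
        if inning >= 9 and home > away:
            return away, home
        runs = half_inning(rng)
        if inning >= 9:
            # Assumed walk-off rule: play stops once the home team leads by one.
            runs = min(runs, away - home + 1)
        home += runs
        # After nine or more full innings, any non-tied score ends the game.
        if inning >= 9 and home != away:
            return away, home
        inning += 1


def margin_counts(n_games, seed=0):
    """Count games by (home_runs - away_runs) over n_games simulated games."""
    rng = random.Random(seed)
    counts = Counter()
    for _ in range(n_games):
        away, home = simulate_game(rng)
        counts[home - away] += 1
    return counts


if __name__ == "__main__":
    counts = margin_counts(20000, seed=1)
    print("home wins by 1:", counts[1])
    print("away wins by 1:", counts[-1])
```

Even with no home field advantage built in, the home team should pick up noticeably more one-run wins than the away team in this sketch, which is the effect described above.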
I took as my data the scores of all of the games from 2009 (through last night, 9/24), yielding a total of 2288 games. Overall, the home team had a winning percentage of .546, in line with the average over the history of baseball of 54%. To be honest, I wasn't aware prior to this analysis that home field advantage was actually that strong in baseball (even though it is much weaker than in most sports, from what I understand). In any event, here's a heatmap of the joint histogram of final scores:

Because the home team already has a slight edge, the results aren't as surprising in this case, but the most common scores all had the home team winning by one run (4-3, 3-2, 5-4). Since the home team wins more games overall, it's a little hard to interpret the above plot in terms of the one-run bias we are looking for. One way to account for this is to normalize the home team and away team victories separately. The result is a plot that shows the probability of a given score, assuming that the home team wins (upper left triangle) or that the away team wins (lower right triangle). The resulting plot is:

If there were no bias towards one-run games, the above plot would look symmetric about the diagonal. While the effect isn't particularly strong, there clearly is an asymmetry in the data. Outcomes of 4-3 and 3-2 are more likely for the home team than the away team, meaning that these scores make up a larger share of home team wins than of away team wins. To offset this, the away team has more density further away from the diagonal. Note, for example, that 3-1, 4-1, and 5-1 scores are more common among away team victories than among home team victories. There also seems to be an excess of away teams blowing out the home team (10+ runs vs 1-6 runs), but that might just be noise.
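For anyone who wants to reproduce the separate normalization, here is a rough sketch. It assumes the games are available as a list of (away_runs, home_runs) pairs; the function name and the toy data in the usage lines are mine, not the actual 2009 scores.

```python
from collections import Counter


def conditional_score_probs(games):
    """Return (home_probs, away_probs), each keyed by (winner_runs, loser_runs).

    home_probs[s] estimates P(score s | home team wins);
    away_probs[s] estimates P(score s | away team wins).
    """
    home_counts, away_counts = Counter(), Counter()
    for away, home in games:
        if home > away:
            home_counts[(home, away)] += 1
        elif away > home:
            away_counts[(away, home)] += 1

    def normalize(counts):
        # Divide each count by that side's total number of wins.
        total = sum(counts.values())
        return {score: n / total for score, n in counts.items()} if total else {}

    return normalize(home_counts), normalize(away_counts)


# Toy data only, for illustration: (away_runs, home_runs).
toy_games = [(3, 4), (3, 4), (1, 5), (4, 3), (5, 1), (5, 1)]
home_probs, away_probs = conditional_score_probs(toy_games)
```

With no one-run bias, home_probs and away_probs would agree (up to noise) for every score; comparing the two values for a score like (4, 3) is the same check as looking for asymmetry about the diagonal.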
Finally, I wanted to look at the same data but focusing on the margin of victory instead of the exact score. This plot shows the total winning percentage of home teams given a particular margin of victory (but not conditioned on a home team victory):
This plot really shows the dramatic difference in one-run wins. In one-run games, the home team won almost 61% of the time; over a 162-game season, that's equivalent to a record of 99-63. In contrast, the winning percentage in 2+ run games is closer to 52% (84-78). On average, obviously, this still comes out to the 54% winning percentage that home teams have overall. While I didn't demonstrate it here, this bias towards one-run home-team victories is likely a result of the rules of baseball, and not some psychological lift that the home team gets in close games. This idea is supported by the fact that it still appears in the simulator, which obviously doesn't have any psychology or anything of the sort built in.
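The margin-of-victory breakdown and the conversion to a season record are simple enough to sketch. As before, this assumes a list of (away_runs, home_runs) pairs, and the helper names are just for illustration.

```python
from collections import defaultdict


def home_win_pct_by_margin(games):
    """Map each margin of victory to the fraction of those games won by the home team."""
    wins = defaultdict(int)
    totals = defaultdict(int)
    for away, home in games:
        margin = abs(home - away)
        if margin == 0:
            continue  # ignore unfinished or tied entries
        totals[margin] += 1
        if home > away:
            wins[margin] += 1
    return {m: wins[m] / totals[m] for m in sorted(totals)}


def as_season_record(win_pct, games=162):
    """Convert a winning percentage to a (wins, losses) record over a full season."""
    wins = round(win_pct * games)
    return wins, games - wins


print(as_season_record(0.61))  # (99, 63)
print(as_season_record(0.52))  # (84, 78)
```

The two printed records match the 99-63 and 84-78 figures quoted above.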
So, what's the point of all of this? As my title alludes to, one way to view this result is as a caution against selection bias. Winning percentage in one-run games is often thrown around as some measure of how "clutch" a team is. While I know I'm preaching to the choir, this is just one example of how such discrepancies in results can arise without any human element at all. Next time you hear about how well a team has performed in one-run games, I'd at least take a look at how many of those games were played on their home turf.
Comments
lol...home runs.
I misread as HR’s at first…. til I read the x axis.
Yo MLBPA, I'm really happy for you, and I'mma let you finish, but Albert is the most ridiculous player of all time. OF ALL TIME!
Figure 3 is dramatic indeed
Nice work.
Guys like Bradley are exactly why we can't have a pumpkin patch anymore.
Awesome, awesome stuff
This seems like a great avenue for future study. I think the obvious reason is that the home team has the walk off, which is going to be one-sided in the favor of the home team and usually results in a 1 run victory. However, part of it could also be that closers will play better at home (as does everyone else in baseball) and they’ll have a better save% in 1 run games at home than on the road.
Thanks much
In regards to further analysis, are there public databases for these kinds of stats? I scraped all the data here from cbssportsline, but that’s a little clunky and doesn’t have all the data I’d like. In particular, I was hoping to split it out by extra inning games, but that was gonna take a little more scripting than I felt up to since I couldn’t find a single page with that information on it.
by brackenthebox on Sep 26, 2009 11:41 AM EDT
I've got a decent amount of MySQL experience
I’m planning on loading the retrosheet data into a new DB later this week. I know you’ve posted about the pitchfx data previously. Any other great data sources out there?
Thanks.
by brackenthebox on Sep 28, 2009 9:01 AM EDT
Well, if you want to have a shortcut to get a Retrosheet database
You can download the entire thing in an SQL dump at this website:
http://www.wantlinux.net/2009/04/retrosheet-baseball-mysql-database-download/
I haven’t tried it yet, because it’s a huge file and would probably kill my computer, but you could give it a shot.
by vivaelpujols on Sep 28, 2009 1:34 PM EDT
Also, you say that you are good with SQL?
Do you think you could drop me an email so I could pester you with some questions? I’m having some trouble figuring how to set up a query. My email is listed on my profile.
by vivaelpujols on Sep 28, 2009 2:03 PM EDT
I rec'd this
I figured I should in order to keep the rec/comment ratio nice and high.
Great stuff.
Albert Pujols does not have "down" years. He has "~6 WAR" years.
These graphs look like extreme close-ups of Mega Man.
EXTREME CLOSE-UP!!!!!!
I once shot a man just to see him die...then I got distracted and missed it.
GET EQUIPPED WITH
HEAT MAP!
Albert Pujols does not have "down" years. He has "~6 WAR" years.
by mattybobo on Sep 26, 2009 9:39 PM EDT
one of the best fanposts I've seen
bravo
Positronic Upgraded Juggernaut Optimized for Logical Sabotage
by Cards Fan in Chitown on Sep 26, 2009 6:53 PM EDT
what happens if you subtract 1-run walkoff victories?
I realise this isn’t 100% fair either, but would be interesting to see.
Felonius Monk - bitching to contact since 2008
Wow
well done…and very interesting data…rec’d
"Albert hits good pitches hard and bad pitches even harder. And when he gets in the batter's box, if you pray, then you start praying. And if you don't pray, you think about starting."--Brian Bannister
