it don't add up

for a slow time of year, we've got a lot going on. first of all, i posted community projection threads for chris duncan and adam wainwright over on the diary bar. head on over and cast your ballots. second item: cardinal70 has created a google page to track the all-time cardinal tournament; stats, brackets, standings, etc. my thanks to cardinal70 for putting that together; for those of you following the tourney, i'm hoping to write up a couple more series and post 'em this afternoon.

readings: matt leach posted an interesting item on preston wilson; definitely worth your consideration. and over at The Book blog, tangotiger makes some cocktail-napkin calculations to derive an average cost for a family of 4 to attend a game at busch III. the "official" estimate is north of $200; tangotiger thinks a family should comfortably be able to enjoy an evening at the ballpark for about $110. discuss.

so much for the preliminaries. our feature presentation this morning: Baseball Prospectus posted its 2007 depth charts on tuesday. the depth charts are the same thing as the team PECOTA projection i compiled last month; BP guesstimates playing time for each player within his team context, prorates the PECOTA numbers accordingly, derives team totals for runs scored and runs allowed, and projects a pythagorean w-l. because the cardinals are the reigning world champs, their page is available for free; here's the link.

i'll comment on the st louis figures in just a moment. but first, here are the nl central standings as projected by this exercise:

team        w   l   gb  rs   ra
chicago     85  77  --  828  789
milwaukee   84  78  1   781  748
st louis    81  81  4   730  734
houston     79  83  6   761  783
pittsburgh  77  85  8   751  795
cincinnati  71  91  14  745  841

you'll note that the cardinal pitching staff projects to be the best in the division. in fact, PECOTA thinks the staff will be the 2d-best in the entire national league, behind only the padres (who are forecast to allow 6 fewer runs). you'll also note that the offense projects to be the division's worst --- and, symmetrically enough, PECOTA rates the offense as the nl's 2d worst, ahead only of the soriano-less washington nationals.

i don't know that i'm buying either one of those projections (although some of you may recall that an early CHONE-based simulation had the cards with the fewest runs allowed in baseball), but of the two, the projection of the offense is by far the more suspect. let's begin by comparing the results of my "depth chart" exercise with BP's numbers:

     rs   ra
VEB  788  737
BP   730  734

the two projections are nearly identical in the runs-allowed department, but 58 runs apart in the scoring department; clank. how do we explain the discrepancy? i immediately fact-checked, player by player, the avg / obp / slg figures in my spreadsheet against the numbers in BP's depth chart; they checked out. likewise, the ballplayers in BP's scenario are identical to mine, with one exception: tyler greene didn't appear in mine (and his appearance in BP's chart immediately raises a red flag; greene won't sniff the majors this year, not even in september). however, i did have a generic replacement-player line, into which category greene would surely fall. so there's really no difference in terms of data; we're crunching the same numbers for the same players, yet still arriving at dissimilar outcomes.

the next logical step was to check my playing-time assumptions against BP's. here we find considerable divergence. this table presents the plate appearances i assigned to each player in my scenario, vs the PAs assigned by BP, with major differences highlighted:

player    VEB  BP     player    VEB  BP     player    VEB  BP
pujols    659  689    kennedy   479  520    miles     287  375
rolen     563  565    duncan    456  359    taguchi   215  365
edmonds   466  448    molina    429  476    wilson    215  331
eckstein  602  532    spiezio   310  238    j-rod     200  221
en'cion   560  469    bennett   188  158
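for anyone who wants to see the gaps ranked, here's a rough python sketch that takes the plate appearances from the table above and sorts the players by the size of the swing (BP's figure minus mine). the variable names are made up for illustration; the numbers are just the ones in the table.

```python
# plate appearances per player: (VEB scenario, BP depth chart),
# copied straight from the table above
pa = {
    "pujols": (659, 689), "rolen": (563, 565), "edmonds": (466, 448),
    "eckstein": (602, 532), "en'cion": (560, 469), "kennedy": (479, 520),
    "duncan": (456, 359), "molina": (429, 476), "spiezio": (310, 238),
    "bennett": (188, 158), "miles": (287, 375), "taguchi": (215, 365),
    "wilson": (215, 331), "j-rod": (200, 221),
}

# BP minus VEB for each player, biggest swings (up or down) first
deltas = sorted(
    ((name, bp - veb) for name, (veb, bp) in pa.items()),
    key=lambda item: abs(item[1]),
    reverse=True,
)

for name, delta in deltas:
    print(f"{name:10s} {delta:+d}")

# encarnacion and wilson taken together, under each scenario
pair = ("en'cion", "wilson")
veb_pair = sum(pa[name][0] for name in pair)
bp_pair = sum(pa[name][1] for name in pair)
print("en'cion + wilson:", veb_pair, "vs", bp_pair)
```

taguchi comes out on top of that list, with wilson, duncan, encarnacion, and miles right behind him.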

here's how i'd break these down. the encarnacion and wilson discrepancies cancel each other out. the two players' projected offense is so similar that their plate appearances are pretty much interchangeable in this exercise; i've got the two players combining for 775 plate appearances, and BP has them combining for 800. so that's a wash. the taguchi discrepancy, by contrast, appears to be a meaningful one: BP is giving him about 70 extra plate appearances that i had assigned to spiezio, and another 80 or so that i had assigned to duncan. in other words, BP sees the inferior hitter taking significant playing time away from two superior hitters. maybe that explains part of the discount. and another part is surely aaron miles' enlarged role per BP; miles has one of the worst PECOTA projections in mlb this year (.237 / .281 / .307). BP has him taking about 70 plate appearances away from david eckstein, who is projected to get only 532 PA --- an unreasonably low figure. eckstein last year had 552 plate appearances --- and he missed six weeks with a strained oblique and other injuries. the year before that he had 699 plate appearances; in 2004, 636 plate appearances. his career low in PA came in 2003, when he only came to the plate 517 times. if eckstein hits my projection of 600 PA (which is still low for him) and the plate appearances come out of the anemic miles' budget, that helps the offense.

even though i disagree with BP's playing-time assumptions, i accepted them for the sake of inquiry and plugged those numbers straight into my spreadsheet. but after adjusting the projections per BP's playing-time assumptions, i still came up with a team total of 780 runs scored --- 50 runs higher than BP's bottom line. the discrepant playing-time assumptions only explain away 8 of the 58 runs by which my bottom line differs from BP's. something's still not adding up.

tenacious seeker of truth that i am, i kept probing. i compared BP's team totals for avg / obp / slg to the totals i derived using BP's playing-time assumptions:

avg obp slg
VEB .264 .337 .419
BP .261 .331 .416

ok . . . . . so now what? i fact-checked BP against itself. i took their team totals and multiplied them by standard 162-game totals of 5550 at-bats / 6200 plate appearances. this yields team totals of 2052 baserunners (ie, 6200 PA x .331 obp) and 2309 total bases (5550 AB x .416 slg). with those numbers, we can make simple estimates of runs created and base runs. the totals:

        runs created  base runs
BP adj  765           772
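a minimal sketch of the runs-created half of that check, assuming the basic bill james formula (times on base x total bases, divided by opportunities) with plate appearances standing in for the usual AB + BB denominator. that choice of variant is an assumption, not necessarily the exact version behind the figure above. base runs needs hits, walks, and homers broken out separately, so it isn't reproduced here.

```python
# BP's team line and the standard 162-game denominators quoted above
OBP, SLG = 0.331, 0.416
PA, AB = 6200, 5550

baserunners = OBP * PA   # times on base
total_bases = SLG * AB

# basic runs created: (times on base * total bases) / opportunities.
# assumption: PA is used as the denominator in place of AB + BB.
runs_created = baserunners * total_bases / PA

print(round(baserunners), round(total_bases), round(runs_created))
```

that works out to roughly 764 runs, within a run or so of the 765 quoted above; the small gap presumably comes down to rounding or a slightly different flavor of the formula.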

at this point, i'm at a loss; even assuming a team line of .261 / .331 / .416, i don't see how BP derives its estimate of 730 runs. if anybody can explain it to me, i'm all ears.

i don't want to turn this into a bash-BP thread; they're very smart guys, much smarter'n me. but i'm usually smart enough to follow them; in this case, i can't. based on their own numbers, i think the cardinals' projected runs-scored total should be no worse than 765. let's now plug that back into the projected standings table that we began with:

team        w   l   gb  rs   ra
chicago     85  77  --  828  789
milwaukee   84  78  1   781  748
st louis    84  78  1   765  734
houston     79  83  6   761  783
pittsburgh  77  85  8   751  795
cincinnati  71  91  14  745  841
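the st louis line can be sanity-checked with a few lines of python. this is a sketch using the classic pythagorean formula with an exponent of 2; that exponent is an assumption, BP's own version of the formula may use a different one, and so this won't necessarily reproduce every row of the table.

```python
def pythag_wins(runs_scored, runs_allowed, games=162, exponent=2.0):
    """expected wins from runs scored and allowed (classic pythagorean form).

    the exponent of 2 is an assumption; other versions use other values.
    """
    rs = runs_scored ** exponent
    ra = runs_allowed ** exponent
    return games * rs / (rs + ra)


# st louis with BP's 730 runs scored, then with the adjusted 765
bp_wins = pythag_wins(730, 734)
adj_wins = pythag_wins(765, 734)
print(round(bp_wins), round(adj_wins))
```

with 734 runs allowed, 730 runs scored comes out to about 81 wins and 765 comes out to about 84, which is the three-game bump in the st louis row.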

now that projection --- a battle among evenly matched stl, chi, and mil --- i can buy into. that i can believe. if anything, i'm surprised that milwaukee isn't viewed as the favorite. now, things might shake out exactly as BP projects, with the cards finishing at .500 and trailing both chicago and milwaukee. there's a significant chance that could happen. but i don't think, as BP's number-crunching suggests, that the cards are heading into the season with a significantly worse team than chicago.

i'll send an e-mail to nate silver at BP and see if he can explain the numerical discrepancies to me.