Sunday ramblings: stats, scouts and sample size

Sorry boys and girls, no Brad Penny post again.  I feel that he really is an interesting case, and one with which Pitch f/x could be particularly useful; however, analyzing Penny has been a lot harder than I thought it would be.  Since I want to do the post justice, I'll wait till I have good pitch classifications and a good idea of what to look for before I present it to you guys.

Instead, today I'd like to spend some time talking about the role of stats, scouts and sample size in evaluating players. I'll start off by saying that the supposed schism between statheads and scouts does not hold true for a lot of (dare I say most) serious sabermetricians. Stats are simply the record of the results (or in some cases, like UZR and DIPS models, estimates of those results). If a player has 4 plate appearances in a game and makes an out each time, the stats will say he had a bad game.

Scouting, in theory, is supposed to be about the inputs. Take that same player who went 0-4. Say he's facing Chris Carpenter on one of his pulling-a-95-MPH-two-seamer-out-of-his-ass days, and still manages to get solid contact on the ball. However, he lines out twice to Rasmus in deep center, and gets robbed by Boog two more times on hard ground balls in the hole. A good scout should be able to recognize that the batter actually did a pretty good job that day and simply got unlucky.

In small sample sizes, a good scout is ALWAYS better than stats.

However, even good scouts obviously have their flaws. For one, there is often bias in their opinions of players. In Moneyball, Billy Beane once said "we're not selling jeans here", or something to that effect. This was a caution against scouts who favored players who looked good, even if they didn't have good tools. Scouts are also not trained in biomechanics, or in the other subjects necessary for judging players by their mechanics, so they will make natural errors in judgment as well. Furthermore, even if a scout is able to give a perfect assessment of the player's ability, it's hard to translate that into an expected stat line going forward.

In the short run, those flaws aren't nearly as glaring as the ones in a player's stats. Even over a full season, a player can get unlucky. That may seem foolish, but consider all of the variables that are out of a hitter's control that can drastically affect his stats:

  • Umpires
  • Opposing pitchers
  • Defense
  • Ballpark

In fact, even over an entire *career*, there is still some expected luck in a player's statline.

However, while luck always exists in stats, its share gets smaller the more games a hitter plays. At some point, the measurement error of stats (meaning how much luck goes into them) gets smaller than the biases and judgment errors that scouts have.
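To put rough numbers on how luck shrinks with playing time, here is a minimal sketch. It treats every plate appearance as an independent coin flip with a fixed chance of reaching base, which is a simplifying assumption of mine (real plate appearances are neither identical nor independent). The function name and the .330 on-base figure are made up for illustration.

```python
import math


def obp_standard_error(true_obp: float, plate_appearances: int) -> float:
    """Binomial standard error of an observed on-base rate.

    Assumes each plate appearance is an independent trial with the same
    probability of reaching base (a simplification for illustration).
    """
    return math.sqrt(true_obp * (1.0 - true_obp) / plate_appearances)


if __name__ == "__main__":
    # Hypothetical .330 true-talent hitter over one game, a couple of weeks,
    # a full season, and a long career.
    for pa in (4, 50, 600, 6000):
        se = obp_standard_error(0.330, pa)
        print(f"{pa:>5} PA: observed OBP is typically off by about {se:.3f}")
```

Under these assumptions the noise falls with the square root of plate appearances, so going from 4 PA to 600 PA cuts it by a factor of roughly 12, but it never reaches zero.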

There is no real way of telling where that point is; however, you can make some reasonable assumptions.  The best way to evaluate players is ALWAYS to weigh scouting information and stats in some way or another.  To steal Tom Tango's favorite line, Theo Epstein once said that scouting and stats are each one lens of a pair of glasses, and you need both of them to see clearly.

If you combine a team of excellent pure scouts (i.e., guys who don't consider the results AT ALL when evaluating players) and a team (or one really) of good statisticians, you will get the best possible evaluations of players out there.
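One hypothetical way to do that weighing, and not something any team is known to use, is to weight each source by how much you trust it. The sketch below uses inverse-variance weighting, a standard statistical technique that I'm swapping in purely as an illustration. The function name, the error figures and the idea that a scout's error can be boiled down to one number are all assumptions.

```python
def blend_estimates(stat_estimate: float, stat_se: float,
                    scout_estimate: float, scout_se: float) -> tuple[float, float]:
    """Inverse-variance blend of a stat-based and a scout-based estimate.

    Returns (blended_estimate, weight_given_to_stats). The noisier a source
    is (the larger its standard error), the less weight it receives.
    """
    stat_var = stat_se ** 2
    scout_var = scout_se ** 2
    stat_weight = scout_var / (stat_var + scout_var)
    blended = stat_weight * stat_estimate + (1.0 - stat_weight) * scout_estimate
    return blended, stat_weight


if __name__ == "__main__":
    # Hypothetical numbers: the stat line says .400 OBP and the scout says .330.
    # The scout's error is held at .030 while the stat error shrinks with time.
    for label, stat_se in (("one week", 0.100), ("one season", 0.020)):
        blended, weight = blend_estimates(0.400, stat_se, 0.330, 0.030)
        print(f"{label}: blended OBP {blended:.3f}, weight on stats {weight:.2f}")
```

With these made-up inputs the scout dominates early and the stats take over as the sample grows, which is the crossover point described above.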

Indeed, some of the most innovative research out there is done by combining stats and scouts. UZR is a perfect example of that. It takes human observation of the quality and placement of batted balls and processes that information into an estimate of how many runs a fielder saved. Pitch f/x is another example, as, in my opinion, it is essentially digitized scouting. Ideally, you could construct a DIPS estimator using only the inputs that a pitcher provides (velocity, movement, location), and ignore the actual outcomes. Pitch f/x is the closest thing we have to an unbiased scout, whose observations can be broken down into numbers and manipulated as such.

This brings me to another related topic, which is the misuse of stats in the online community. Although there are a lot of ways in which people misinterpret and misuse stats, the common denominator in most of those instances is sample size.

Consider the example of the player going 0-4.  NEVER, never, never would anyone ever say that statline is A) a decent gauge of how well the player performed, or B) a decent predictor of his ability going forward.  However, we do that all of the time with similarly small sample sizes of stats.  

An example of that is John Smoltz's pitch count splits last year.  From Baseball Reference:

Pitches   Plate appearances   OPS against
1-25      84                  .698
26-50     99                  .818
51-75     96                  .833
76-100    63                  1.011


A clear trend, right? The more pitches he threw, the worse he got, so that proves he can't be a starter in the bigs anymore, right?

Wrong. Now it's possible that Smoltz legitimately pitched worse as the game went along last year (or at least more so than the average pitcher). I haven't examined his Pitch f/x data for those splits yet, but it's entirely possible that he lost velocity on his fastball, his slider had less break and he threw more pitches right down the middle. However, you couldn't tell that just by looking at the chart above.

63 plate appearances for the last set of pitches and fewer than 100 for each of the other ones? Are you kidding me? If Chris Carpenter allowed a 1.000+ OPS against over his first 63 batters faced next year, nobody would (or should) make a big deal out of it. In fact, we see that last year, Carpenter gave up a .922 OPS against the Giants in 52 plate appearances. Does that mean that Carpenter can't pitch against the Giants? No, it means that he had a couple of bad starts against them.
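To get a feel for how noisy 63 plate appearances are, here is a rough simulation sketch. The outcome probabilities are hypothetical numbers I picked to resemble a roughly average pitcher, not Smoltz's or Carpenter's actual rates, and the model ignores hit-by-pitches, sacrifice flies and everything else that makes real OPS messier. The pitcher's true talent never changes, so any spread in the results is pure luck.

```python
import random
import statistics

# Hypothetical per-plate-appearance outcome probabilities (assumed for
# illustration only; they sum to 1.0).
OUTCOMES = {
    "out": 0.695,
    "walk": 0.080,
    "single": 0.150,
    "double": 0.045,
    "triple": 0.005,
    "homer": 0.025,
}
TOTAL_BASES = {"single": 1, "double": 2, "triple": 3, "homer": 4}


def simulate_ops(n_pa: int, rng: random.Random) -> float:
    """Simulate OPS against over n_pa plate appearances (simplified OBP + SLG)."""
    results = rng.choices(list(OUTCOMES), weights=list(OUTCOMES.values()), k=n_pa)
    walks = results.count("walk")
    hits = sum(1 for r in results if r in TOTAL_BASES)
    total_bases = sum(TOTAL_BASES.get(r, 0) for r in results)
    at_bats = n_pa - walks  # walks are not at-bats
    obp = (hits + walks) / n_pa
    slg = total_bases / at_bats if at_bats else 0.0
    return obp + slg


def ops_spread(n_pa: int, n_sims: int = 2000, seed: int = 0) -> tuple[float, float]:
    """Mean and standard deviation of simulated OPS across n_sims samples."""
    rng = random.Random(seed)
    samples = [simulate_ops(n_pa, rng) for _ in range(n_sims)]
    return statistics.mean(samples), statistics.stdev(samples)


if __name__ == "__main__":
    for pa in (63, 600):
        mean, sd = ops_spread(pa)
        print(f"{pa:>3} PA: mean OPS {mean:.3f}, standard deviation {sd:.3f}")
```

The same pitcher produces a much wider range of OPS figures over 63 plate appearances than over 600, which is why a single bin of a split table says so little on its own.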

The attitude and admittedly strawman-esque phenomenon I'm referring to above is what's known as false correlation. When you see Smoltz getting progressively worse the more pitches he throws, that *implies* that he actually pitched worse (and didn't just get unlucky); but it certainly doesn't prove it, and the standard error on that implication is going to be huge.

This happens all of the time with stuff like righty-lefty splits, home-road splits, random slumps and hot streaks that players go into, etc.

Anyway, it's 3 AM my time right now and I wanna go to sleep. Hopefully, my arguments (admittedly with no evidence to back them up) were convincing to you and you took away something valuable from them. If not, enjoy your f'ing Sunday.