Viva El Birdos: An SB Nation Community


Another look at UZR

I've had a fairly specific concern about UZR/150 for a little while, and I think this is as good a place as any to bring it up. I know there has been some heated discussion about the metric's ability to accurately measure individual value. My question/concern has more to do with the metric at the team level. I'm hoping people can look at my concern and let me know their thoughts. Is my interpretation right? Are my methods of measuring reasonable? I'm not a statistical expert, so I'll have pretty thick skin about it. I'm posting this more to learn myself than to educate anyone else.

 

Anyway, my general view of UZR/150 is that it is a step in a positive direction but still a long way from what we have when we quantify offense and pitching. I accept that you need a very large sample to measure at an individual level. At a team level, though, I would think you could interpret the data sooner. More specifically, I wanted to look at how the difference between a team's FIP and ERA correlates with the team's UZR/150. Presumably, all other things being equal, a team that plays good defense will have an ERA better than its FIP, and a team that plays poor defense will have an ERA worse than its FIP. With over half the season behind us, I expected to see a pretty tight correlation between where a team ranks in UZR/150 and where it ranks in the differential between ERA and FIP. I'd say there is a correlation, but I was surprised to see how weak it is.

 

Team          UZR/150 rank  ERA-FIP rank  Rank difference
Angels                  11            22               11
Astros                  15            13                2
Athletics               12            20                8
Blue Jays               17            14                3
Braves                  27            19                8
Brewers                  7            10                3
Cardinals               13             7                6
Cubs                    14             2               12
Diamondbacks            10            27               17
Dodgers                 16             5               11
Giants                   1             8                7
Indians                 24            30                6
Mariners                 4             1                3
Marlins                 21            23                2
Mets                    30            16               14
Nationals               29            29                0
Orioles                 19            21                2
Padres                  23            25                2
Phillies                 9             6                3
Pirates                  3            12                9
Rangers                  8             3                5
Rays                     2            15               13
Red Sox                 25            24                1
Reds                     6             4                2
Rockies                 22            28                6
Royals                  28            26                2
Tigers                   5             9                4
Twins                   26            18                8
White Sox               18            17                1
Yankees                 20            11                9
Average                                              6.00
Std. dev.                                        4.448944

 

***The standard deviation is based on the absolute differences rather than tracking both negative and positive variances. I think that is the "right way" to do it, but I'm not sure. The key, I think, is to keep comparisons between standard deviations apples to apples, so that's what I'll do here: I'll compare the variability of the other measurements using absolute differences as well.***
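A quick way to double-check the summary rows is to recompute them. The sketch below (Python, standard library only; the names are illustrative) reproduces the 6.00 average and shows that the 4.448944 figure is the sample standard deviation (dividing by n - 1, as Excel's STDEV does). It also adds a reference point that isn't in the post: if one ranking were a random shuffle of the other, the expected average gap would be (n² - 1)/(3n), about 9.99 for 30 teams, so 6.00 sits between "no relationship" (about 10) and perfect agreement (0).

```python
import statistics

# (UZR/150 rank, ERA-FIP rank) per team, copied from the table in the post.
RANKS = {
    "Angels": (11, 22), "Astros": (15, 13), "Athletics": (12, 20),
    "Blue Jays": (17, 14), "Braves": (27, 19), "Brewers": (7, 10),
    "Cardinals": (13, 7), "Cubs": (14, 2), "Diamondbacks": (10, 27),
    "Dodgers": (16, 5), "Giants": (1, 8), "Indians": (24, 30),
    "Mariners": (4, 1), "Marlins": (21, 23), "Mets": (30, 16),
    "Nationals": (29, 29), "Orioles": (19, 21), "Padres": (23, 25),
    "Phillies": (9, 6), "Pirates": (3, 12), "Rangers": (8, 3),
    "Rays": (2, 15), "Red Sox": (25, 24), "Reds": (6, 4),
    "Rockies": (22, 28), "Royals": (28, 26), "Tigers": (5, 9),
    "Twins": (26, 18), "White Sox": (18, 17), "Yankees": (20, 11),
}


def abs_rank_diffs(ranks):
    """Absolute gap between the two rankings for each team."""
    return [abs(uzr - ef) for uzr, ef in ranks.values()]


def random_baseline(n):
    """Expected mean |rank gap| if one ranking were a random shuffle of
    the other: the average of |i - j| over all pairs of ranks 1..n."""
    total = sum(abs(i - j) for i in range(1, n + 1) for j in range(1, n + 1))
    return total / (n * n)


diffs = abs_rank_diffs(RANKS)
mean_diff = statistics.mean(diffs)
sd_diff = statistics.stdev(diffs)  # sample SD (divides by n - 1)
print(f"average {mean_diff:.2f}, std dev {sd_diff:.6f}")
print(f"no-relationship baseline {random_baseline(len(RANKS)):.2f}")
```

Run as-is, it prints the post's 6.00 and 4.448944, plus a no-relationship baseline of 9.99.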

I then wondered whether I should really be looking at the connection between UZR/150 and a pitching staff's BABIP. That link is even a little weaker, with an average difference of 6.87 and a standard deviation of 4.58.

I wouldn't consider myself a great advocate of UZR, but even so, these connections are much looser than I expected them to be. Translated into offensive terms, they are about as loose as the connection between batting average and a team's ability to score runs (average difference 6.47, standard deviation 3.80). For another point of reference, wOBA ties much more closely to run scoring (average difference 2.2, standard deviation 1.88).

Most surprising, though, was how much more tightly the measurements of luck (LOB%, BABIP) tied to ERA minus FIP. The luck factors correlate about twice as strongly as the defense behind the pitchers (as measured by UZR/150). LOB% has an average difference of 3.33, with a standard deviation of 2.75. BABIP has an average difference of 2.73 and a standard deviation of 2.24.
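For anyone following along, the two luck measures mentioned here are usually computed from staff totals roughly as below: the standard BABIP formula and the FanGraphs-style LOB% estimate, which weights home runs by 1.4. This is a sketch, not necessarily how the figures in the post were produced; the function names are illustrative and the staff totals in the example are made up.

```python
def babip(h, hr, ab, k, sf):
    """Batting average on balls in play: non-HR hits divided by balls in
    play (at-bats minus strikeouts and home runs, plus sacrifice flies)."""
    return (h - hr) / (ab - k - hr + sf)


def lob_pct(h, bb, hbp, r, hr):
    """Strand rate (LOB%), FanGraphs-style estimate of the share of
    baserunners left on base, with home runs weighted by 1.4."""
    return (h + bb + hbp - r) / (h + bb + hbp - 1.4 * hr)


# Made-up staff totals, for illustration only.
print(f"BABIP {babip(h=150, hr=20, ab=600, k=120, sf=5):.3f}")     # BABIP 0.280
print(f"LOB% {lob_pct(h=150, bb=50, hbp=5, r=70, hr=20):.3f}")     # LOB% 0.763
```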

This really wasn't what I expected to see. I've sat on this for a couple of weeks, deciding whether to bring it here. I'm hoping it will either draw some good discussion or point me to factors that I should (or shouldn't) be looking at. Maybe they are obvious to everyone but me; I've only just started trying to substantiate my general position on the metric. Maybe this is as much a conversation about FIP as it is about UZR. Not sure. I just found all of this interesting and a little surprising. We probably have the resources here to make sense of what I can't, and I wanted to give it a shot.

 



Comments


I'm probably the last guy...

that should post on a heavy saber topic, but I wonder how much a team's HR/FB% and GB% affect these numbers. Those numbers have a strong tie to FIP, if I’m not mistaken.

"Don't do anything till I get back!" - Jesus to the Cubs

by cardzfanbub on Jul 21, 2009 6:18 PM EDT

I would think that the

“luck” factors, if included, might help produce a higher correlation.

Also, for Merry, what are the p-values when you regress this data?

"I just wish that the late Harry Caray were still around so I could hear him mispronounce 'Kosuke Fukudome' every fukun' night" -- Dennis Miller

by fourstick on Jul 21, 2009 8:20 PM EDT

I'm wading into way too deep water here

My limited knowledge of p-values comes from statistics classes about 15 years ago. My understanding is that a p-value is a way of testing whether a result from a sample is significant. And that’s about where it ends for me.

I don’t think this would necessarily be considered statistically significant, but I would think half a season nears a threshold of relevance. That is completely intuitive, though, and I don’t know how to measure it. This is part of why I was reluctant to bring it here…I’m not going to be the guy to answer most follow ups. I decided to mention it anyway because the team rankings did not track each other nearly as closely as I would have anticipated. I have not looked at past years yet, but I intend to. It looks like VEP is already working on much of this, though, thankfully.

What I was really wondering was whether anyone else is as surprised as I am by the extent of the separation. I would not expect the two values to be completely in sync with each other, but I would not expect this much variability either.

by Merry CRasmus on Jul 21, 2009 11:02 PM EDT
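Merry's question about significance and fourstick's request for p-values can both be approached with a permutation test, a swapped-in technique that nobody in the thread actually ran. The sketch below (Python, standard library only; names are illustrative) computes Spearman's rank correlation straight from the posted ranks: because each column is a tie-free ranking of 1 to 30, rho = 1 - 6Σd²/(n(n² - 1)). It then shuffles one ranking many times to see how often chance alone produces a correlation at least that strong. The 10,000 trials and the seed are arbitrary choices.

```python
import random

# Posted ranks in the table's (alphabetical) team order, Angels .. Yankees.
UZR_RANK = [11, 15, 12, 17, 27, 7, 13, 14, 10, 16, 1, 24, 4, 21, 30,
            29, 19, 23, 9, 3, 8, 2, 25, 6, 22, 28, 5, 26, 18, 20]
EF_RANK = [22, 13, 20, 14, 19, 10, 7, 2, 27, 5, 8, 30, 1, 23, 16,
           29, 21, 25, 6, 12, 3, 15, 24, 4, 28, 26, 9, 18, 17, 11]


def spearman_rho(x, y):
    """Spearman's rank correlation for two tie-free rankings of 1..n."""
    n = len(x)
    d2 = sum((a - b) ** 2 for a, b in zip(x, y))
    return 1 - 6 * d2 / (n * (n * n - 1))


def permutation_p_value(x, y, trials=10_000, seed=2009):
    """Two-sided p-value: share of random shuffles of y whose |rho|
    is at least as large as the observed |rho|."""
    rng = random.Random(seed)
    observed = abs(spearman_rho(x, y))
    shuffled = list(y)
    hits = 0
    for _ in range(trials):
        rng.shuffle(shuffled)
        if abs(spearman_rho(x, shuffled)) >= observed:
            hits += 1
    # The +1s count the observed ordering itself as one of the permutations.
    return (hits + 1) / (trials + 1)


rho = spearman_rho(UZR_RANK, EF_RANK)
p = permutation_p_value(UZR_RANK, EF_RANK)
print(f"rho = {rho:.3f}, permutation p ~ {p:.4f}")
```

On the posted ranks, rho comes out at about 0.63, and the permutation p-value should land well under 0.01 (the exact figure depends on the seed and trial count). A small p-value only says the relationship is unlikely to be zero, not that it is as tight as Merry expected.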

If you did the work, you should post it.
“This is part of why I was reluctant to bring it here…I’m not going to be the guy to answer most follow ups.”

There are plenty of us here who find stuff like this fascinating, and we’ll always be able to help shore up whatever questions or inaccuracies we see. Better to post it and learn if you’ve made a mistake while the rest of us learn about your topic than to keep it to yourself, imho.

by fourstick on Jul 22, 2009 9:53 AM EDT

I would think after half a season of games

a team’s HR/FB% should pretty much have stabilised at its “true” level.

Felonius Monk - bitching to contact since 2008

by Felonius_Monk on Jul 22, 2009 5:21 AM EDT

I agree...

but HRs have no effect on UZR, for obvious reasons, and a great impact on ERA-FIP. A team that gives up a good number of HRs is likely to see a large difference between these two categories.

by cardzfanbub on Jul 22, 2009 9:05 AM EDT

The best thing to do

Would be to run a regression between tRA (which takes batted-ball factors into account) and TIPS (timing-independent pitching). Of course, I just made up TIPS, but it would essentially be a component ERA based on actual events (doubles, triples, homers). That would effectively take out the timing aspect of UZR.

Derosa.

by vivaelpujols on Jul 21, 2009 6:42 PM EDT

Good idea
“Would be to run a regression between tRA (which takes batted ball factors into account”

Would BIS adding a batted-ball timer, as reported in the post you linked earlier today in the main thread, give you data similar to what your TIPS stat would, only through UZR?

by fourstick on Jul 21, 2009 8:24 PM EDT

I'm not sure what you're asking

TIPS would just be a timing-independent run estimator, used solely to compare with tRA to test UZR. I’m not sure how hang time would be factored into it.

by vivaelpujols on Jul 21, 2009 9:29 PM EDT

Never mind

I was thinking about something completely different, my bad.

by fourstick on Jul 22, 2009 9:53 AM EDT

Just for anyone who cares...

Here is the regression line for UZR and ERA-FIP this year:

Obviously that’s pretty bad. However, Jack Moore looked at it last year and found the R^2 was higher:

by vivaelpujols on Jul 21, 2009 7:06 PM EDT

What would account for

such a large difference in R^2 values from year to year? I’m not a huge proponent of UZR, but I would think the data would correlate similarly from year to year, even if it’s really noisy.

by fourstick on Jul 21, 2009 8:33 PM EDT

Could just be small sample size

When I have time later tonight, I’ll run a regression with all of the UZR data (since 2002). Should only take me a couple of minutes.

by vivaelpujols on Jul 21, 2009 9:23 PM EDT

the Moore data...

… reports an “r” rather than an “r^2”. is that just for simplicity’s sake? because if you square .7 you get .49, which is at least in the same ballpark as .37. that difference could easily come from different sample sizes.

i have no idea why they’d report an “r” that isn’t squared, but maybe he did?

by kindred on Jul 21, 2009 10:49 PM EDT
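kindred's arithmetic holds in general: for a straight-line fit with one predictor (the kind Excel draws as a trendline), R^2 is exactly the square of Pearson's r, so an r of 0.7 and an R^2 of 0.49 describe the same fit. The sketch below checks that identity on a tiny made-up dataset (not baseball data); the function names are illustrative.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    syy = sum((yi - my) ** 2 for yi in y)
    return sxy / math.sqrt(sxx * syy)


def trendline_r_squared(x, y):
    """R^2 of the least-squares line y = intercept + slope * x,
    i.e. 1 - (residual sum of squares / total sum of squares)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    slope = (sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
             / sum((xi - mx) ** 2 for xi in x))
    intercept = my - slope * mx
    ss_res = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - my) ** 2 for yi in y)
    return 1 - ss_res / ss_tot


# Made-up toy numbers, not baseball data.
x = [1, 2, 3, 4, 5]
y = [2, 1, 4, 3, 5]
r = pearson_r(x, y)
print(f"r = {r:.2f}, r squared = {r * r:.2f}, "
      f"trendline R^2 = {trendline_r_squared(x, y):.2f}")
# prints: r = 0.80, r squared = 0.64, trendline R^2 = 0.64
```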

Yeah, I was wondering about that too

Excel only offers R^2 on the trendline options, but he might not have been using Excel.

by vivaelpujols on Jul 22, 2009 12:05 AM EDT

I was squaring Moore's R

I agree it’s in the same ballpark, but I would think they would be closer together for a sample that large. Maybe if we had all of 2009 they would be. It would be interesting to see 2007 and 2008 side by side for this reason.

by fourstick on Jul 22, 2009 9:56 AM EDT

that R^2 is really low...

… last year’s is somewhat better but still not great. might someone with the data handy run a regression with the “luck” factors included on the right-hand side and see what happens to the model R^2? or even better, report the p-values?

by kindred on Jul 21, 2009 7:19 PM EDT
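kindred's suggestion amounts to comparing R^2 for two models of ERA-FIP: UZR alone versus UZR plus the luck factors (LOB%, BABIP). The team-level values weren't posted, so the sketch below uses small made-up numbers just to show the mechanics; `fit_r_squared` is a hypothetical helper that solves the least-squares normal equations with the standard library only. It reports R^2 but not coefficient p-values (a regression package such as statsmodels would give those). Keep in mind that adding predictors never lowers in-sample R^2, so the comparison is more telling with adjusted R^2 or p-values.

```python
def fit_r_squared(predictors, y):
    """R^2 from regressing y on an intercept plus each list in `predictors`.

    Solves the normal equations (X^T X) b = X^T y by Gauss-Jordan
    elimination with partial pivoting. Assumes the predictors are not
    collinear (otherwise the system has no unique solution).
    """
    n = len(y)
    rows = [[1.0] + [col[i] for col in predictors] for i in range(n)]
    k = len(rows[0])
    # Augmented matrix [X^T X | X^T y].
    aug = [[sum(row[i] * row[j] for row in rows) for j in range(k)]
           + [sum(row[i] * yi for row, yi in zip(rows, y))]
           for i in range(k)]
    for c in range(k):
        # Move the row with the largest entry in column c into pivot position.
        pivot = max(range(c, k), key=lambda r: abs(aug[r][c]))
        aug[c], aug[pivot] = aug[pivot], aug[c]
        for r in range(k):
            if r != c:
                factor = aug[r][c] / aug[c][c]
                aug[r] = [a - factor * b for a, b in zip(aug[r], aug[c])]
    coef = [aug[i][k] / aug[i][i] for i in range(k)]
    fitted = [sum(cf * v for cf, v in zip(coef, row)) for row in rows]
    mean_y = sum(y) / n
    ss_res = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return 1 - ss_res / ss_tot


# Made-up toy values, NOT real team data.
x1 = [1, 2, 3, 4, 5, 6]            # stand-in for a defense measure
x2 = [2, 1, 4, 3, 6, 5]            # stand-in for a luck factor
y = [1 + 2 * a + 3 * b for a, b in zip(x1, x2)]
print(f"{fit_r_squared([x1], y):.3f}")      # 0.877 with one predictor
print(f"{fit_r_squared([x1, x2], y):.3f}")  # 1.000 (y is exactly linear in both)
```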

I think there's one major problem here

you’ve looked at ranks instead of actual values. If one team’s ERA and FIP differ by, say, 10x more than anyone else’s (hypothetically), the “E-F rank” value only puts them one place ahead of the second team, so it brings in a lot of potential bias.

You should list the actual team ERA, actual team FIP, the difference between the two, and the number of runs that works out to (for instance, if a team has pitched 90 innings with a 3.00 ERA and a 4.00 FIP, it has actually conceded 30 earned runs when FIP predicts it should have conceded 40). You can then see whether its total team UZR makes up the difference (i.e., take the team’s cumulative UZR so far, not the UZR/150). You could then list the deficit or surplus between ERA and FIP in runs alongside the number of runs saved or booted according to total UZR. You’d hope the values would somewhat correlate.

But yeah, I think this is a really useful exercise.

by Felonius_Monk on Jul 22, 2009 5:25 AM EDT

Ah, I see someone's basically already done that...

frankly, the only conclusion I can draw from that R^2 value is that either FIP or UZR isn’t hugely accurate. As there’s a fair bit of data in support of FIP, I think I’m going to have to view UZR as pretty vague from now on (although I guess it could be cumulative error from both metrics)…

Just looking at the graphs visually, there’s a pretty huge difference; we’re talking multiple wins for a lot of the data points. Although I suppose you could argue that, per player, that’s not really a huge number of runs.

by Felonius_Monk on Jul 22, 2009 5:33 AM EDT

Perhaps we could take components of UZR

such as the % of GBs turned into outs and the % of FBs turned into outs, and compare those with GB% and FB% for a pitching staff. That would take out the HR noise, since UZR doesn’t factor HRs in but FIP does.

by fourstick on Jul 22, 2009 10:17 AM EDT
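fourstick's idea can be sketched with two simple rates that could then be set beside a staff's GB% and FB%. The function names are hypothetical and the counts are invented; the one substantive choice is dropping home runs from the fly-ball denominator, which is one way to strip out the HR noise he mentions (as noted in the thread, UZR doesn't account for homers but FIP does).

```python
def out_rate(outs, balls):
    """Fraction of batted balls of one type that were turned into outs."""
    return outs / balls


def fieldable_fly_ball_out_rate(fb_outs, fly_balls, home_runs):
    """Fly-ball out rate with home runs removed from the denominator,
    since no fielder can turn a home run into an out."""
    return out_rate(fb_outs, fly_balls - home_runs)


# Made-up staff totals, for illustration only.
gb_rate = out_rate(300, 400)                          # 300 / 400 = 0.75
fb_rate = fieldable_fly_ball_out_rate(216, 300, 30)   # 216 / 270 = 0.8
print(f"GB out rate {gb_rate:.3f}, fieldable FB out rate {fb_rate:.3f}")
```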

maybe

the problem isn’t FIP or UZR, but ERA. We all know that the official scoring of errors is crap.

by cdb on Jul 22, 2009 12:08 PM EDT

sure...

… but as long as the scoring “mistakes” are somewhat random then they shouldn’t bias the result. the only way it’s a problem is if some teams were getting consistently different rulings than other teams, and i don’t know of any evidence that this actually occurs.

by kindred on Jul 22, 2009 5:25 PM EDT

as the scoring is done by the home scorer

I suspect there might be. There’s evidence of a significant difference in the way batted balls are recorded as LD/FB/GB in different stadiums, so I find it hard to believe there isn’t a similar difference in how stringent the “error” ruling is. I guess if you play half your games at home with a scorer who is very lenient on errors, you’ll end up with pitchers who have a lower ERA than they would in a context-neutral environment…

by Felonius_Monk on Jul 23, 2009 6:54 AM EDT

would it be better then to run against RA rather than ERA? i think i've seen some saber types

do that.

the truth can't hurt you, it's just like the dark/ it scares you witless, but in time you see things clear and stark -- macmanus

by tom s. on Jul 23, 2009 1:40 PM EDT
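tom s.'s idea is easy to set up: RA9 uses the same per-nine scaling as ERA but counts every run, so the official scorer's earned/unearned ruling drops out. A minimal sketch with made-up totals and illustrative function names follows. One caveat to flag: FIP is scaled to league ERA, so comparing it against RA9 would likely call for rescaling FIP to league runs per nine first. The innings helper is there because box-score IP such as 90.1 means 90 and one-third innings, not 90.1.

```python
def true_innings(box_score_ip):
    """Convert box-score IP such as '90.1' (90 and one-third) to true innings."""
    whole, _, thirds = box_score_ip.partition(".")
    return int(whole) + int(thirds or 0) / 3


def per_nine(runs, innings):
    """Runs per nine innings (RA9 with all runs, ERA with earned runs)."""
    return 9.0 * runs / innings


# Made-up staff totals, for illustration only.
innings = true_innings("90")
ra9 = per_nine(40, innings)   # all runs allowed
era = per_nine(30, innings)   # earned runs only
print(f"RA9 {ra9:.2f}, ERA {era:.2f}, unearned runs per 9 {ra9 - era:.2f}")
# prints: RA9 4.00, ERA 3.00, unearned runs per 9 1.00
```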
