Hi everybody!! It seems that we have been discussing Pitch f/x a lot in the main threads recently, and I thought it might be nice if we could take it all to one place.
First, I'd like to give some background on how the system works, how the data is made available to the public, and its practical application to baseball. Hopefully, this will clear up any misconceptions certain people might have...
Implemented by Sportvision, Pitch f/x cameras capture a number of attributes of each pitch, including velocity, movement, location, and much more. Pitch f/x data has been recorded on every single pitch in the majors since Opening Day 2008, and in many games in 2007 as well.
Where to find it
Pitch f/x data is made publicly available via MLB.com's Gameday service. While Gameday is only meant for entertainment and doesn't lend itself to serious analysis, MLB archives all of the files in XML form, like so:
These are all of the Royals pitchers who pitched today. If you click on them, you'll notice that it just looks like a bunch of weird shit; however, you can export it to Excel by right-clicking on the XML file and downloading it to your computer. Then it organizes itself and becomes manageable in Excel.
However, that only lets you look at one game by one pitcher at a time. If you want to aggregate every single pitch in the majors for a more detailed study, compare pitchers start to start, or even look at hitters, you'll have to parse all of the Gameday files into an SQL database. I recently "wrote" a primer on how to do so, which you can read here:
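If you'd rather skip the Excel step, here's a minimal Python sketch of reading a Gameday file yourself and flattening it into a CSV that Excel opens directly. It assumes the inning/atbat/pitch nesting and the attribute names used in Gameday's XML (start_speed, pfx_x, pfx_z, px, pz, pitch_type, des); the sample snippet is made up and heavily trimmed, so check the field list against a real file before relying on it.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Hypothetical, trimmed-down snippet in the style of a Gameday inning file.
# Real files carry many more attributes on each <pitch>.
SAMPLE_XML = """
<game>
  <inning num="1">
    <top>
      <atbat num="1" pitcher="123" batter="456" stand="R">
        <pitch des="Called Strike" start_speed="92.4" pfx_x="-6.1" pfx_z="9.8" px="0.12" pz="2.45" pitch_type="FF"/>
        <pitch des="Swinging Strike" start_speed="76.0" pfx_x="7.9" pfx_z="-8.2" px="0.85" pz="1.30" pitch_type="CU"/>
      </atbat>
    </top>
  </inning>
</game>
"""

FIELDS = ["inning", "atbat", "des", "pitch_type", "start_speed", "pfx_x", "pfx_z", "px", "pz"]

def parse_pitches(xml_text):
    """Flatten every <pitch> into a dict tagged with its inning and at-bat number."""
    root = ET.fromstring(xml_text.strip())
    rows = []
    for inning in root.iter("inning"):
        for atbat in inning.iter("atbat"):
            for pitch in atbat.iter("pitch"):
                row = {"inning": int(inning.get("num")), "atbat": int(atbat.get("num"))}
                for key in FIELDS[2:]:
                    row[key] = pitch.get(key)  # values stay as strings, like the raw file
                rows.append(row)
    return rows

def to_csv(rows):
    """Write the rows as CSV text with a header line."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=FIELDS)
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()

pitches = parse_pitches(SAMPLE_XML)
print(to_csv(pitches))
```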
It should be relatively easy to follow, and you should *definitely* look in the comment section for more info. Be warned, it's a daunting task and may take up to a week to do, but it's definitely worthwhile.
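For a sense of what the database buys you, here's a small sketch using Python's built-in sqlite3 module as a stand-in for whatever SQL setup the primer uses. The table layout and the rows are hypothetical; the point is that once every pitch lives in one table, a start-to-start comparison is a single GROUP BY.

```python
import sqlite3

# Hypothetical rows: (pitcher_id, game_id, pitch_type, start_speed, pfx_x, pfx_z).
ROWS = [
    (1, "game_a", "FF", 92.4, -6.1, 9.8),
    (1, "game_a", "CU", 76.0, 7.9, -8.2),
    (1, "game_b", "FF", 91.1, -5.8, 9.5),
    (1, "game_b", "FT", 91.5, -8.0, 7.9),
]

def build_db(rows):
    """Create an in-memory pitches table and load the rows into it."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE pitches (pitcher_id INTEGER, game_id TEXT, pitch_type TEXT, "
        "start_speed REAL, pfx_x REAL, pfx_z REAL)"
    )
    conn.executemany("INSERT INTO pitches VALUES (?, ?, ?, ?, ?, ?)", rows)
    return conn

def fastball_velo_by_game(conn, pitcher_id):
    """Average fastball (FF + FT) velocity and pitch count for each game."""
    return conn.execute(
        "SELECT game_id, ROUND(AVG(start_speed), 1), COUNT(*) FROM pitches "
        "WHERE pitcher_id = ? AND pitch_type IN ('FF', 'FT') "
        "GROUP BY game_id ORDER BY game_id",
        (pitcher_id,),
    ).fetchall()

conn = build_db(ROWS)
for game_id, velo, n in fastball_velo_by_game(conn, 1):
    print(game_id, velo, n)
```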
Whether you are looking at one game or all of the pitchers, you are provided with a boatload of data on each pitch. Mike Fast put together an excellent description of each field on his blog:
Read that and bookmark it.
Analyzing the data
There are tons of simple, yet revealing, things you can look at based on just one game of Pitch f/x data. For example, here are all of the pitches thrown by Wainwright in his start against the Giants, when he struck out 12 hitters in 9 innings:
Obviously, his stuff was really good that night. So how do we quantify that with Pitch f/x? Well, we can look at the two main attributes of a pitcher's stuff: velocity and movement. Velocity is under the heading "start_speed" (they also track the end speed of each pitch, but as far as I know that isn't very important). Basic movement is given by "pfx_x" and "pfx_z": the first is the horizontal movement of the pitch and the second is the vertical movement, both in inches.
Given those two categories, you can reasonably show how good a pitcher's stuff was on a given night; however, first you have to separate the pitches by type. Pitch f/x data comes with a pitch classification algorithm, but it's often wrong. Fortunately, it seems to classify Waino pretty well, because he has 4 distinct pitches: a fastball (don't worry about splitting it into 2-seam and 4-seam yet), slider, curve and changeup.
The pitches are marked under the heading pitch_type. FF is a 4-seam fastball, FT is a 2-seam fastball (again, just combine the two for now), SL is a slider, CU is a curve, CH is a changeup and KN is a knuckleball.
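If you're doing the grouping in code rather than by hand, a lookup table like this one merges FF and FT into one fastball bucket. It's a sketch covering only the codes listed above; anything not in the table lands in "Other" so an unexpected code doesn't crash it.

```python
from collections import Counter

# Broad groups for the pitch_type codes listed above; FF and FT share one bucket.
PITCH_GROUPS = {
    "FF": "Fastball", "FT": "Fastball",
    "SL": "Slider", "CU": "Curveball",
    "CH": "Changeup", "KN": "Knuckleball",
}

def pitch_group(code):
    """Return the broad group for a Gameday pitch_type code, or "Other" if unlisted."""
    return PITCH_GROUPS.get(code, "Other")

# Hypothetical sequence of codes from one outing.
codes = ["FF", "FT", "CU", "FF", "SL", "CH"]
mix = Counter(pitch_group(c) for c in codes)
print(mix.most_common())
```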
So sort the data by pitch type, and figure out the average start_speed, pfx_x and pfx_z of each of his pitches. Or, if you want to take the lazy way out, and make a pretty graph at the same time, you can graph out the movement like so:
That graph may seem a little obscure, but it is very informative. You can see the average velocity of his pitches and the range of break on each of them. As you can see, the changeup and fastball have similar movement, with the changeup having a bit more drop to it. The slider moves about 10 inches to the right (from the catcher's point of view) compared to those two pitches, and the curveball is way to the right and drops about 10 inches (obviously one of the biggest-breaking curves in the majors). That one "fastball" with the movement of a slider is probably a slider.
Of course, this is pretty worthless on its own. Let's take a look at how his stuff looked against the Braves on April 29th. In that start he gave up 3 runs and walked 5 hitters while striking out only two. Here is the data for that start:
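The sort-and-average step can also be done in a few lines of Python. This is a sketch: the pitches are hypothetical (pitch_type, start_speed, pfx_x, pfx_z) tuples, and the 2-seam and 4-seam fastballs are merged as suggested above.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical pitches: (pitch_type, start_speed in mph, pfx_x in inches, pfx_z in inches).
PITCHES = [
    ("FF", 93.0, -6.0, 10.0),
    ("FT", 92.0, -8.0, 8.0),
    ("CU", 77.0, 8.0, -9.0),
    ("CU", 75.0, 6.0, -11.0),
    ("SL", 85.0, 3.0, 2.0),
]

FASTBALLS = {"FF", "FT"}

def movement_summary(pitches):
    """Average (start_speed, pfx_x, pfx_z) per pitch type, fastballs combined."""
    groups = defaultdict(list)
    for ptype, speed, pfx_x, pfx_z in pitches:
        key = "Fastball" if ptype in FASTBALLS else ptype
        groups[key].append((speed, pfx_x, pfx_z))
    # zip(*vals) turns the list of tuples into one column per measurement.
    return {
        key: tuple(round(mean(col), 1) for col in zip(*vals))
        for key, vals in groups.items()
    }

for ptype, (speed, pfx_x, pfx_z) in movement_summary(PITCHES).items():
    print(f"{ptype}: {speed} mph, pfx_x {pfx_x} in, pfx_z {pfx_z} in")
```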
And here is how his stuff looked:
You can see some subtle, yet important differences. His fastball velocity was over 1 MPH slower and his slider velocity was faster, so the speed differential between the two was smaller. The break on his fastball, changeup and slider also shifted slightly to the right, while the curveball's break held constant, meaning he was getting less separation on those pitches.
In any given start, there are really 3 things that a pitcher has control over: stuff, location and sequencing. We already took a look at Waino's stuff; now let's look at his location.
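To put numbers on "speed differential" and "separation", here's one way to compute them from per-start averages. Treating separation as the straight-line distance between two pitches' average (pfx_x, pfx_z) points is my own assumption, not a standard stat, and the averages in the example are made up to mirror the direction of the changes described above.

```python
import math

def speed_gap(summary, fast="Fastball", slow="SL"):
    """Average fastball speed minus average speed of the slower pitch (mph)."""
    return summary[fast][0] - summary[slow][0]

def break_separation(summary, a, b):
    """Distance in inches between two pitches' average (pfx_x, pfx_z) points."""
    _, ax, az = summary[a]
    _, bx, bz = summary[b]
    return math.hypot(ax - bx, az - bz)

# Hypothetical per-start averages: pitch -> (start_speed, pfx_x, pfx_z).
good_start = {"Fastball": (93.0, -6.0, 10.0), "SL": (84.0, 4.0, 2.0)}
bad_start = {"Fastball": (91.8, -5.0, 10.0), "SL": (85.0, 4.0, 2.0)}

for label, s in (("good", good_start), ("bad", bad_start)):
    print(label, round(speed_gap(s), 1), round(break_separation(s, "Fastball", "SL"), 1))
```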
To plot location on a graph, you select px as the x axis and pz as the y axis. I like to break it up by pitch type, or pitch outcome (swinging strike, ball, hit, called strike, etc.). Let's take a look at Waino's night against the Giants by pitch type, with swinging strikes circled:
As you can see, he was downright unhittable that night, totaling 19 (!) swinging strikes. His curveball was especially good: he generated 10 swinging strikes on 36 curves. He was able to pound the first-base side of the zone with his curveball and slider, while keeping his fastball consistently around the strike zone.
These are just a couple of the things you can do with Pitch f/x. Other simple things you can look at are:
- Velocity and movement by inning
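Here's a sketch of the counting behind those numbers: a swinging-strike rate per pitch type, plus a helper that tags which side of the plate a pitch crossed. It assumes swinging strikes can be picked out by a des value starting with "Swinging Strike", and that px is in feet from the middle of the plate, measured from the catcher's view with positive values toward first base. The example counts just reproduce the 10-on-36 curveball figure from the text.

```python
from collections import Counter

def plate_side(px):
    """px: feet from the middle of the plate, catcher's view; positive = catcher's right (1B side)."""
    return "1B side" if px > 0 else "3B side"

def whiff_rates(pitches):
    """pitches: iterable of (pitch_type, des). Returns {pitch_type: (whiffs, total, rate)}."""
    totals, whiffs = Counter(), Counter()
    for ptype, des in pitches:
        totals[ptype] += 1
        # Assumption: Gameday's des text for a whiff begins with "Swinging Strike".
        if des.startswith("Swinging Strike"):
            whiffs[ptype] += 1
    return {p: (whiffs[p], n, whiffs[p] / n) for p, n in totals.items()}

# 10 whiffs on 36 curveballs, as in the Giants start.
curves = [("CU", "Swinging Strike")] * 10 + [("CU", "Ball")] * 26
print(whiff_rates(curves))  # CU rate is 10/36, about 27.8%
print(plate_side(0.85))
```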
- Location against righties and lefties
- Pitch selection
- Where a player gets his swinging strikes
- Spin of each pitch
- Release point
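As an example of the first item, here's a sketch of average fastball velocity by inning. The (inning, pitch_type, start_speed) tuples are hypothetical; with real data you'd pull those three fields out of the parsed Gameday files.

```python
from collections import defaultdict
from statistics import mean

def velocity_by_inning(pitches, types=("FF", "FT")):
    """pitches: iterable of (inning, pitch_type, start_speed).
    Returns {inning: average start_speed of the chosen pitch types}, innings in order."""
    by_inning = defaultdict(list)
    for inning, ptype, speed in pitches:
        if ptype in types:
            by_inning[inning].append(speed)
    return {inn: round(mean(v), 1) for inn, v in sorted(by_inning.items())}

# Hypothetical pitches from one start.
sample = [(1, "FF", 93.0), (1, "FT", 92.0), (1, "CU", 77.0), (7, "FF", 91.0)]
print(velocity_by_inning(sample))  # {1: 92.5, 7: 91.0}
```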
And if you want to take a look at some more complex and actionable things:
- How a pitcher pitches from the stretch compared to with the bases empty
- How effective offspeed pitches are following a fastball compared to following another offspeed pitch
- How location can affect things like GB% and HR/FB ratio
- How umpires affect individual pitchers and hitters
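As a starting point for the offspeed-after-fastball question, here's a rough sketch. It assumes pitches are listed in the order thrown with an at-bat id, treats everything except FF and FT as offspeed, and uses swinging-strike rate as the measure of effectiveness; a real study would want much bigger samples and probably other outcome measures too.

```python
from collections import Counter

FASTBALLS = {"FF", "FT"}

def whiff_rate_by_previous(pitches):
    """pitches: list of (atbat_id, pitch_type, des) in the order thrown.
    For each offspeed pitch, look at the previous pitch in the same at-bat and
    tally swinging strikes separately for 'after fastball' and 'after offspeed'."""
    seen, whiffs = Counter(), Counter()
    for prev, cur in zip(pitches, pitches[1:]):
        if prev[0] != cur[0] or cur[1] in FASTBALLS:
            continue  # new at-bat, or the current pitch is not offspeed
        bucket = "after fastball" if prev[1] in FASTBALLS else "after offspeed"
        seen[bucket] += 1
        if cur[2].startswith("Swinging Strike"):
            whiffs[bucket] += 1
    return {b: whiffs[b] / seen[b] for b in seen}

# Hypothetical sequence covering two at-bats.
seq = [
    (1, "FF", "Ball"), (1, "CU", "Swinging Strike"), (1, "CU", "Ball"),
    (2, "CH", "Foul"), (2, "FF", "Called Strike"), (2, "SL", "Swinging Strike"),
]
print(whiff_rate_by_previous(seq))
```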
So play around with the two spreadsheets I gave you, or download your own data. We are only starting to scratch the surface of what we can do with all this data, and I can guarantee you that it will end up helping major league clubs if it hasn't already.
You can use this thread to ask questions about Pitch f/x, share criticisms, or propose new ideas for studies. Or anything else that you can think of.
Here are some links for further reading on the subject:
Anything else written by Josh Kalk, Mike Fast, Jon Hale, Harry Pavlidis, Dave Allen, Jeff Zimmerman or Alan Nathan.
And some of my own work (/shameless self promotion):