I would like to begin with a few brief disclaimers. Like many people on this blog, I am not comfortable with the degree to which PED users are despised in many baseball circles. I would not go so far as to say that I wish PEDs were legal in the sport, but I also don’t believe that PEDs are ruining the game, and I don’t think that all players who are accused of doping should be shunned for life. This Fanpost isn’t meant to rip apart Braun. I actually kind of like Ryan Braun. He’s an incredibly good player, he took a hometown discount to stay with the Brewers, and he seems like a generally decent guy, aside from his inexplicable love of douchey T-shirts. That said, I thought it might be interesting to take a look at the probability that Braun is guilty of the doping allegations that have been hurled his way, which will no doubt continue to follow him for the rest of his career. This is meant more as a fun exercise than a comprehensive attempt to either accuse or exonerate Braun. Another quick disclaimer: I am not a professional statistician. I have an undergraduate degree in Math and I use a lot of math every day for work, but I’ve never taken a formal class on Bayesian statistics. I would not be offended if anyone used the comments to point out errors I have made. Lastly, this ended up being longer than I had intended. I apologize for that.
On Tuesday evening Jeff Passan, a columnist for Yahoo Sports, published a story revealing documents containing Ryan Braun’s name that were obtained from Biogenesis, the Miami-based anti-aging clinic that has been accused of providing performance-enhancing drugs to Alex Rodriguez, Gio Gonzalez and other MLB players. This evidence initially appeared to be quite damning for Braun, considering his failed drug test from the 2011 playoffs. Braun quickly released the following statement:
"During the course of preparing for my successful appeal last year, my attorneys, who were previously familiar with Anthony Bosch, used him as a consultant. More specifically, he answered questions about T/E ratio and possibilities of tampering with samples.
"There was a dispute over compensation for Bosch’s work, which is why my lawyer and I are listed under ‘moneys owed’ and not on any other list.
"I have nothing to hide and have never had any other relationship with Bosch.
"I will fully cooperate with any inquiry into this matter."
Today the internet has seen a large amount of debate about the credibility of Braun’s explanation. At Fangraphs, Wendy Thurm has published an article defending the plausibility of the story. She draws on 20 years of legal experience to argue that it is entirely possible that Braun’s attorneys could have used Anthony Bosch as an expert consultant in preparation for the appeal. Personally I find Thurm’s argument convincing. It does strike me as possible that Braun’s connection with Bosch and Biogenesis could be entirely legitimate. That said, the news certainly doesn’t do anything to help his case. I think we can all agree that the probability that Braun used PEDs is higher this morning than it was Monday morning. I thought it would be interesting to try to get an idea of how much higher that probability is.
Before we get into the analysis I’d like to give a very brief primer on Bayesian statistics. If you’d like to understand more I recommend the Wikipedia page on Bayes’ theorem. If you’re even more interested I can recommend the book The Signal and the Noise: Why So Many Predictions Fail – But Some Don’t by Nate Silver. As many of you probably know, Nate Silver originally rose to prominence by developing the PECOTA system for baseball projections, and has since gone on to national fame through political predictions made at his FiveThirtyEight blog for the New York Times. Briefly, Bayesian statistics offers a method for revising our beliefs after observing a piece of evidence. The Wikipedia page offers a brief example. Suppose you’re sitting on a train and you see a person in front of you with long hair. You wish to determine whether the person is male or female. Before observing the person’s hair, the probability that the person is female should be 50%, because on average there should be just as many women as men on the train. After observing the person’s long hair, Bayes’ theorem gives you a method of revising your probability based on two other factors: the probability that the person would have long hair given that the person is a woman, and the probability that the person would have long hair given that the person is a man. I’ll refer you to the article if you’d like to know the outcome of that particular scenario. In the interest of conserving space we will move on to applying the theorem to the case of Ryan Braun.
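For anyone who would rather see the train example as a calculation, here is a minimal sketch in Python. The two hair-length probabilities are numbers I made up purely for illustration (they are not taken from the Wikipedia article), and the function name bayes_update is just a convenient label.

```python
def bayes_update(prior, p_obs_given_h, p_obs_given_not_h):
    """Return P(H | observation) from a prior and the two likelihoods."""
    numerator = prior * p_obs_given_h
    evidence = numerator + (1.0 - prior) * p_obs_given_not_h
    return numerator / evidence


# Prior from the text: half of the passengers are women.
p_female = 0.50

# Assumed, illustrative likelihoods (NOT from the source).
p_long_given_female = 0.75
p_long_given_male = 0.15

p_female_given_long = bayes_update(p_female, p_long_given_female, p_long_given_male)
print(f"P(female | long hair) = {p_female_given_long:.1%}")
```

The same three inputs (a prior, the chance of the observation if the hypothesis is true, and the chance of the observation if it is false) are all that each step of the Braun analysis needs.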
The way I see it, the saga of Ryan Braun has seen three fundamental events take place between November of 2011 and the present day: Braun’s failed test, Braun’s successful appeal and Braun’s name appearing in the papers at Biogenesis. We will handle these three conditions successively and try to determine the probability of Ryan Braun’s guilt after each. For each condition, we will need to determine the probability of that condition taking place given that Ryan Braun was doping and the probability of that condition taking place given that Ryan Braun was clean. We will also need the probability that any given player in MLB is doping. Please note that some subjectivity is inherent in the process, as some of the numbers involved are not readily available. At the end I will make my spreadsheet available so that anyone who wishes may input their own numbers to see how the overall probability changes accordingly.
Before we begin treating the conditions, we need to determine what probability we would have assigned to Braun doping before he failed the test. In other words, what is the probability that a randomly selected MLB player is using PEDs? This number is of course difficult to ascertain. In 2003 David Wells famously asserted that "25 to 40 percent of all major leaguers are juiced", but times have changed since 2003 and there is reason to believe the current number is much lower than that. In 2012, seven major league players were suspended for PEDs. If we simplistically say that there are 25 players on each team (of course the real number of players to appear in the majors each year is considerably higher than this) then there are a total of 750 active MLB players at any given time. Seven players is therefore just under one percent. However, I think we can all agree that there are likely a decent number of players who were doping but were not caught. For this reason I chose a probability of 5%. This is of course a subjective choice, and you are free to experiment with different ones in the spreadsheet linked below.
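The back-of-the-envelope arithmetic behind that prior can be sketched as follows. The variable names are mine, and the 5% figure is the subjective choice described above rather than anything derived from the data.

```python
# Figures from the text: 7 suspensions in 2012, 30 teams of 25 players.
suspended_2012 = 7
teams = 30
roster_size = 25

active_players = teams * roster_size          # 750
caught_share = suspended_2012 / active_players

# Subjective prior chosen in the text.
assumed_prior = 0.05

# How many dopers the prior implies for every player actually suspended.
dopers_per_suspension = assumed_prior / caught_share

print(f"Share of players suspended: {caught_share:.2%}")
print(f"Assumed prior is about {dopers_per_suspension:.1f}x the suspended share")
```

In other words, a 5% prior amounts to assuming that for every player who was suspended, roughly four more were doping and got away with it.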
Condition one: Ryan Braun tests positive for exogenous testosterone
The source of this section is an article written by Will Carroll for Sports Illustrated on December 11, 2011. During the 2011 playoffs Ryan Braun was subject to a urine test. The urine was shipped to the lab and split into two samples, A and B. The A sample is subjected to a number of tests, including a measurement of the ratio of testosterone to epitestosterone (T/E ratio). Generally the ratio of these two chemicals is 1:1, but it is known to fluctuate. Any result higher than 4:1 triggers further tests. Braun’s urine showed a T/E ratio higher than 4:1 and thus further tests were done on the B sample. The methodology of the tests is not specified by MLB, though carbon isotope ratios and chromatography are considered state of the art. Using these tests the lab determined that Ryan Braun’s urine contained exogenous testosterone and thus the test was ruled positive.
We are tasked with determining the probability that Braun’s urine would have tested positive if we were certain that he was using synthetic testosterone supplements, and also the probability that his urine would have tested positive if we were certain that he was clean. At this point, we are not to consider the breach in the chain of custody, because that information was not yet public. Again these numbers are difficult to ascertain exactly. Earlier I estimated that 5% of major league players are doping despite only 1% testing positive. For consistency I will claim that there is a 20% chance of a player testing positive given that he is in fact doping. As I previously stated, MLB does not disclose the methodology that leads to a positive test, so it is impossible to determine the exact probability of a false positive. However, MLB conducts over a thousand tests every year (at least two for each of the 750 players) and has never (as far as I know) publicly accused a player and later revealed the test to be a false positive. Their methods appear to be very sound. I assigned a false positive value of 0.05%.
Inputting these two probabilities along with the initial probability (5%) I found that there was a 91.3% chance that Braun was doping immediately following the positive test.
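Here is a minimal sketch of this first update for anyone who wants to check the arithmetic without the spreadsheet. The function name is my own. One thing to watch: by my arithmetic a false positive rate of 0.05% works out to roughly 95.5%, while a rate of 0.1% gives the 91.3% quoted above, so the 91.3% figure (and everything downstream of it) appears to correspond to 0.1%. The sketch evaluates both so the difference is visible.

```python
# Inputs from the text.
prior = 0.05                 # P(doping) before the test
p_pos_given_doping = 0.20    # P(positive test | doping)


def posterior_after_positive(false_positive_rate):
    """P(doping | positive test) for a given P(positive test | clean)."""
    true_hit = prior * p_pos_given_doping
    false_alarm = (1.0 - prior) * false_positive_rate
    return true_hit / (true_hit + false_alarm)


# 0.0005 is the rate as stated; 0.001 is the rate that reproduces 91.3%.
for rate in (0.0005, 0.001):
    print(f"false positive rate {rate:.2%}: "
          f"P(doping | positive) = {posterior_after_positive(rate):.1%}")
```

Either way the conclusion is the same in spirit: a positive test moves a 5% prior to somewhere above 90%, because false positives are assumed to be so rare.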
Condition two: Ryan Braun’s appeal is successful
The source for this section is an article written by Tom Haudricourt for the Journal Sentinel. Ryan Braun appealed his 50-game suspension. His attorneys argued that the test was not valid because the chain of custody had been broken when the collector left the urine samples in his house over the weekend rather than immediately shipping them. MLB’s arbitrator, Shyam Das, sided with Braun, stating that there were significant questions about the chain of custody that led Braun’s urine to the MLB testing lab. It was the first successfully appealed suspension in MLB history, and a public statement from MLB left little doubt that the league was extremely disappointed in the decision.
We need to determine the probability that Braun would have won his appeal given that he was doping, and the probability that he would have won his appeal given that he was clean. The probability that he would win his appeal given that he was doping is difficult to determine. In a tweet, the very same Jeff Passan who would later publish the link between Braun and Biogenesis passed along a statement from the CEO of the US Anti-Doping Agency, Travis Tygart, who said "it's commonplace for collectors to keep samples refrigerated when taken late at night and on holidays (and) weekends". Perhaps Braun just got lucky that his urine was collected on a Friday. Therefore we’ll assign a 14% chance that the chain of custody would have been broken given that he was doping (i.e. roughly the one-in-seven probability that the test was taken on a Friday). We should assume that the probability of the chain of custody being broken given that he was clean must be very high. After all, Braun’s entire argument hinged on the claim that the breach in chain of custody gave someone a chance to taint his urine sample with exogenous testosterone. If we are certain that he was clean, then something must have happened to account for the testosterone that appeared in the sample. Therefore I assign a probability of 75% that the chain of custody would have been broken (and hence Braun would have won his appeal) given that he was clean.
Plugging these values into Bayes’ theorem I found an updated probability of 66.3% that Braun was guilty.
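The same step can be written in odds form, which makes it easy to see how strongly the appeal counts in Braun’s favor. This is only a sketch: the variable names are mine, and I carry the 91.3% from the previous step as 0.9132 to limit rounding drift.

```python
# Probability of doping after the positive test (91.3%, one extra digit kept).
prior_prob = 0.9132

# Inputs from the text.
p_win_given_doping = 0.14   # chain of custody broken | doping
p_win_given_clean = 0.75    # chain of custody broken | clean

# Convert to odds, multiply by the likelihood ratio, convert back.
prior_odds = prior_prob / (1.0 - prior_prob)
likelihood_ratio = p_win_given_doping / p_win_given_clean
posterior_odds = prior_odds * likelihood_ratio
posterior_prob = posterior_odds / (1.0 + posterior_odds)

print(f"Likelihood ratio: {likelihood_ratio:.3f}")
print(f"P(doping | positive test, successful appeal) = {posterior_prob:.1%}")
```

A likelihood ratio below 1 means the evidence favors innocence. Here the successful appeal is assumed to be more than five times as likely for a clean player as for a doping one, which is why the probability drops so sharply.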
Condition three: Braun is linked to Biogenesis
At the beginning of this post I outlined the events of the last 24 hours that have linked Ryan Braun to Biogenesis, the anti-aging clinic that allegedly provided PEDs to a number of MLB players. Even if we assume Braun to be using PEDs, the likelihood of his name appearing on the ledger at Biogenesis is probably low, considering there is doubtless more than one doctor in the country willing to supply PEDs to players. However, Braun claims that his lawyers were in contact with Bosch for consulting, and we’ve already admitted that this is at least a plausible scenario. Therefore let us state that there was a 70% chance of Braun’s name appearing on Biogenesis documents given that he was doping and a 50% chance of his name appearing given that he was clean (i.e. there is a 50% chance that both his story was true and his name was written on their documents).
Inputting these final probabilities into our formula we find a final probability, as of Wednesday, February 6, 2013, of 73.3% that Ryan Braun had used PEDs. The recent revelations regarding his connections to Anthony Bosch should add to our belief that Braun is guilty of doping, but not by much. Based on my analysis, the probability was about 2/3 before the revelation, and now it is closer to 3/4.
In the end, based on my (very) subjective analysis, it looks like there’s roughly a 75% chance that Braun is guilty of using PEDs. I welcome criticism of my methods. Please try to remember that this was all meant to be in good fun. Just something that I came up with to help pass a boring Wednesday.
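For readers who would rather not open a spreadsheet, the whole chain of three updates can be sketched in a few lines of Python. This is my own rough stand-in, not the spreadsheet itself, and the names are hypothetical. Note that I use a false positive rate of 0.1% for the first step, since by my arithmetic that is the value that reproduces the 91.3%, 66.3% and 73.3% figures; change any number in the list to see how the final answer moves.

```python
def bayes_update(prior, p_given_doping, p_given_clean):
    """One application of Bayes' theorem for the hypothesis 'Braun was doping'."""
    numerator = prior * p_given_doping
    return numerator / (numerator + (1.0 - prior) * p_given_clean)


# Starting belief: share of MLB players assumed to be doping.
probability = 0.05

# (condition, P(condition | doping), P(condition | clean))
conditions = [
    ("positive test",      0.20, 0.001),  # 0.001 assumed; see note above
    ("successful appeal",  0.14, 0.75),
    ("Biogenesis records", 0.70, 0.50),
]

history = []
for name, p_doping, p_clean in conditions:
    probability = bayes_update(probability, p_doping, p_clean)
    history.append(probability)
    print(f"After {name}: {probability:.1%}")
```

Each posterior becomes the prior for the next condition, which is the only trick involved. The approach quietly assumes the three conditions are independent of one another once we know whether Braun was doping, which is a simplification.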
Lastly, I submit the link to the spreadsheet that I used to calculate the probabilities. The cells that you should edit are in white while those that should remain untouched are in gray. I hope the rest of the spreadsheet is self-explanatory.