Usain Bolt: It’s Just Not Normal

Usain Bolt’s wonderful run in the Olympic 200-meter sprint reminds us that the normal distribution — the familiar bell curve beloved by economists and statisticians — can be wildly inappropriate when analyzing extremely selected samples.

This morning’s New York Times shows Usain Bolt’s new world record, relative to the 250 greatest 200-meter sprints ever. Not only does this not look like a normal distribution, it doesn’t even look like the tail of any standard distribution I’ve ever seen:

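[Chart: Usain Bolt’s new 200-meter world record relative to the 250 greatest 200-meter sprints ever (The New York Times)]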

The full graphic, as a storyboard, is available here. (It is a beautiful example of using statistics to tell a story.) It should be clear from this chart why few thought that the previous world record would be broken anytime soon. (An interesting aside: This graphic shows that it is only a fairly recent phenomenon that the 200-meter typically yields a faster average speed than the 100-meter sprint.)
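
A toy calculation shows how a longer race can yield a faster average speed. The sketch below assumes (illustrative assumptions, not measured data) that a sprinter accelerates uniformly from rest to a top speed of 12 meters per second over the first 40 meters, then holds that speed to the finish; deceleration and the 200-meter curve are ignored. The function name `average_speed` is hypothetical.

```python
def average_speed(distance_m, top_speed_ms, accel_zone_m=40.0):
    """Average speed (m/s) for a hypothetical toy sprinter who accelerates
    uniformly from rest to top speed over accel_zone_m meters, then holds
    top speed to the finish. Deceleration is deliberately ignored."""
    if distance_m < accel_zone_m:
        raise ValueError("race must be at least as long as the acceleration zone")
    # Uniform acceleration from rest covers distance = top_speed * time / 2.
    accel_time = 2.0 * accel_zone_m / top_speed_ms
    # The rest of the race is run at constant top speed.
    cruise_time = (distance_m - accel_zone_m) / top_speed_ms
    return distance_m / (accel_time + cruise_time)


# Assumed top speed of 12 m/s for both races (illustrative only).
for race in (100.0, 200.0):
    print(f"{race:.0f} m: {average_speed(race, 12.0):.2f} m/s")
```

Under these assumptions the fixed cost of accelerating is spread over a longer distance, so the toy 200 meters averages 10.00 m/s against about 8.57 m/s for the 100 meters. In real races, deceleration over the longer distance pulls the other way, so the actual comparison depends on how well sprinters hold their top speed.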

Extreme outliers aren’t that unusual in sports. The greatest outlier may well be Australian cricketer Donald Bradman, whose career batting average of 99.94 puts him so far ahead of any other cricketer that it defies comprehension. (Trivia note: Bradman played the piano at my grandmother’s wedding.) Here is a histogram of career batting averages conditional on being among the top 100 (among those with at least 20 innings):

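[Histogram: career batting averages of the top 100 cricketers with at least 20 innings, including Don Bradman’s 99.94]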

Some argue that Joe DiMaggio’s 56-game hitting streak is pretty extraordinary. So I put together a histogram of the great hitting streaks (among those longer than 30 games). DiMaggio is okay, but he’s no Don Bradman.

[Histogram: hitting streaks longer than 30 games, including Joe DiMaggio’s 56-game streak]

The key to all of these strange distributions is that we are focusing on the extreme tails of highly selected samples, where the usual statistical patterns rarely hold. These situations are highly atypical, but they are also incredibly interesting when we think about the very greatest. (I’ve never understood the urge to call these “black swans,” given that black swans are actually fairly common birds if you know where to look.)
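
What does the extreme tail of a selected sample look like even when the underlying distribution is exactly normal? The minimal simulation below assumes that performances are independent draws from a standard normal distribution (an assumption for illustration, not a model of real sprint or cricket data), keeps the best 250 of 100,000 draws, and summarizes them. The helper names `top_k_of_normal` and `skewness` are hypothetical.

```python
import random
import statistics


def top_k_of_normal(n=100_000, k=250, seed=2008):
    """Draw n standard-normal 'performances' (an assumption) and return the k best, largest first."""
    rng = random.Random(seed)
    draws = [rng.gauss(0.0, 1.0) for _ in range(n)]
    return sorted(draws, reverse=True)[:k]


def skewness(xs):
    """Sample skewness: third standardized moment, population formula."""
    m = statistics.fmean(xs)
    s = statistics.pstdev(xs)
    return sum((x - m) ** 3 for x in xs) / (len(xs) * s ** 3)


top = top_k_of_normal()
cutoff = top[-1]
print(f"cutoff for the top 250:  {cutoff:.2f} sd")
print(f"best draw:               {top[0]:.2f} sd")
print(f"skewness of the top 250: {skewness(top):.2f} (a bell curve would be near 0)")

# Crude text histogram of the elite group, binned upward from the cutoff.
width = 0.25  # bin width, in standard deviations
counts = {}
for x in top:
    b = int((x - cutoff) // width)
    counts[b] = counts.get(b, 0) + 1
for b in sorted(counts):
    start = cutoff + b * width
    bar = "#" * -(-counts[b] // 5)  # one mark per 5 draws, rounded up
    print(f"{start:5.2f}+ {counts[b]:4d} {bar}")
```

The elite group piles up just above the cutoff and thins out to the right, so even a perfectly normal population produces a lopsided, non-bell-shaped picture once you look only at the very best. The sketch says nothing about whether any particular real-world outlier is consistent with a normal tail.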

Those interested in how things change in extremely selected samples may enjoy Tim Groseclose’s paper, “Extreme Sample Selection Bias: Conditions That Cause the Correlation Between Two Variables to Switch Signs.” Groseclose claims that this extreme sample selection can explain why nonmillionaire members of Congress win re-election more often than millionaires; why it shouldn’t be surprising that the greatest golfer is multiracial, even though most top golfers are white; and why high S.A.T.’s may actually predict lower subsequent incomes among those attending elite universities.
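
As a generic illustration of how selection can flip the sign of a correlation, the simulation below assumes two traits that are positively correlated in the whole population (correlation 0.3) and a hypothetical selection rule that admits only the top 1 percent ranked by the sum of the two traits. Everything here, including the helper names `pearson` and `simulate`, is an illustrative assumption rather than Groseclose’s own model.

```python
import math
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def simulate(n=200_000, rho=0.3, keep=0.01, seed=1):
    """Draw n pairs of standard-normal traits with correlation rho, then keep
    only the top `keep` fraction ranked by the sum of the two traits (a
    hypothetical selection rule). Returns (full-sample corr, selected corr)."""
    rng = random.Random(seed)
    scale = math.sqrt(1.0 - rho ** 2)
    pairs = []
    for _ in range(n):
        z1, z2 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
        # Standard construction of a correlated pair from two independent normals.
        pairs.append((z1, rho * z1 + scale * z2))
    selected = sorted(pairs, key=lambda p: p[0] + p[1], reverse=True)[: int(n * keep)]
    full = pearson([p[0] for p in pairs], [p[1] for p in pairs])
    elite = pearson([p[0] for p in selected], [p[1] for p in selected])
    return full, elite


full, elite = simulate()
print(f"correlation, everyone:      {full:+.2f}")
print(f"correlation, top 1% by sum: {elite:+.2f}")
```

Among those selected, someone very strong on one trait only needed to be good enough on the other to clear the bar, so the correlation among the selected turns negative even though it is positive for everyone. This is one simple mechanism that produces a sign switch of the kind named in the paper’s title.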

COMMENTS: 79

  1. Bryce says:

    Two things about the sprinting…

    It was said on the Olympics broadcast during the 200m final that Bolt’s best event may be the 400m, even though he does not compete in it. This brings me to the fact that Michael Johnson was a 400m runner. In sprinting, the trick is not who reaches the highest speed, but WHO SUSTAINS THEIR TOP SPEED THE LONGEST, so a great 400m runner should be a lot better at the 200m than a great 100m runner: the 400m runner trains for a longer race and should be stronger and better able to sustain top speed. Unfortunately, because of the physical exertion of both the 200m and 400m, not many 400m runners also run the 200m (at meets the two are also usually scheduled with less time between them), which is why Michael Johnson’s 200m and 400m golds were so amazing and had never been done before. This is also why it would probably be hard to compare the 200m times of 400m runners with the 200m times of 100m runners.

    Which brings me to my second point. The acceleration phase of a 100m sprint is about 40m. Then there is top speed, and the rest is inevitably deceleration. The same goes for the 200m: sprinters accelerate for about 40m, and then the stronger sprinters hold their top speed longer and decelerate at a slower rate. The acceleration zone is the same length, but it is a smaller percentage of 200m than of 100m, so the overall average speed is less affected by it in the 200m. Also, the second half of the 200m is always much faster than the fastest 100m sprint, because the second half of the 200m doesn’t have any acceleration zone… it gets a running start.

  2. Eric says:

    I think this has more to do with pure chance than with any kind of quantifiable statistic. 0.3 seconds is such a short period of time that many factors could contribute to it. This is exactly the reason a tailwind of more than 2 meters per second (about 4.5 mph) disqualifies a run from the record books. All of the runners in that one race benefit, but it would be unfair to compare them with runners going into a 5 mph headwind. This one time, everything that could go one way or the other for Bolt happened to go the right way. To get to this point (qualifying for the Olympics), all of the training and work made the difference, but when eight runners are all within half a second of each other, it is the little things that make the difference. My own experience as a high school sprinter tells me that my times varied over a range of about half a second in the 100 and about a second and a half in the 200. I am sure that range is smaller for an elite sprinter who controls every body movement through years of training, but it still exists.

  3. Raj Pandravada says:

    Comparing Bradman’s accomplishments to ANYBODY else’s, be it Bolt’s or DiMaggio’s, is simply unfair…to the Don, that is.

    As the most prolific scorer of his time, and indeed of the generations since, he was the prime target of every opponent’s attack. The famous ‘bodyline’ series, in which the visiting English side, led by Douglas Jardine, directed their fast bowler, Harold Larwood, to simply aim at Bradman’s body and un-helmeted head to thwart his run-making ability, is the starkest example of the disruptive force he was. Cricket’s rules have since changed to prevent such attacks, but alas…very few people could or will be able to emulate the great Don.

  4. Imad Qureshi says:

    I am very happy to read about cricket here, and of course about Don Bradman. But there was only Test cricket (5-day matches) during that time. Cricket today is much faster, and comparing one-day averages with Test averages is not appropriate. That being said, I agree that the cricket world is yet to see another Don Bradman.

  5. Jason says:

    Several people posted why the 200m is faster than the 100m, which seemed obvious to me. But why is that a recent phenomenon? Did people use to run out of energy faster, with more 100m types than 400m types running? Since the top speed was lower, did the acceleration portion take less time as a percentage?

  6. Jonathan says:

    To Tomtom @ #3,

    You are kidding yourself if you think the sprinters from the 80s weren’t juicing.
    I spoke with an excellent authority, a physiologist who studied the effects and use patterns of steroids. He’d interviewed the major supplier of steroids, etc., in the U.S. and the world, who told him that six of the eight finalists in 1992 were using drugs that had passed through his hands.
    Johnson only got caught because he was dumb and went off his cycle.

    I’ve been an international athlete myself, albeit in a sport with absolutely no $$ at stake, just personal satisfaction (so little or no drug use by US athletes) and I have to say, sadly, that it is a near certainty that any athlete even moderately successful at the elite level, in a sport where there is money to be made, is using illegal performance enhancing drugs.

  7. Paul says:

    Aren’t the results of this paper a natural follow-up to Fryer’s paper?
    http://www.economics.harvard.edu/faculty/fryer/files/fryer_dtheory_secondrevision.pdf
    One example: if you simply look at the top golfers in the world, you would probably suspect Tiger Woods to be superior by an even wider margin, since he first had to overcome the discrimination of being black in order to get there…

  8. Dan says:

    Setting the performances aside, I want to comment on the distribution curve. You mention that the distribution doesn’t look like any other kind you’ve seen before. Actually, it is just the very extreme end of a normal distribution curve. It isn’t at all surprising that it would look the way it does.
