Usain Bolt’s wonderful run in the Olympic 200-meter sprint reminds us that the normal distribution, the familiar bell curve beloved by economists and statisticians, can be wildly inappropriate when analyzing extremely selected samples.
This morning’s New York Times shows Usain Bolt’s new world record, relative to the 250 greatest 200-meter sprints ever. Not only does this not look like a normal distribution, it doesn’t even look like the tail of any standard distribution I’ve ever seen:

The full graphic, as a story board, is available here. (It is a beautiful example of using statistics to tell a story.) It should be clear from this chart why few thought that the previous world record would be broken anytime soon. (An interesting aside: This graphic shows that it is only a fairly recent phenomenon that the 200-meter typically yields a faster average speed than the 100-meter sprint.)
Extreme outliers aren’t that unusual in sports. The greatest outlier may well be Australian cricketer Donald Bradman, whose career batting average of 99.94 puts him so far ahead of any other cricketer that it defies comprehension. (Trivia note: Bradman played the piano at my grandmother’s wedding.) Here is a histogram of career batting averages conditional on being among the top 100 (among those with at least 20 innings):

Some argue that Joe DiMaggio’s 56-game hitting streak is pretty extraordinary. So I put together a histogram of the great hitting streaks (among those longer than 30 games). DiMaggio is okay, but he’s no Don Bradman.

The key to all of these strange distributions is that we are focusing on the extreme tails of highly selected samples, where the usual statistical patterns rarely hold. These situations are highly atypical, but equally, incredibly interesting when thinking about the very greatest. (I’ve never understood the urge to call these “black swans,” given that black swans are actually fairly common birds if you know where to look.)
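To see why a highly selected sample stops looking like a bell curve, here is a minimal simulation sketch. It is not based on the sprint or batting data: it assumes a made-up population of 200,000 standard-normal "performances" and keeps only the 250 best, and the names `top_tail` and `histogram` are purely illustrative. (Sprint times are better when smaller, but by symmetry the same picture holds for the lower tail.)

```python
import heapq
import random
import statistics

def top_tail(n_population=200_000, n_top=250, seed=0):
    """Draw n_population standard-normal values and keep the n_top largest."""
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    draws = (rng.gauss(0.0, 1.0) for _ in range(n_population))
    return sorted(heapq.nlargest(n_top, draws))

def histogram(values, n_bins=5):
    """Count values in n_bins equal-width bins spanning min(values)..max(values)."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins
    counts = [0] * n_bins
    for v in values:
        # The maximum would land in bin n_bins, so clamp it into the last bin.
        index = min(int((v - lo) / width), n_bins - 1)
        counts[index] += 1
    return counts

tail = top_tail()
counts = histogram(tail)
for i, c in enumerate(counts):
    print(f"bin {i}: {'#' * (c // 5)} ({c})")
print("mean:", round(statistics.mean(tail), 3),
      "median:", round(statistics.median(tail), 3))
```

Even though the underlying population is exactly normal, the selected sample should pile up against the cutoff and thin out quickly, with a handful of stragglers far from the pack. That is a lopsided shape rather than a bell, which is the general pattern in the charts above, although real sporting records can be more extreme still.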
Those interested in how things change in extremely selected samples may enjoy Tim Groseclose’s paper, “Extreme Sample Selection Bias: Conditions That Cause the Correlation Between Two Variables to Switch Signs.” Groseclose claims that this extreme sample selection can explain why nonmillionaire members of Congress win re-election more often than millionaires; why it shouldn’t be surprising that the greatest golfer is multiracial, even though most top golfers are white; and why high S.A.T.’s may actually predict lower subsequent incomes among those attending elite universities.
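For a feel of how selection can flip the sign of a correlation, here is a toy simulation. It is not Groseclose’s model or data. It assumes two traits `x` and `y` that are positively correlated (an assumed correlation of 0.3) in a hypothetical population, and an "elite" group chosen as the top 1 percent by the combined score `x + y`. The function names and all parameter values are illustrative only.

```python
import math
import random

def pearson(pairs):
    """Pearson correlation coefficient for a list of (x, y) pairs."""
    n = len(pairs)
    mean_x = sum(x for x, _ in pairs) / n
    mean_y = sum(y for _, y in pairs) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
    var_x = sum((x - mean_x) ** 2 for x, _ in pairs)
    var_y = sum((y - mean_y) ** 2 for _, y in pairs)
    return cov / math.sqrt(var_x * var_y)

def simulate(n_population=200_000, rho=0.3, top_fraction=0.01, seed=0):
    """Return (population correlation, correlation among the selected, group size)."""
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    population = []
    for _ in range(n_population):
        x = rng.gauss(0.0, 1.0)
        # Build y so that corr(x, y) is rho in the full population.
        y = rho * x + math.sqrt(1.0 - rho ** 2) * rng.gauss(0.0, 1.0)
        population.append((x, y))
    n_selected = round(n_population * top_fraction)
    # Assumed selection rule: keep those with the highest combined score x + y.
    ranked = sorted(population, key=lambda p: p[0] + p[1], reverse=True)
    return pearson(population), pearson(ranked[:n_selected]), n_selected

pop_corr, sel_corr, n_selected = simulate()
print("correlation in full population:", round(pop_corr, 2))
print("correlation among the selected:", round(sel_corr, 2))
```

Under these assumptions the correlation is positive in the full population but negative inside the selected group: among people who all cleared a very high combined bar, someone with a lot of one trait needed less of the other to get in.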

I’m not buying the comparison of the sprint graph outliers to the graphs of the batting outliers. The batting outliers are more of a series of unlikely events and are influenced by many factors. A comparison to home run distance would be more appropriate and would show a direct correlation to improvements in training, human size and possibly steroids.
On a tangent, if the 200m now has the fastest average speed, why isn’t the winner of this event now called “World’s Fastest Man/Woman”?
The question this raises is whether Bolt and Michael Johnson are using the same supplier for their medications, and whether Carl Lewis, Ben Johnson and the rest who “dominated” this noble sport over the last 30 or so years were at a physiological disadvantage or merely had worse access to meds.
Oh, and I don’t believe it is appropriate to compare Sir Don’s remarkable achievements with those of Usain Bolt or Michael Johnson.
What I meant to say was that the article says “it is only a fairly recent phenomenon that the 200-meter typically yields a faster average speed than the 100-meter sprint.”
It is not surprising that the sportscasters haven’t kept up with the statistical analysis.
In any case, Bolt is the Olympic Champion and WR holder in the 100 and 200 now. He’s pretty much the undisputed fastest man alive.
So, now that Bolt has broken Johnson’s record, how long before someone breaks Bolt’s? Johnson’s previous record is now such an outlier. In fact, times at that end of the spectrum have now doubled. Only a few more and the histogram begins to look normal. Plus, what would happen if you just looked at the top times of the last year?
http://blackpoliticalanalysis.com
I’d imagine that the fastest average speed has a lot to do with the fact that it takes a good percentage of the race in a 100m to actually get “up to speed,” whereas you maintain a higher speed for longer in the longer race.
I’d think that the race that advantages those with good acceleration and starting reaction time as well as good overall speed is the appropriate race for the “world’s fastest.”
Now if you took a sample from Bolt’s real hometown on Krypton, the distribution might be more normal.