Here is something that I don’t quite know how to interpret.
In the Duke lacrosse sexual assault case, the police made the 46 players come down to the police station to have their pictures taken. Then these 46 pictures were shown to the woman who has accused the lacrosse players of sexually assaulting her.
She was shown the pictures one by one. The three players that she positively identified were the fourth, fifth, and seventh pictures that she saw. These are the only three positive identifications that were made.
Statistically, this is quite strange. The chance of any one player being positively identified is 3/46, or about 0.065. I did the calculations, and if the order of the pictures was randomly chosen, the probability that 3 of the first 7 pictures would be positive identifications is less than 1 in 100.
This suggests one of three possibilities:
1) Rare events happen and maybe this is the one time in 100 that a distribution this unusual occurred.
2) The police intentionally or unintentionally stacked the deck so that these three pictures were in the beginning.
3) There was some sort of bias that led the accuser to be inclined to give positive identifications early in the process.
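For readers who want to check the "less than 1 in 100" figure, here is a minimal sketch in Python. It assumes the photo order was uniformly random and that exactly 3 of the 46 photos would be identified regardless of order; the function name `prob_all_in_window` is made up for illustration. Under those assumptions the count is hypergeometric: the number of ways to place all 3 identified photos within the first 7 positions, divided by the number of ways to place them anywhere among the 46.

```python
from math import comb

TOTAL_PHOTOS = 46  # photos shown to the accuser, per the post
IDENTIFIED = 3     # positive identifications, per the post
WINDOW = 7         # "the first 7 pictures"

def prob_all_in_window(total, identified, window):
    """Chance that every identified photo lands inside one fixed set of
    `window` positions, assuming a uniformly random photo order."""
    return comb(window, identified) / comb(total, identified)

p = prob_all_in_window(TOTAL_PHOTOS, IDENTIFIED, WINDOW)
print(f"probability = {p:.5f} (about 1 in {round(1 / p)})")
```

With these inputs the ratio is 35/15180, or roughly 1 in 434, which is consistent with the "less than 1 in 100" statement. The choice of a 7-photo window is made after seeing the data, so the number should be read as a curiosity rather than a formal test.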
I’m not trying to side with either party on this matter, at all. Indeed, I haven’t even been paying close attention to what has been happening. I just raise this as a statistical curiosity for the conspiracy theorists among you to argue about.
(Thanks to Brian Sullivan for bringing this issue to my attention and passing along the Motion to Dismiss which contains this information. I don’t have an online version of it to link to. I’m sure someone reading this blog will be able to provide such a link.)
[Addendum on July 4: I may have misread the legal documents. Some blog readers have argued that the 40th picture yielded one of the suspects, and that one of the early pictures in which the accuser gave a 90% positive ID was not used. The documents I looked at had the 40th picture redacted.]

Why couldn’t it suggest a 4th possibility, that the accuser is lying?
I vote #3. Did the accuser know there were 46 photos coming? If she didn’t, then she would feel pressure to identify 3 photos early in the string before the photos ran out. Most people only think of 8 or so men in a line-up from the movies, so would probably also assume no more than 10 photos in a row.
Actually, the amount of data in this problem is not sufficient to prove anything about the arrangement of the pictures, and there are no statistical anomalies.
The probability of the three pictures showing up in spots 4, 5, and 7 is the same as the probability of them showing up in spots 12, 24, and 36. The probability of the three pictures being in the first seven is the same as them being in any run of seven consecutive positions [n, n+6], and that’s the same as them being found in any set of seven numbers between 1 and 46 (for example, (4,5,7,23,31,35,44)).
To continue the example with an analogy: would it still seem strange if I told you that the probability of the 3 photos happening to fall in the set mentioned above was 1/100?
What would make an interesting study is to take a look at data from many similar criminal ID processes and compute in what way the distribution of positive IDs deviates from a random distribution. Then you could speculate as to why this happens.
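The equal-probability claim in the comment above can be checked by brute force. The sketch below is only an illustration, assuming a uniformly random photo order; the helper name `prob_all_in_set` is hypothetical. It enumerates every possible trio of positions for the 3 identified photos among 46 and counts how often all three fall inside a given set of seven positions.

```python
from itertools import combinations

def prob_all_in_set(position_set, total=46, identified=3):
    """Fraction of equally likely placements of the identified photos
    that fall entirely inside `position_set` (positions are 1-based)."""
    positions = set(position_set)
    outcomes = list(combinations(range(1, total + 1), identified))
    hits = sum(1 for outcome in outcomes if positions.issuperset(outcome))
    return hits / len(outcomes)

first_seven = prob_all_in_set(range(1, 8))
scattered = prob_all_in_set([4, 5, 7, 23, 31, 35, 44])
print(first_seven, scattered)
```

Both sets give the same value, since only the size of the set matters under a random ordering, not which positions it contains.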
Additionally, this is a staggeringly flawed method for giving a picture lineup. It makes the assumption that an assault must have occurred. It would have been more accurate to show the accuser 100 photos with the 46 players interspersed among them. Including both lacrosse players and non-lacrosse players keeps the playing field level. A decent lawyer should be able to pick this apart.