Are the F.B.I.’s Probabilities About DNA Matches Crazy?

Jason Felch and Maura Dolan of the Los Angeles Times recently wrote a fascinating piece about a controversy that has arisen regarding the use of DNA in identifying criminal suspects. The article starts like this:

State crime lab analyst Kathryn Troyer was running tests on Arizona’s DNA database when she stumbled across two felons with remarkably similar genetic profiles.

The men matched at 9 of the 13 locations on chromosomes, or loci, commonly used to distinguish people.

The [Federal Bureau of Investigation] estimated the odds of unrelated people sharing those genetic markers to be as remote as 1 in 113 billion. But the mug shots of the two felons suggested that they were not related: One was black, the other white.

In the years after her 2001 discovery, Troyer found dozens of similar matches — each seeming to defy impossible odds.

As word spread, these findings by a little-known lab worker raised questions about the accuracy of the F.B.I.’s DNA statistics and ignited a legal fight over whether the nation’s genetic databases ought to be opened to wider scrutiny.

Later, a systematic search of the 65,000 felons in the Arizona database revealed that there were 122 pairs that matched at 9 of 13 loci. Twenty pairs matched at 10 loci.

When I heard about this, I wondered if the F.B.I. is totally off its rocker when it comes to the probabilities it gives about DNA matches. Is it possible that the F.B.I. is right about the statistics it cites, and that there could be 122 nine-out-of-13 matches in Arizona’s database?

Perhaps surprisingly, the answer turns out to be yes. Let’s say that the chance of any two individuals matching at any one locus is 7.5 percent. In reality, the frequency of a match varies from locus to locus, but I think 7.5 percent is pretty reasonable. For instance, with a 7.5 percent chance of matching at each locus, the chance that any 2 random people would match at all 13 loci is about 1 in 400 trillion. If you choose exactly 9 loci for 2 random people, the chance that they will match all 9 is 1 in 13 billion. Those are the sorts of numbers the F.B.I. tosses around, I think.
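The two figures above follow directly from multiplying the per-locus probabilities together. Here is a minimal Python sketch of that arithmetic; the 7.5 percent rate is the post's illustrative assumption, and the helper name `one_in` is made up for this example.

```python
# Assumption from the post: every locus matches independently with the same
# probability. Real loci differ, so treat the outputs as rough magnitudes only.
P_LOCUS = 0.075


def one_in(p_locus: float, n_loci: int) -> float:
    """Return N such that two random people match at n_loci given loci with odds of 1 in N."""
    return 1.0 / (p_locus ** n_loci)


print(f"all 13 loci:     1 in {one_in(P_LOCUS, 13):,.0f}")
print(f"9 specific loci: 1 in {one_in(P_LOCUS, 9):,.0f}")
```

Under these assumptions the results land near 1 in 420 trillion and 1 in 13 billion, in line with the rounded figures quoted above.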

So under these same assumptions, how many pairs would we expect to find matching on at least 9 of 13 loci in the Arizona database? Remarkably, about 100. If you start with 65,000 people and do a pairwise match of all of them, you are actually making over 2 billion separate comparisons (65,000 * 64,999/2). And if you aren’t just looking for a match on 9 specific loci, but rather on any 9 of 13 loci, then for each of those pairs of people there are over 700 different combinations that are being searched.

So all told, you end up doing about 1.5 trillion searches! If 1 in 13 billion searches yields a positive match as noted above, this leads to roughly 100 expected matches on 9 of 13 loci in a database the size of Arizona’s. (The way I did the calculations, I am allowing for 2 individuals to match on different sets of loci; so to get 100 different pairs of people who match, I need a match rate of slightly higher than 7.5 percent per locus.)
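For readers who want to check the arithmetic, here is a rough Python sketch of the counting argument. It uses the same assumed 7.5 percent per-locus rate and treats loci as independent; the variable names and the `prob_exactly` helper are hypothetical. It also computes a stricter binomial count (pairs that match at exactly 9, or at least 9, of the 13 loci), which avoids counting one pair several times when it matches on more than one set of 9 loci.

```python
from math import comb

P_LOCUS = 0.075        # assumed per-locus match probability (illustrative)
N_PEOPLE = 65_000      # size of the Arizona database, per the article
N_LOCI = 13

pairs = comb(N_PEOPLE, 2)       # 65,000 * 64,999 / 2 pairwise comparisons
subsets = comb(N_LOCI, 9)       # ways to pick which 9 of the 13 loci match
searches = pairs * subsets      # every (pair, 9-locus subset) combination

# The post's counting: each search hits with probability p^9.
expected_by_subsets = searches * P_LOCUS ** 9


def prob_exactly(k: int, n: int = N_LOCI, p: float = P_LOCUS) -> float:
    """Binomial probability that a pair matches at exactly k of n loci."""
    return comb(n, k) * p ** k * (1 - p) ** (n - k)


# Stricter counting: distinct pairs, each counted once.
expected_exactly_9 = pairs * prob_exactly(9)
expected_at_least_9 = pairs * sum(prob_exactly(k) for k in range(9, N_LOCI + 1))

print(f"pairs:                    {pairs:,}")
print(f"searches:                 {searches:,}")
print(f"expected (subset count):  {expected_by_subsets:.1f}")
print(f"expected (exactly 9):     {expected_exactly_9:.1f}")
print(f"expected (at least 9):    {expected_at_least_9:.1f}")
```

With unrounded inputs the subset-counting approach gives a bit more than 110 expected matches, while the binomial count of distinct pairs comes out lower, in the mid-80s. That gap is why a slightly higher per-locus rate would be needed to reach 100 distinct pairs.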

What I find interesting about this article and these calculations is that they show how the same sets of basic statistical relationships can appear much more or less convincing depending on how they are portrayed. When we hear that there are 122 matches out of 65,000 people, it seems like DNA fingerprinting is not nearly as good as we think, but that is largely because we aren’t thinking about the fact that 65,000 people imply 2 billion pairs of people.

Note, however, that if we start with DNA from a crime scene and then go search the Arizona database for matches, we aren’t doing 2 billion searches, we are doing “only” 46 million (65,000 people times 715 different combos of 9 loci), so we will have a false positive rate of “only” 1 in 279.
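The single-sample case can be checked the same way. This sketch again assumes the 7.5 percent per-locus rate and independent loci; the names are illustrative only.

```python
from math import comb

P_LOCUS = 0.075          # assumed per-locus match probability (illustrative)
DATABASE_SIZE = 65_000   # size of the Arizona database, per the article

# One crime-scene profile compared against everyone, over every 9-locus subset.
searches = DATABASE_SIZE * comb(13, 9)
expected_hits = searches * P_LOCUS ** 9   # expected coincidental 9-locus matches

print(f"searches: {searches:,}")
print(f"about 1 coincidental hit per {1 / expected_hits:,.0f} crime-scene samples")
```

With the unrounded 9-locus probability this comes out a little under 1 in 290; the 1 in 279 quoted above results from using the rounded figure of 1 in 13 billion.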

The bottom line is that DNA testing is not perfect, but it is still a million (or maybe a thousand?) times better than anything else we have to catch criminals and (just as importantly, especially in Illinois) exonerate the innocent.

(Thanks to Dimitris Batzilis for cranking out these numbers.)

COMMENTS: 52

  1. John says:

    “Why not match all 13 loci? Spare no expense and all that…”

    A quick explanation for everyone. What Levitt is saying about the 9 of 13 means that 13 loci were “tested” and that many pairs of people matched at 9 of them, while simultaneously not matching at 4.

    It’s just like saying your fingerprint is 69% similar to someone else’s. It’s still unique to you; it just looks a lot like another guy’s. Thus there is the possibility for mistaken identity, which is what the last 2 paragraphs are about. However, one should note that the 1 in 279 false positive rate above is only for the 9 loci match, and should be corrected to include all 13 to provide a more honest false positive rate.

    I don’t know anything about the requirements for “identification” or exoneration in a court based on this method, so I can’t speak to those things.

  2. Erin says:

    Why would the FBI open themselves up to scrutiny? Our government has made it clear that they do not care if innocent people suffer by getting in the way of “the law”. I could cite a million cases in which completely innocent people were run over by the wheels of justice – from police work to the courts. I would not be surprised if a large amount of the DNA evidence used in criminal cases turns out to be bunk. The FBI, I am sure, will never cooperate to find out.

  3. justin says:

    @1. Yes, and the key in the birthday problem is that we’re looking for the probability of AT LEAST one match. It seems that Levitt is only talking about the “expected” number of DNA matches. The probability of at least one match is likely very large indeed.

  4. Jose says:

    Great piece, I’m using it for my stats class this semester!

  5. jonathan says:

    Great post, but I think it misses a point about the law. Imagine that you had DNA evidence that said x murdered the victim, but the witnesses say the perpetrator was white and x is black. You wouldn’t get a conviction of x. Heck, you wouldn’t indict the guy.

    The point is that people – and people in the criminal process – assume DNA excludes everyone but the one guy, that it’s absolute, when it isn’t. DNA should be seen in a context of proof: you have witnesses or at least circumstantial evidence that ties the defendant to the scene or at least to the victim so the DNA evidence completes the proof to “beyond a reasonable doubt.” If you have other evidence, then the sheer possibility that yes there’s likely a match somewhere, maybe in this state, maybe in this county, isn’t important. In other words, the real role of DNA is to exclude people and it’s useful as a proof when it fails to exclude the defendant.

    Even CSI ties the defendant to the crime scene or to the victims.

  6. BlackPolitical Analysis says:

    This makes a lot more sense. I kept thinking, “But, there are only 6.4 billion people on earth!” Thanks.
    http://blackpoliticalanalysis.com

  7. Jen says:

    At least I’m not the only one whose first thought was the birthday coincidence!

    I don’t know the first thing about genetics, but matching against more loci would solve this pretty well. If you double the number of locations that must be matched to achieve identification, the chances of a false match would improve.

    A lesson here – statistical confidence cannot be gained by calculating the number of unique matches.

  8. Dan says:

    We don’t need to prove guilt beyond a shadow of a doubt. We need to prove it beyond a reasonable doubt.
