This month Cybergenetics (the company behind the TrueAllele probabilistic genotyping system) announced the launch of the TrueAllele Investigative Database (TA-ID), a tool built to re-examine DNA evidence that crime labs previously set aside as “inconclusive”: the low-level, partial, degraded, and mixed samples that traditional analysis could not search. The pitch is that thousands of stalled cases could be reopened by converting unusable DNA into investigative leads and cross-case connections.
For prosecutors, that is a headline. For defense attorneys, it is a warning, and it is not really a story about a database at all. It is a story about the continued spread of probabilistic genotyping, the statistical software that decides whether a messy DNA sample can be tied to a person in the first place. That software is increasingly the foundation of the State’s DNA case, and most jurors (along with a fair number of lawyers) do not understand what it actually does or what its numbers actually mean.
Probabilistic genotyping is admissible in Texas courts. That is not the same as unchallengeable. Here is what every defense lawyer needs to understand.
Probabilistic Genotyping Is Not Traditional DNA Matching
Traditional DNA analysis works cleanly when the sample is clean. A single-source sample containing a healthy quantity of DNA produces a clear electropherogram (a graph of peaks, generally two at each tested short tandem repeat, or STR, location) that an analyst can read directly and compare against a reference profile. When the crime scene profile and the suspect’s profile share the same alleles at every location, the analyst reports a match and a random match probability that is often astronomically small.
That clean scenario is the exception in real casework. Most biologically meaningful evidence (touch DNA on a weapon, a steering wheel, or a doorknob; material under a victim’s fingernails; a swab from a contact surface) is some combination of low-level, degraded, and mixed. It contains DNA from two, three, or more people, in unknown proportions, sometimes from unknown sources, and often in quantities so small that the testing instrument itself introduces error.
Two error phenomena matter here. Allele dropout occurs when a real allele is present in the sample but too faint to register as a peak. The profile is missing information that genuinely exists. Allele drop-in is the opposite: an artifact peak appears that corresponds to no one’s true DNA. In a low-template or degraded sample, the analyst cannot always tell which peaks are real, which are missing, and how many people contributed.
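One reason the contributor count is uncertain: each person carries at most two alleles at a locus, so the number of distinct peaks sets only a floor on the number of contributors. The short Python sketch below illustrates that floor. The locus names are real STR loci, but the peak counts are hypothetical, and this is not how any particular program estimates contributors.

```python
import math

def min_contributors(alleles_per_locus):
    """Lower bound on contributors: each person carries at most two alleles per locus."""
    max_alleles = max(alleles_per_locus.values())
    return math.ceil(max_alleles / 2)

# Hypothetical counts of distinct allele peaks observed at each locus.
observed = {"D8S1179": 5, "D21S11": 4, "TH01": 3}
print(min_contributors(observed))

# Suppose dropout hides two real alleles at D8S1179: the floor drops by one person.
with_dropout = dict(observed, D8S1179=3)
print(min_contributors(with_dropout))
```

In this toy example the floor falls from three people to two once dropout removes peaks, even though the true number of contributors never changed. Drop-in can push the count the other way.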
For decades, analysts interpreted mixtures by hand, using methods such as the Combined Probability of Inclusion. Those manual approaches were criticized as subjective and, in complex mixtures, unreliable because different analysts examining the same data could reach different conclusions. Probabilistic genotyping software was developed to replace that subjectivity with a mathematical model.
Programs like TrueAllele and STRmix are “fully continuous” systems. They use computer modeling (typically a statistical technique called Markov chain Monte Carlo simulation) to weigh thousands of possible explanations for the observed data, account for dropout and drop-in, and produce a number. That number is not a “match.” It is a likelihood ratio.
The Likelihood Ratio: Where Juries Get Lost
A likelihood ratio answers a narrow, specific question: how much more probable is the DNA evidence if the defendant is a contributor than if an unrelated random person is the contributor instead?
If the software reports a likelihood ratio of one billion, the proper statement is that the evidence is one billion times more probable under the prosecution’s hypothesis than under the defense’s. These numbers can climb into the quintillions, and the prosecution will present them with confidence.
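In symbols, writing E for the observed DNA data, H_p for the prosecution’s hypothesis, and H_d for the defense’s hypothesis (notation chosen here for illustration, not taken from any vendor’s documentation), the ratio is:

```latex
\[
\mathrm{LR} = \frac{P(E \mid H_p)}{P(E \mid H_d)}
\]
```

Both the numerator and the denominator are probabilities of the evidence, each computed by assuming one of the two hypotheses is true.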
Here is the problem. A likelihood ratio of one billion is not a statement that there is a one-in-a-billion chance the defendant is innocent. It is not a statement that the DNA is the defendant’s. It is not a statement about the probability of guilt at all. It is a statement about the evidence, conditioned on two competing hypotheses, and the hypotheses themselves were chosen by people.
Confusing “the probability of the evidence given a hypothesis” with “the probability of the hypothesis given the evidence” is a well-known reasoning error: the transposed conditional, or, in this setting, the prosecutor’s fallacy. A juror who hears “one billion” and thinks “one-in-a-billion chance he is innocent” has committed it. So has a prosecutor who invites that inference in closing argument. A defense lawyer who does not see it coming cannot stop it.
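The gap between the two statements can be made concrete with Bayes’ rule in odds form: posterior odds equal the likelihood ratio times the prior odds. The following is a minimal Python sketch. The pool sizes are hypothetical, and the "equally likely pool" prior is an assumption made only to show how much the answer depends on a prior that the software does not supply.

```python
def posterior_probability(likelihood_ratio, prior_odds):
    """Bayes' rule in odds form: posterior odds = LR * prior odds."""
    posterior_odds = likelihood_ratio * prior_odds
    return posterior_odds / (1.0 + posterior_odds)

LR = 1e9  # illustrative likelihood ratio of one billion

# Hypothetical priors: the defendant is one of N equally likely possible contributors.
for pool_size in (1_000, 1_000_000, 8_000_000_000):
    prior_odds = 1.0 / (pool_size - 1)
    p = posterior_probability(LR, prior_odds)
    print(f"pool of {pool_size:,}: P(contributor | evidence) = {p:.6f}")
```

With the same likelihood ratio, the probability that the defendant is a contributor runs from near-certainty down to roughly 11 percent in this toy setup, depending entirely on the assumed prior. Even then, the result concerns contribution to the sample, not guilt.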
The likelihood ratio is a legitimate and useful tool. The danger is not the math. The danger is the gap between what the number means and what a jury hears.
The Assumptions That Drive the Number
A likelihood ratio is only as sound as the inputs the analyst chose. Three choices in particular can move the result by orders of magnitude.
The number of contributors. Before the software runs, someone must decide how many people contributed DNA to the sample (for example, two, three, four, or more). This is an assumption, not a measurement. The same electropherogram can plausibly support different contributor counts, and the assumed number changes the model and the output. If the analyst assumes three contributors and the truth is four, the resulting statistic may be unreliable.
The propositions compared. The likelihood ratio measures two hypotheses against each other, and the defense hypothesis is itself a choice. “The defendant plus two unknown, unrelated people” produces a different number than “three unknown people, none of them the defendant.” If the comparison is framed in a way that does not match the actual defense theory of the case, the number answers the wrong question.
The population database. The math depends on how common the relevant alleles are in a reference population. Different population databases, and different assumptions about population structure, yield different frequencies and therefore different ratios.
None of this means the software is wrong. It means the number sits at the end of a chain of human judgment calls, and every link in that chain is a place where the defense can probe.
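As a rough illustration of the population-database point alone, the Python sketch below computes a single-source genotype frequency under Hardy-Weinberg assumptions (2pq for a heterozygous locus, p squared for a homozygous one) and multiplies across loci. It is a deliberately simplified stand-in, not the model inside TrueAllele or STRmix, and the allele frequencies and database names are invented for the example.

```python
def genotype_frequency(p, q=None):
    """Hardy-Weinberg: p*p for a homozygote, 2*p*q for a heterozygote."""
    return p * p if q is None else 2 * p * q

def profile_frequency(loci):
    """Multiply per-locus genotype frequencies, assuming independent loci."""
    freq = 1.0
    for alleles in loci:
        freq *= genotype_frequency(*alleles)
    return freq

# Invented allele frequencies for the same three-locus profile in two databases.
database_a = [(0.10, 0.20), (0.15,), (0.05, 0.30)]
database_b = [(0.20, 0.25), (0.25,), (0.10, 0.35)]

freq_a = profile_frequency(database_a)
freq_b = profile_frequency(database_b)
print(f"database A: 1 in {1 / freq_a:,.0f}")
print(f"database B: 1 in {1 / freq_b:,.0f}")
print(f"ratio: {freq_b / freq_a:.1f}x")
```

Over only three loci, the two invented databases differ by roughly a factor of sixteen. Real profiles typically cover around twenty or more loci, so small per-locus differences compound.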
Admissibility in Texas: The Kelly Standard
Texas does not use the federal Daubert test by name. Under Rule 702 of the Texas Rules of Evidence and Kelly v. State, scientific evidence is admissible only if the proponent shows three things: that the underlying scientific theory is valid, that the technique applying the theory is valid, and that the technique was properly applied on the occasion in question.
Probabilistic genotyping has generally cleared the first two prongs. Courts across the country, and in Texas, have largely accepted that the theory and the leading software programs are valid in general. A defense lawyer who walks into a Kelly hearing expecting to exclude TrueAllele outright is usually going to lose.
The third prong is different, and it is where mixture cases are genuinely vulnerable. “Properly applied on this occasion” asks whether this sample (with this much DNA, this many contributors, and this degree of degradation) falls within the range of conditions for which the software was actually validated.
That distinction matters because of what the 2016 PCAST report found. The President’s Council of Advisors on Science and Technology examined probabilistic genotyping and concluded that it was a promising, objective approach but that published validation studies had established its reliability only for a limited range of conditions, such as relatively simple three-person mixtures in which the minor contributor was not a trivially small fraction of the sample. For more complex mixtures (more contributors, smaller fractions, less DNA), PCAST found the validation evidence thin. A sample at the edge of, or beyond, a program’s validated range is a proper-application problem, and it is a Kelly argument the defense can actually win.
Discovery: What the Defense Must Demand
A probabilistic genotyping result is not self-explaining, and the one-page lab report the State produces in initial discovery is nowhere near enough to evaluate it. A defense lawyer confronting this evidence should demand, at minimum:
- The raw data. The electropherograms and instrument data files, not just the analyst’s conclusions. A defense expert needs to see the peaks the software was modeling.
- The bench notes and full case file. Every interpretive decision, including how the number of contributors was determined and by whom.
- The contributor-number determination. The basis for the assumed number of contributors, and whether alternative counts were considered.
- The exact software and version number. Probabilistic genotyping programs are updated over time, and version differences can matter.
- The validation studies. Both the developer’s developmental validation and the specific crime lab’s internal validation, along with, critically, the range of conditions those studies actually covered.
- The parameters and settings used in this run. The specific configuration applied to this sample.
- All competing or prior interpretations. If the sample was run more than once, or interpreted manually before the software was applied, every result.
- Known limitations, errors, and corrections. Probabilistic genotyping programs are software, and software has bugs. A coding error in one widely used probabilistic genotyping program was discovered years after its release and affected interpretations in numerous cases before it was caught and corrected. A defense lawyer is entitled to ask what is known about errors in the program and version used in the client’s case.
A thorough discovery demand is not a fishing expedition. It is the only way to test whether the State’s headline number rests on sound inputs.
The Source Code Fight
Beyond the case-specific data lies a harder, recurring battle: access to the software’s source code itself. The actual instructions that turn an electropherogram into a likelihood ratio are, in the leading commercial programs, proprietary trade secrets, and the companies that own them have resisted handing them over.
The defense argument is straightforward. If a number generated by a private company’s secret code is going to help send a person to prison, the defense must be able to examine how that code reaches its result to check it for errors, untested assumptions, and design choices a courtroom never sees. Courts have divided. Some have treated the developer’s general validation as sufficient and declined to order disclosure. Others have ruled the opposite way: the New Jersey Appellate Division, in the closely watched 2021 decision State v. Pickett, held that a defendant was entitled to access TrueAllele’s source code under a protective order, recognizing that the defense cannot meaningfully challenge what it cannot inspect.
The law here is unsettled and jurisdiction-dependent. But the issue should be raised. A vendor’s trade-secret claim is not a constitutional answer to a defendant’s right to confront the evidence.
Confrontation and the Analyst
The software is not a witness. The analyst is.
The Confrontation Clause guarantees a defendant the right to cross-examine the witnesses against him, and the Supreme Court has applied that right squarely to forensic analysts in Melendez-Diaz v. Massachusetts and Bullcoming v. New Mexico: the State cannot prove a forensic result through a report alone, or through a substitute witness who did not do the work. The analyst who performed the interpretation and reached the conclusion can be required to take the stand and submit to cross-examination.
The Supreme Court’s 2024 decision in Smith v. Arizona sharpened this further. Smith addressed the common practice of putting a different expert on the stand to relay an absent analyst’s work as the “basis” for an opinion. The Court held that when a prosecution expert conveys an absent analyst’s out-of-court statements and those statements are offered for their truth, the Confrontation Clause is in play. For probabilistic genotyping, that means the State should not be able to launder an absent analyst’s interpretive choices through a surrogate witness. The person who made the judgment calls (number of contributors, propositions, parameters) should be the person the defense gets to question.
What This Means for Texas Defendants
The launch of TA-ID is a signal. Probabilistic genotyping is moving from a tool reserved for the hardest cases to an everyday feature of DNA prosecution, and products built specifically to re-examine “inconclusive” evidence will only push more borderline samples into court. Defense lawyers will encounter probabilistic genotyping results not only in homicide and sexual assault cases but across the felony docket.
The evidence is challengeable, but not by a lawyer who treats the State’s number as settled science. It is challengeable through the third prong of Kelly, by showing the sample fell outside the software’s validated range. It is challengeable through discovery rigorous enough to expose the assumptions behind the number. It is challengeable through the source-code question and through the Confrontation Clause rights that attach to the analyst. And it is challengeable, in front of a jury, by making plain the difference between what a likelihood ratio says and what it does not.
Doing that requires a lawyer who is comfortable in the underlying science. Deandra Grant holds a Master of Science in Pharmaceutical Science, a Graduate Certificate in Forensic Toxicology, and the ACS-CHAL Forensic Lawyer-Scientist designation. She also completed Dr. Greg Hampikian’s DNA for Lawyers professional workshop at Boise State, including hands-on DNA analysis.
A probabilistic genotyping result can look, to a jury, like certainty reduced to a single overwhelming number. It is not. It is a model built on assumptions, and every assumption is a question a defense lawyer is entitled to ask.
Deandra Grant Law handles DNA and forensic evidence challenges in criminal cases throughout North and Central Texas. Call (214) 225-7117 for a free, confidential consultation.