Molecules as documents of history
Some 50 years ago, Emile Zuckerkandl and Linus Pauling proposed that molecules could be read as "documents of evolutionary history". The molecules they had in mind were proteins, specifically the sequences of amino acids that make up each one. Sequences are expected to be more similar in closely related, and therefore recently diverged, species than in species that diverged long ago. This idea forms the basis for molecular phylogenetics – the use of differences in protein, DNA or RNA sequences to infer how a set of taxa are related to one another. Evolutionary relationships can be depicted by a tree diagram, much like a family tree, where branches diverge at common ancestors, and sequences sharing the same common ancestor cluster together.
At the dawn of molecular phylogenetics, some scientists (Ernst Mayr, among others) doubted that homologs (genes related by ancestry) would retain enough similarities between distant organisms such as mouse and man to yield any valuable information. These doubts were soon proven mostly groundless, although different types of sequences indeed have different rates of change. Choose wisely, and you can look back far into the past, or dally about in the near present. "Information processing" genes – those that encode factors involved in transcription or translation – are so highly conserved that they can be compared across life's two (or three) domains. Genes involved in immunity, by contrast, evolve very quickly due to constant one-upmanship with pathogens.
DNA as documents of crime
Today the vast majority of sequencing is done on DNA and on RNA transcripts (the transcriptome), and it's easy to forget that the first sequences to be determined and compared phylogenetically were of proteins. In fact, sequencing techniques were developed in the order protein, then RNA, and finally DNA. Since this is the reverse of the flow of information defined by Crick's "Central Dogma", I like to present it to my genomics students as the "Central Catma". (Interesting historical note: Crick later admitted that he did not really understand the exact meaning of dogma, "beyond questioning", when he coined the term!)
By the 1980s, DNA molecules were also serving as documents of a more mundane history – paternity. Since assessing paternity is, after all, a case of "whodunnit", it didn't take a huge conceptual leap to imagine DNA serving as a document of crime. And who does whodunnits better than the Brits! So it is fitting that one of the earliest criminal cases solved with DNA evidence took place in the UK in the 1980s. The case involved the rape and murder of two 15-year-old girls. A young man with mental disabilities confessed to the crime, but his DNA did not match DNA obtained from semen recovered from the victims. Stoked on the potential of this new scientific technique and unwilling to give up so easily, the authorities collected DNA samples from all the men of the town. However, a certain Colin Pitchfork convinced a friend to submit a sample in Colin's name. The ruse was later revealed when the friend became a bit loose-lipped at the pub and a good citizen overheard. Colin was arrested, and when his DNA was matched to the crime-scene samples he went on to infamy as the first person convicted of murder on the basis of DNA evidence (Fig. 1). That first case highlighted how forensic evidence can swing both ways: exonerating an accused person even after a confession, or nailing someone to whom no other evidence pointed.
The Pitchfork case did not involve phylogenetics, but instead DNA fingerprinting: determining the highly individualized patterns of variation that distinguish one person's genome sequence from another's. Slowly mutating organisms like us retain essentially the same DNA fingerprint throughout our lives, so the information revealed is static, serving as a type of calling card inadvertently left at the scene of the crime. But here we discuss the small things – in this case, viruses.
As it happens, the first forensic use of phylogenetic analysis involved a small thing known as the human immunodeficiency virus (HIV). Is this just lucky coincidence for those of us who like to talk about viruses? No. In order to reap information from molecular documents of history, there has to be an adequate number of changes to the sequences being compared, over the relevant time frame. HIV acquires substitutions at a rate of approximately one change per 10,000 nt per replication cycle. Since the HIV genome is ~10,000 nt (of RNA), there is roughly one change in each newly replicated genome. That's enough changes to give useful information about the chain of infection from person to person. So rather than a calling card, phylogenetics reveals a slimy trail of transmission.
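The back-of-the-envelope arithmetic above can be made explicit. The minimal sketch below multiplies the per-nucleotide rate from the text by the genome length to get the expected number of changes per newly replicated genome. It then adds an assumption the text does not make, namely that substitutions land independently so that the count per genome follows a Poisson distribution, to estimate what fraction of new genomes carry at least one change. The constant and function names are hypothetical, chosen for illustration.

```python
import math

# Figures taken from the text; both are approximate.
MUTATION_RATE = 1e-4    # substitutions per nucleotide per replication cycle
GENOME_LENGTH = 10_000  # approximate HIV genome length in nucleotides


def expected_changes(rate: float, length: int) -> float:
    """Expected number of substitutions in one newly replicated genome."""
    return rate * length


def prob_at_least_one(rate: float, length: int) -> float:
    """P(at least one substitution), ASSUMING a Poisson count per genome."""
    lam = expected_changes(rate, length)
    return 1.0 - math.exp(-lam)  # 1 - P(zero changes)


lam = expected_changes(MUTATION_RATE, GENOME_LENGTH)
print(f"expected changes per genome: {lam:.2f}")  # expected changes per genome: 1.00
print(f"fraction with >=1 change: {prob_at_least_one(MUTATION_RATE, GENOME_LENGTH):.2f}")  # fraction with >=1 change: 0.63
```

Under this assumption roughly 37% of new genomes would be identical to their parent and the rest would carry one or more changes; repeated over many replication cycles within and between hosts, those changes accumulate into the trail that the trees follow.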
You can't hide among them trees
The case involved a University of Florida student, Kimberly Bergalis, who allegedly had no known risk factors for HIV exposure yet began experiencing frequent fevers, fatigue, fainting, and thrush. Her mother, a nurse at a clinic treating STDs, recognized the cluster of symptoms as typical of acquired immunodeficiency syndrome (AIDS), the syndrome that develops in HIV-infected patients. She insisted to doubtful doctors that Kimberly be given an HIV test – which came back positive. Kimberly was at a loss to explain how she had been exposed to HIV. The only possible exposure she could think of was when she had two teeth pulled by the family dentist – who had since sold his practice and was dying of AIDS.
The incident garnered considerable media attention at a time (late 1980s) when fear and confusion about AIDS were at a peak in the U.S. (Fig. 2). Since the group most affected by the syndrome at that time was the gay male community, stigmatization was also rampant. Understanding of how HIV was transmitted was still fuzzy: yes, it was known to spread through blood and through semen. But could it spread through saliva? Tears? Sweat? Many people were terrified of this newly emerged and deadly menace.
Health care workers were at the center of many of these concerns. Cases had already occurred in which medical workers had been infected by patients with HIV. But these situations usually involved exposure to large amounts of blood from infected patients. A question as yet unanswered was, how likely was an infected health care worker to infect a patient?
The Bergalis case was taken up by the Florida Department of Health, and later by the Centers for Disease Control (CDC). The dentist wrote an open letter advising all of his patients to be tested for HIV. Eventually, seven former patients emerged who had tested positive. To establish whether the dentist had been the source of the virus that infected his patients, a group of labs collected virus samples from the dentist, each of the infected patients, and 35 HIV-positive people from the region where the dentist had worked ("LC", local controls). They sequenced a portion of a gene that varies among HIV strains. By aligning and comparing the sequences, they could distinguish groups that shared many of the same nucleotide changes from those that shared few. Presumably, the viruses with nearly identical sequences – in other words, very little variation from one another – shared a more recent common ancestor and were part of the same transmission chain.
When the virus sequences from the dentist were compared with those from his patients and from local controls, five of the seven patient sequences differed from the dentist's sequences by only 3.4 to 5%, compared with 11 to 12% from local controls (Fig. 3, top). These five patients had reported no HIV risk factors, and each had undergone invasive procedures by the dentist. The virus sequences from the other two patients, however, were as different from the dentist's sequences as from those of local controls – roughly 11 to 13%. These two patients had known risk factors for HIV. A phylogenetic tree generated from an alignment of the sequences depicted these relationships visually. This type of analysis is standard fare in today's research publications, but at that time the methods were so new that the authors felt the need to explain in the figure legend a detail that goes without saying today: that the "vertical distances are for clarity only" and contain no phylogenetic information (Fig. 3, bottom).
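The percent-variation figures above boil down to pairwise comparisons of aligned sequences. As a minimal sketch, the simplest such measure is the p-distance: the fraction of aligned sites at which two sequences differ. The helper names and the toy sequences below are hypothetical and are not data from the Florida investigation. Real analyses usually apply an evolutionary model that corrects for multiple substitutions at the same site, so treat this as an illustration of the idea rather than the method the investigators used.

```python
def p_distance(seq_a: str, seq_b: str) -> float:
    """Fraction of aligned sites that differ, skipping sites with a gap ('-')."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = differing = 0
    for a, b in zip(seq_a, seq_b):
        if a == "-" or b == "-":
            continue  # gaps carry no substitution information here
        compared += 1
        if a != b:
            differing += 1
    if compared == 0:
        raise ValueError("no ungapped sites to compare")
    return differing / compared


def percent_variation_table(samples: dict[str, str], reference: str) -> dict[str, float]:
    """Percent divergence of every other sample from the named reference sample."""
    ref_seq = samples[reference]
    return {
        name: 100.0 * p_distance(ref_seq, seq)
        for name, seq in samples.items()
        if name != reference
    }


# Hypothetical toy alignment (NOT data from the Florida investigation).
toy_alignment = {
    "dentist":   "ATGCTAGCTAGGCTAACGTA",
    "patient_1": "ATGCTAGCTAGGCTAACGTT",
    "control_1": "ATGATAGCAAGGCTTACGAA",
}

for name, pct in percent_variation_table(toy_alignment, "dentist").items():
    print(f"{name}: {pct:.1f}% divergent from dentist")
```

With these toy sequences the "patient" differs from the "dentist" at one site in twenty (5.0%) and the "control" at four (20.0%), so the patient would group with the dentist: the kind of pattern a tree then depicts.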
In the Florida case, exactly how transmission occurred was never established. The dentist died in September 1990, and Kimberly died just over a year later. Importantly, the phylogenetic evidence did not prove that the dentist was the source of the virus; it showed only that the evidence was consistent with that scenario. In the nearly three decades since, there have been no other documented cases of health care workers infecting patients in the U.S., although there is evidence of two such cases in Europe, each involving invasive surgery – a hip replacement in France, and a caesarean section in Spain.
Murder by virus
The first admission of phylogenetic analysis as evidence in a criminal trial occurred in 1998, and it helped incriminate a doctor accused of using viruses as a murder weapon. The case involved a jilted lover (the doctor) who plotted revenge by requesting blood work for two patients he knew to be infected: one with HIV, the other with hepatitis C virus (HCV). The blood samples, however, never made it to the lab. Instead, he mixed them into a deadly serum and paid a supposedly friendly visit to his former girlfriend's home. He had given her vitamin injections in the past, and now claimed he wanted to give her just one more, for old times' sake (a true romantic). The victim's HIV sequences were nearly identical to those from the HIV-infected patient, while sequences from viruses isolated from local controls fell outside this cluster. Interestingly, the victim's virus carried mutations in the HIV reverse transcriptase gene of the kind commonly selected by AZT treatment, suggesting that the founding virus came from someone on antiretroviral therapy (as was the unwitting donor of the virus). The doctor was sentenced to 50 years in prison and was recently denied parole.
A high-profile case that turned on a slightly different use of phylogenetic methods involved six foreign medical workers at a children's hospital in Libya. An outbreak of HIV and HCV infections among the children was blamed on the evil deeds of five Bulgarian nurses and a Palestinian doctor. The group, known as the "Tripoli Six", was imprisoned, and confessions were extracted through torture, leading to death sentences for all. Phylogenetic analyses performed by European scientists revealed that the HIV sequences clustered into one related group, and the HCV sequences into three main groups, all belonging to strains endemic to the region. While this finding was consistent with hospital-acquired transmission originating from a local strain, it did little by itself to persuade the Libyan authorities, who had already portrayed the workers as evil incarnate in the press (Fig. 4). However, by analyzing the amount of sequence divergence among viruses isolated from dozens of children, the researchers were able to show with very high statistical confidence that the most recent common ancestors of the HIV cluster and the three HCV clusters all arose well before March 1998 – the date the medical workers arrived in Libya (Fig. 5). Although this evidence was never admitted at trial, it is thought to have swayed the authorities to agree to extradite the prisoners to Bulgaria, where they were immediately pardoned by the Bulgarian president. A detail that often goes unmentioned in this story is that it was Muammar Gaddafi's son, Saif al-Islam, who reached out to the European researchers for help in determining the likelihood that the infections had been intentional.
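The dating argument rests on a molecular clock: if substitutions accumulate at a roughly steady rate, sequences sampled at known dates can be used to extrapolate back to when their common ancestor existed. One simple version of this idea, swapped in here purely for illustration, is root-to-tip regression: fit each sample's genetic distance from the root of the tree against its sampling date, then read off where the fitted line reaches zero divergence. Published analyses of this kind generally use more elaborate statistical models that also yield confidence intervals, which this point estimate lacks. The function name and the sampling dates and distances below are hypothetical, constructed to fall on a perfect line.

```python
def root_to_tip_tmrca(dates: list[float], distances: list[float]) -> tuple[float, float]:
    """Least-squares fit of root-to-tip distance against sampling date.

    Returns (rate, tmrca): the slope estimates substitutions per site per year,
    and the x-intercept estimates the date of the most recent common ancestor.
    """
    if len(dates) != len(distances) or len(dates) < 2:
        raise ValueError("need at least two (date, distance) pairs")
    n = len(dates)
    mean_x = sum(dates) / n
    mean_y = sum(distances) / n
    sxx = sum((x - mean_x) ** 2 for x in dates)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(dates, distances))
    if sxx == 0:
        raise ValueError("samples must come from more than one date")
    rate = sxy / sxx
    if rate <= 0:
        raise ValueError("no positive clock signal in these data")
    intercept = mean_y - rate * mean_x
    # Divergence is zero at the root, so solve 0 = intercept + rate * t for t.
    tmrca = -intercept / rate
    return rate, tmrca


# Hypothetical samples (decimal years, root-to-tip distances); NOT data from the Libyan study.
sample_dates = [1998.0, 1999.0, 2000.0]
root_to_tip = [0.015, 0.020, 0.025]

rate, tmrca = root_to_tip_tmrca(sample_dates, root_to_tip)
arrival = 1998 + 2 / 12  # start of March 1998 as a decimal year
print(f"estimated rate: {rate:.4f} substitutions/site/year")
print(f"estimated common ancestor: {tmrca:.1f}")
print("ancestor predates arrival:", tmrca < arrival)
```

Because the clock is extrapolated backwards, a small change in the estimated rate can shift the ancestor date considerably, which is one reason real analyses report an interval of plausible dates rather than a single year.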
A view of the forest
As new cases involving phylogenetic forensics continue to crop up, so do concerns over the approach's limits, its potential misuse, and the need for standards. Some researchers are troubled, for instance, by the potential criminalization of unintentional HIV transmission and by the use of phylogenetics in murder convictions in such cases. These concerns are compounded by ongoing debate in the field over how results should be interpreted. There is a risk that an ambitious prosecutor or defense attorney will imply to jurors that phylogenetic results carry the same high statistical confidence as the DNA fingerprinting analyses we've all come to know (preferably only through crime dramas). Yet phylogenetic analyses cannot point a finger directly; they can only raise a hairy eyebrow toward innocence or guilt. Nevertheless, the use of phylogenetic forensics in civil and criminal trials is on the increase – pity the person tasked with explaining the nuances of these techniques to jurors, judges, and attorneys.