What picture comes to mind when you think of the 'Tree of Life' (ToL)? Maybe Ernst Haeckel's venerable oak tree (see Pictures Considered #45)? Or rather Carl Woese's famous "three domains" phylogenetic tree (Figure 1)? Although highly abstract, the latter still resembles a tree with stem and branches, at least as much as Charles Darwin's famous "I think" sketch on page 36 of his Notebook B does. Yet despite their similar shapes, the 'trees' depict different things: Darwin and Haeckel both thought about the relationships among species, while Woese visualized the similarity among ribosomal RNAs. This is an 'organismal tree' versus 'gene tree' problem, and the 'versus' is not easily (re)solved even by multi-gene phylogenies using complete genomes, because of gene duplication, gene loss, and the tremendous level of 'noise' contributed by horizontal gene transfer (see STC's 3-part excursion into HGT here, here, & here). But there's even more to it, which I will deal with in a minute.
First, I'd like to digress to the 'new view of the tree of life' by Hug et al., which is simply too large to display on this webpage, even though it names only phyla and none of the lower taxonomic ranks. So, even if you have already read Roberto's recent celebration of Carl Woese, which refers to it, please take a short break from reading on and study it here. To me, it doesn't look like a 'tree' at all, but more like fireworks in full display after a rocket has burst at one point: a snapshot of today's diversity of living things in all its grandeur. By sheer numbers, Bacteria dominate the fireworks, not least because the very recently detected 'candidate phyla radiation' (CPR) contributes roughly 1/3 of the known bacterial phyla (see here in STC for a portrait of these 'Lilliputians'). Note that Hug et al. avoid indicating the 'root' of their 'tree', the take-off point of their fireworks rocket, so to say. They thus evade the thorny discussion of LUCA, the 'last universal common ancestor' (aka the cenancestor).
So, then, what's problematic with Woese's tree? I'll start with the 'stem' and the lengths of the first branches, and come to the 'root' later. Let's first consider the branch lengths between Bacteria, Archaea and Eukaryotes. In fact, most phylogenetic trees – including the 'new tree' by Hug et al. – that are calculated for various markers (=genes) show these long, empty branches connecting the bacterial, archaeal and eukaryotic domains. Several hypotheses have been proposed to explain them, most of which assume evolutionary mechanisms that differed from those we can observe today. In their study from 2004, Olga Zhaxybayeva and Peter Gogarten demonstrated by an elegant simulation that these long, empty branches can also be a consequence of the commonly accepted and routinely applied methods for reconstructing gene phylogenies of extant lineages, the 'lucky survivors'. Briefly, and without diving into mathematics/statistics: one method for building a gene tree starts with a distance matrix calculated for orthologous markers in the lineages in question (based on the nucleotides for rRNA, on the amino acids for proteins). A cluster analysis is then performed on this distance matrix, and the shape of the resulting tree can be interpreted with coalescent theory (which describes how the alleles in a population trace back, or 'coalesce', to their common ancestral allele). To judge how robust such a tree is, phylogeneticists routinely run a bootstrap analysis: the columns of the sequence alignment are resampled many times (often 100 to 1,000 replicates; kudos to the computers for crunching numbers for hours on end), a tree is built from each resampled data set, and the 'bootstrap value' (=quantified confidence) of a branch indicates how often that branch is reproduced among the replicate trees (the bootstrap values in the Hug et al. tree, for example, varied between >55% and >85%).
A characteristic of coalescent processes is the steep increase in the distances required to join ever more distant sub-clusters, until they finally 'meet' in the most recent common ancestor. Figure 2B shows the result of a simulation for n = 100 lineages. Note the bare branches close to the ancestor, and their lengths (an interesting aside: the overall branching pattern is largely repeated within the sub-clusters, a self-similarity typical of fractals). Branch lengths in such trees are usually given as the number of substitutions per site, that is, not on a time scale. Projecting a phylogenetic tree onto a time scale can be done – for example, in order to state that Salmonella diverged from Escherichia coli 120–160 million years ago – but it always relies on estimates of substitution rates. Therefore, the evolutionary distance of two species from their common ancestor cannot simply be converted into time from the number of substitutions in a given marker. Add to this the varying generation times and the varying selective pressures in a given ecological niche at a given time, and you understand that every 'phylogenetic tree' is at best a blurry picture, comparable to viewing Denali (Mt. McKinley) from as far away as, say, Anchorage, AK (which is apparently possible).
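To make the 'cluster analysis on a distance matrix' step concrete, here is a minimal sketch of UPGMA, the simplest distance-based clustering method. It is chosen only for simplicity: the post does not say which method underlies the trees discussed here, and large published phylogenies more often rely on neighbor-joining or on model-based methods such as maximum likelihood. The function name upgma and the four-lineage distance matrix are hypothetical.

```python
from typing import Dict, FrozenSet, List, Tuple


def upgma(names: List[str], dist: List[List[float]]) -> str:
    """Cluster `names` from a symmetric distance matrix with UPGMA.

    Returns a Newick string; branch lengths are left out to keep the sketch short.
    """
    # cluster id -> (Newick subtree, number of leaves in the cluster)
    clusters: Dict[int, Tuple[str, int]] = {i: (n, 1) for i, n in enumerate(names)}
    # distance between every pair of current clusters
    d: Dict[FrozenSet[int], float] = {
        frozenset((i, j)): dist[i][j]
        for i in range(len(names)) for j in range(i + 1, len(names))
    }
    next_id = len(names)
    while len(clusters) > 1:
        pair = min(d, key=d.get)            # the two closest clusters
        a, b = sorted(pair)
        (tree_a, n_a), (tree_b, n_b) = clusters.pop(a), clusters.pop(b)
        del d[pair]
        # new distance = leaf-weighted average of the merged clusters' distances
        for c in clusters:
            d_ac = d.pop(frozenset((a, c)))
            d_bc = d.pop(frozenset((b, c)))
            d[frozenset((next_id, c))] = (n_a * d_ac + n_b * d_bc) / (n_a + n_b)
        clusters[next_id] = (f"({tree_a},{tree_b})", n_a + n_b)
        next_id += 1
    tree, _ = next(iter(clusters.values()))
    return tree + ";"


names = ["A", "B", "C", "D"]               # hypothetical lineages
dist = [[0, 2, 6, 10],
        [2, 0, 6, 10],
        [6, 6, 0, 10],
        [10, 10, 10, 0]]
print(upgma(names, dist))                  # (D,(C,(A,B)));
```

UPGMA implicitly assumes a constant rate of change along all lineages (a strict molecular clock), which is exactly why it can simply join the two closest clusters first and average their distances by cluster size.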
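Why the intervals grow so steeply toward the ancestor can be illustrated with the standard (Kingman) coalescent, used here as an assumption and not necessarily the model of Zhaxybayeva and Gogarten: while k lineages remain, the waiting time until two of them merge is exponentially distributed with mean 2/(k(k-1)) coalescent time units (N generations for a haploid population of constant size N). The sketch sums these expectations for n = 100 lineages and draws one random realization; the function names are hypothetical.

```python
import random
from typing import Dict


def expected_intervals(n: int) -> Dict[int, float]:
    """Expected length of the interval during which k lineages remain (k = n..2),
    in coalescent time units: E[T_k] = 2 / (k (k - 1))."""
    return {k: 2 / (k * (k - 1)) for k in range(n, 1, -1)}


def simulate_intervals(n: int, rng: random.Random) -> Dict[int, float]:
    """One random realization: T_k is exponential with rate k (k - 1) / 2."""
    return {k: rng.expovariate(k * (k - 1) / 2) for k in range(n, 1, -1)}


n = 100
expected = expected_intervals(n)
depth = sum(expected.values())                                          # 2 (1 - 1/n)
print(f"expected tree depth: {depth:.2f}")                              # 1.98
print(f"share of the last interval (k = 2): {expected[2] / depth:.1%}")  # 50.5%
last_three = expected[2] + expected[3] + expected[4]
print(f"share of the last three intervals: {last_three / depth:.1%}")   # 75.8%

one_run = simulate_intervals(n, random.Random(42))
print(f"one random run: last interval {one_run[2]:.2f} "
      f"of a total depth of {sum(one_run.values()):.2f}")
```

For n = 100 the expected depth is 2(1 - 1/n) = 1.98 units, and the final merger of the last two lineages alone takes about half of it on average: one way to see why the branches next to the common ancestor look long and bare even without any special evolutionary mechanism.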
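Why a distance cannot simply be read as time follows from the simplest molecular-clock conversion. It assumes a single, constant substitution rate r (substitutions per site per year) along both lineages, which is precisely the assumption questioned above. Since both lineages accumulate substitutions after they split, the pairwise distance d (substitutions per site) grows as 2rt:

$$
d = 2\,r\,t \quad\Longrightarrow\quad t = \frac{d}{2r}
$$

With purely hypothetical numbers, d = 0.2 and r = 10^{-9} per year, and then with a rate only half as large:

$$
t = \frac{0.2}{2 \times 10^{-9}\ \mathrm{yr}^{-1}} = 10^{8}\ \mathrm{yr}, \qquad t' = \frac{0.2}{2 \times 5 \times 10^{-10}\ \mathrm{yr}^{-1}} = 2 \times 10^{8}\ \mathrm{yr}
$$

Any error in the rate estimate propagates one-to-one into the divergence time, and the formula ignores multiple substitutions at the same site as well as rate changes over time.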
Let us now consider the no less problematic 'root' of Woese's tree. In their paper from 2004, Olga Zhaxybayeva and Peter Gogarten summarize the consensus among virtually all researchers working in the forest of life: "All extant cellular organisms that are known today share a multitude of characteristics including the use of DNA as genetic material, energy-coupling membranes and template-directed protein synthesis using ribosomes". These shared complex properties suggested to most researchers that a cenancestor existed, and that it lived at a stage much later than the 'origin of life' (defined as replicating molecular assemblies able to drive metabolic processes, which were later incorporated into cells). It seemed reasonable to assume that, at that point in time, multiple lines of molecular descent coexisted. The unresolved questions, however, are whether such a cenancestral cell was more similar to extant Bacteria, to extant Archaea or to extant Eukarya sans mitochondria, whether it was something entirely different, or whether there was even more than one type of progenitor cell. Gene phylogenies give conflicting results, and the number of presently known gene families is too large to fit, as single-copy genes, into a single progenitor cell (whatever its genome organization may have been), even allowing for our ignorance of the possibly redundant function(s) of a vast number of ancient genes.
Let me now introduce another simple simulation by Zhaxybayeva and Gogarten. They allowed 10 'lineages' to evolve over time, starting at T = 0, with the rule that in every generation one randomly chosen lineage split (=speciation) and another went extinct (Figure 2A). Additionally, they allowed one HGT event per ten speciation events. To simplify tracing the molecular histories, they imposed two restrictions: 1. the recipient lineage should not become extinct during the same time interval, and 2. the donor lineage should not undergo a simultaneous speciation event. The process was terminated at time T = t, when all extant lineages shared one common ancestor. The extant lineages were then traced back to their most recent common ancestor, and so were the extant HGT-derived genes. In subsequent simulations they increased the number of lineages from 10 to 100, and found in all cases that the most recent common ancestor in the molecular phylogeny of an HGT marker differed from the cenancestor of the organismal lineages, and that the times at which these last common ancestors (molecular and organismal) existed also differed. These simulations strongly support the notion that it is pretty hopeless to identify the putative organismal cenancestor by gene phylogenies (even when, conceptually, an organismal lineage is defined by the majority of genes passed on vertically over short time intervals), let alone to pinpoint the time period when it might have lived.
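The bookkeeping behind such a simulation can be sketched as follows. This is a minimal toy version under its own assumptions, not Zhaxybayeva and Gogarten's code: a single hypothetical marker gene is tracked, an HGT event happens with probability 0.1 per generation (one per ten speciations), the two restrictions above are enforced, and, as an added assumption, the run continues until both the organismal lineages and the marker's gene copies trace back to a single founder each. The names Genealogy and simulate are made up for illustration.

```python
import random
from typing import List, Optional, Set, Tuple


class Genealogy:
    """Parent-pointer genealogy. Node ids only grow, so an ancestor always has a
    smaller id than any of its descendants."""

    def __init__(self) -> None:
        self.parent: List[Optional[int]] = []
        self.time: List[int] = []
        self.root: List[int] = []          # founder (T = 0) node each node descends from

    def add(self, parent: Optional[int], t: int) -> int:
        node = len(self.parent)
        self.parent.append(parent)
        self.time.append(t)
        self.root.append(node if parent is None else self.root[parent])
        return node

    def ancestors(self, node: Optional[int]) -> Set[int]:
        path: Set[int] = set()
        while node is not None:
            path.add(node)
            node = self.parent[node]
        return path

    def mrca(self, nodes: List[int]) -> int:
        """Most recent common ancestor; assumes all nodes share one founder."""
        return max(set.intersection(*(self.ancestors(n) for n in nodes)))


def simulate(n_lineages: int = 10, hgt_per_speciation: float = 0.1,
             seed: int = 1) -> Tuple[int, int, int]:
    """Return (end time, time of organismal MRCA, time of the marker gene's MRCA)."""
    rng = random.Random(seed)
    org, gene = Genealogy(), Genealogy()
    # each extant lineage = (organismal node, node of its copy of the marker gene)
    lineages = [(org.add(None, 0), gene.add(None, 0)) for _ in range(n_lineages)]
    t = 0

    def coalesced(genealogy: Genealogy, idx: int) -> bool:
        return len({genealogy.root[lin[idx]] for lin in lineages}) == 1

    while not (coalesced(org, 0) and coalesced(gene, 1)):
        t += 1
        s, e = rng.sample(range(n_lineages), 2)   # speciating and extinct lineage
        if rng.random() < hgt_per_speciation:
            donor = rng.choice([i for i in range(n_lineages) if i != s])
            recipient = rng.choice([i for i in range(n_lineages) if i not in (e, donor)])
            g_donor = lineages[donor][1]
            # the donor's gene copy splits: one copy stays, one replaces the recipient's
            lineages[donor] = (lineages[donor][0], gene.add(g_donor, t))
            lineages[recipient] = (lineages[recipient][0], gene.add(g_donor, t))
        o_s, g_s = lineages[s]
        # the two daughters replace the parent lineage and the extinct lineage
        lineages[s] = (org.add(o_s, t), gene.add(g_s, t))
        lineages[e] = (org.add(o_s, t), gene.add(g_s, t))
    o_mrca = org.mrca([o for o, _ in lineages])
    g_mrca = gene.mrca([g for _, g in lineages])
    return t, org.time[o_mrca], gene.time[g_mrca]


for seed in range(5):
    t_end, t_org, t_gene = simulate(seed=seed)
    print(f"seed {seed}: stopped at T = {t_end}, organismal MRCA at {t_org}, gene MRCA at {t_gene}")
```

Without HGT (hgt_per_speciation = 0.0) the gene genealogy is an exact copy of the organismal one, so both MRCA times coincide; any mismatch in this toy model therefore comes from the transfers alone.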
In the title of this post I promised a recipe 'To Not Get Lost in the Trees of Life'. To deliver it, I introduce here another simulation by Zhaxybayeva and Gogarten (the last one in this post). This simulation of a 'tree of life' (Figure 3) builds on the simple null hypothesis that, after an initial phase of diversification, the speciation rate equals the extinction rate, as above (which, as they readily admit, is an oversimplification). They derive a tree without invoking geographic barriers within the 'primordial soup' or evolutionary bottlenecks caused by catastrophic events in Earth's history (for example, meteorite impacts or snowball Earth events). The tree unfolds during an initial phase of diversification and then, through balanced speciation and extinction, assumes a geometry fairly similar to that of the Hug et al. tree. The use of information-carrying aperiodic biopolymers as genetic material (which may or may not have been a unique event) during the prebiotic phase was a key step towards 'cellularization', and selection likely generated different quasi-species that occupied the available ecological niches (I'm aware that zoologists use the term 'cellularization' for the formation of single cells from a multinucleated syncytium, but that's not so terribly different from forming cells from a prebiotic soup, after all). Zhaxybayeva and Gogarten assume that there were numerous independently arising lineages (=replicating!) of autocatalytic chemical reactions and networks during prebiotic evolution, which either went extinct or were successively incorporated into evolving cells (and maybe viruses?). The arrows in Figure 3 indicate the prebiotic reaction networks that contributed to the diversifying lineages and, after 'cellularization', to the diversification of the first cell lineages, a tradition later continued as what we now call HGT.
In this scenario, HGT and its (as yet nameless) predecessor are not a confusing nuisance (as they are for phylogeneticists trying to distinguish gene phylogenies from organismal phylogenies) but a necessity that lets cells start lineages with slim genomes. In their simulated tree, Zhaxybayeva and Gogarten place 'the cenancestor' well after the first period of diversification of already existing cell lineages, near the "event horizon" of balanced speciation/extinction. I can hardly wait to learn what surprises a de-simplification of this simulation approach will reveal! Let the experimentation begin...
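The null model behind Figure 3 (an initial phase of diversification, followed by balanced speciation and extinction) can be sketched as a small toy simulation. It is a hypothetical simplification, not the authors' simulation: it leaves out HGT and the prebiotic contributions, allows one split (and, in the second phase, one extinction) per time step, and the function name and parameters are made up. It reports when diversification ended and when the most recent common ancestor of the surviving lineages lived.

```python
import random
from typing import List, Optional, Set, Tuple


def diversify_then_balance(n_max: int = 20, balanced_steps: int = 2000,
                           seed: int = 3) -> Tuple[int, int, int]:
    """Toy lineage simulation: pure diversification up to n_max lineages,
    then one speciation plus one extinction per time step.

    Returns (end of diversification, time of the extant lineages' MRCA, end time).
    """
    rng = random.Random(seed)
    parent: List[Optional[int]] = [None]   # node 0 is the founding lineage
    born: List[int] = [0]                  # time step at which each node arose
    extant: List[int] = [0]                # nodes that are currently alive
    t = 0

    def speciate(p: int) -> Tuple[int, int]:
        """Create two daughter nodes of node p at the current time t."""
        for _ in range(2):
            parent.append(p)
            born.append(t)
        return len(parent) - 2, len(parent) - 1

    # Phase 1: diversification, one random lineage splits per step, none go extinct
    while len(extant) < n_max:
        t += 1
        i = rng.randrange(len(extant))
        extant[i], new = speciate(extant[i])
        extant.append(new)
    t_div = t

    # Phase 2: balanced speciation and extinction, the number of lineages stays n_max
    for _ in range(balanced_steps):
        t += 1
        s, e = rng.sample(range(n_max), 2)   # speciating and extinct lineage differ
        extant[s], extant[e] = speciate(extant[s])

    def ancestors(node: Optional[int]) -> Set[int]:
        path: Set[int] = set()
        while node is not None:
            path.add(node)
            node = parent[node]
        return path

    # ids grow along every line of descent, so the largest shared id is the MRCA
    common = set.intersection(*(ancestors(n) for n in extant))
    return t_div, born[max(common)], t


for steps in (0, 500, 2000):
    print(steps, diversify_then_balance(balanced_steps=steps))
```

With balanced_steps = 0 the common ancestor is the founding lineage at T = 0; with a long balanced phase, the common ancestor of the survivors typically lies after the diversification phase, and as the run continues it can only stay put or move forward in time, in the spirit of the "event horizon" mentioned above.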
Legend to the Hug et al. tree linked in the text (2nd paragraph): Expanded view of the tree of life, showing that bacteria make up two-thirds of all Earth's biodiversity, half of that from uncultivable bacteria called 'candidate phyla radiation' (CPR). Archaea and Eukaryotes make up the remaining third. The red dots represent lineages that cannot, at present, be isolated and grown in the lab. The tree was built from concatenated ribosomal proteins (=gene tree). CC BY 4.0 Jill Banfield/UC Berkeley, Laura Hug/Univ. of Waterloo. Source (Open Access PDF here)