A Comment on the Reliability of Molecular Trees

R. H. Zander

Res Botanica, Missouri Botanical Garden

January 31, 2010




Although the reliability (as opposed to the statistical discriminatory power) of a molecular tree may be estimated despite sampling error, an incorrect model of sequence evolution, greatly different rates of divergence and extinction, and other problems (Zander 2007a), such estimates should be kept in perspective. Guigó et al. (1996) found that only 17 of 53 different nuclear genes were perfectly consistent with the accepted species tree of the major eukaryote groups. Chen and Li (2001) found that of 53 different DNA loci, 31 supported the Homo-Pan clade, 12 supported Pan-Gorilla, and 10 supported Homo-Gorilla, although a chi-square test against equal support for all three groupings gives p < 0.001, so this split is very unlikely to have arisen by chance alone. In a study by Årnason and Johnsson (1992), one or another of 15 different mitochondrial genes supported five different phylogenies of mouse, rat, human, seal, cow, and whale. Problems of discrepant gene histories are well known and have been discussed at length most recently by Avise and Robinson (2008) and Duvall et al. (2008).

Molecular traits may also be, to a significant extent, non-neutral and thus affected by selection. Although protein-coding DNA makes up only two percent of the human genome, in 44 regions studied comprising 30 million bases, fully 80 percent of the bases were apparently involved in some way in the expression of traits, for example through gene regulation (Pennisi 2007). These bases are therefore exposed to selection, which may produce false sequence convergence in phylogenetic analysis. Stern and Orgogozo (2008) found that fully 22 percent of identified genetic changes are due to cis-regulatory mutations, which lie largely in non-coding regions of the kind commonly used in phylogenetic analyses of DNA. Extreme branch length heterogeneity, such as that expected under punctuated evolution, can certainly affect recovery of the true gene tree (Lyons-Weiler and Takahashi 1999). Pollard et al. (2006) suggested that rapidly evolving regions are adaptively significant and are expected to be under positive selection. Yi (2007) summarized evidence for “pervasive natural selection on non-coding and synonymous sites” leading, for instance, to rapid adaptive evolution and accelerated molecular clocks in particular lineages.

A molecular analysis must also account for any homoplasy introduced by inappropriate technique, e.g., the wrong model (Alfaro and Huelsenbeck 2006), or by inappropriate data, e.g., incomplete concerted evolution (Doyle 1996).
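The chi-square statement about Chen and Li's (2001) locus counts (31 Homo-Pan, 12 Pan-Gorilla, 10 Homo-Gorilla) can be checked directly. The sketch below is a minimal illustration, not the original authors' procedure: it assumes the 53 loci are independent draws and that "random support" means an equal expected count of 53/3 for each grouping, giving a Pearson chi-square test with 2 degrees of freedom. For 2 degrees of freedom the upper-tail probability has the closed form exp(-x/2), so no statistics library is needed. The function names are hypothetical helpers introduced here for illustration.

```python
import math


def chi_square_equal_support(counts):
    """Pearson chi-square statistic against equal expected counts in every class."""
    total = sum(counts)
    expected = total / len(counts)
    return sum((obs - expected) ** 2 / expected for obs in counts)


def chi_square_sf_df2(x):
    """Upper-tail probability P(X >= x) for a chi-square variable with 2 degrees of freedom.

    With 2 degrees of freedom the chi-square distribution is exponential with
    mean 2, so the tail probability is exactly exp(-x / 2).
    """
    return math.exp(-x / 2)


# Loci supporting each grouping as reported for Chen and Li (2001):
# Homo-Pan, Pan-Gorilla, Homo-Gorilla
counts = [31, 12, 10]

stat = chi_square_equal_support(counts)
p_value = chi_square_sf_df2(stat)  # 3 classes -> 2 degrees of freedom

print(f"chi-square = {stat:.2f}, p = {p_value:.4f}, 1 - p = {1 - p_value:.4f}")
# prints: chi-square = 15.21, p = 0.0005, 1 - p = 0.9995
```

The "0.999" figure corresponds to 1 - p. Strictly, p is the probability of a split at least this uneven if all three groupings were equally supported, not the probability that the Homo-Pan result is real; and if the loci are not independent (for example, through linkage), the test overstates the evidence.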


Alfaro ME, Huelsenbeck JP (2006) Comparative performance of Bayesian and AIC-based measures of phylogenetic model uncertainty. Syst Biol 55:89–96

Årnason U, Johnsson E (1992) The complete mitochondrial DNA sequence of the harbor seal, Phoca vitulina. J Mol Evol 34:493–505

Avise JC, Robinson TJ (2008) Hemiplasy: a new term in the lexicon of phylogenetics. Syst Biol 57:503–507

Chen FC, Li WH (2001) Genomic divergences between humans and other hominoids and the effective population size of the common ancestor of humans and chimpanzees. Amer J Hum Genet 68:444–456

Doyle JJ (1996) Homoplasy connections and disconnections: genes and species, molecules and morphology. In: Sanderson MJ, Hufford L (eds) The recurrence of similarity in evolution. Academic Press, New York, pp. 37–66

Duvall MR, Robinson JW, Mattson JG, Moore A (2008) Phylogenetic analyses of two mitochondrial metabolic genes sampled in parallel from angiosperms find fundamental interlocus incongruence. Amer J Bot 95:871–884

Guigó R, Muchnik I, Smith TF (1996) Reconstruction of ancient molecular phylogeny. Mol Phylog Evol 6:189–213

Lyons-Weiler J, Takahashi K (1999) Branch length heterogeneity leads to nonindependent branch length estimates and can decrease the efficiency of methods of phylogenetic inference. J Mol Evol 49:392–405

Pennisi E (2007) DNA study forces rethink of what it means to be a gene. Science 316:1556–1557

Pollard KS, Salama SR, Lambert N, Lambot M-A, Coppens S, Pedersen JS (2006) An RNA gene expressed during cortical development evolved rapidly in humans. Nature 443:167–172

Stern DL, Orgogozo V (2008) The loci of evolution: how predictable is genetic evolution? Evolution 62:2155–2177

Yi SV (2007) Understanding neutral genomic molecular clocks. Evol Biol 34:144–151


See also the relevant comments on ancient paraphyly on the Evolutionary Systematics page.