Unaccounted Assumptions
Richard H. Zander
Res Botanical Web Site
Missouri Botanical Garden

December 22, 2005

http://www.mobot.org/plantscience/ResBot/Phyl/Unaccounted.htm

 


 

Many assumptions concerning regularity and sample error significantly affect the reliability of phylogenetic analysis, even analysis purportedly Bayesian in nature, yet they are commonly ignored or incorrectly passed off as trivial in the speculative literature. The tree itself is a branching series of nested sets (e.g., the set of all taxa exhibiting certain state changes). A set may appear to be a more definite concept than a sample, yet it cannot be any better than the samples included in it, and a set is itself a sample.

 

As pointed out by Posada and Buckley (2004), statistical analysis commonly makes three fairly straightforward but not easily checked assumptions: that data sets are drawn from the same underlying process, that sample size is large enough to obtain meaningful results, and that a multivariate normal distribution is involved.

 

Huelsenbeck and Rannala (2004), however, echo a common viewpoint in the literature that downplays the effect of many possible additional assumptions, asserting that “the posterior probability of a tree is the probability that the tree is correct (assuming that the model is correct).” The likelihood principle, which states that the likelihood function contains all the information from the sample that is relevant for inferential and decision-making purposes (Winkler, 1972), is in this manner misused.

 

Some authors detail how their own study is robust to variation in certain major assumptions, but this is usually restricted to model selection and sequence alignment, and “robust” is never quantified probabilistically. Some recent papers (e.g., Engstrom et al., 2004) attempt to explore various dimensions of uncertainty aside from the analytic algorithm, but these papers are few and the analysis is complex. Probabilities of less than 0.50 certainty generally are not precisely calculated and made to affect reliability measures given by other genes.

 

As few as six external factors that affect the reliability of internode branch arrangements, each at 0.99 chance of being correct, will reduce confidence in each branch arrangement to a maximum of 0.94 probability (the product, 0.99 raised to the sixth power). Only if morphology agrees with the arrangement and is “uncontested” can it be used as a prior (here done at an arbitrarily assigned 0.95) that will ensure reliability of at least 0.95 via Bayes' Formula (if the reliability of the agreeing molecularly based branch arrangement is less than 0.95 and greater than 0.50).
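The arithmetic in the paragraph above can be sketched as follows. This is only an illustration under stated assumptions: the six factors are treated as independent, and Bayes' Formula is taken in its simple two-hypothesis form (the arrangement is either right or wrong). The function names are hypothetical and are not taken from any phylogenetic software.

```python
import math

def combined_reliability(factor_probs):
    """Product of the chances that each external factor is correct
    (assumes the factors are independent)."""
    return math.prod(factor_probs)

def posterior(prior, support):
    """Two-hypothesis Bayes' Formula: the branch arrangement is either
    right or wrong (an assumption made for this sketch)."""
    numerator = prior * support
    return numerator / (numerator + (1.0 - prior) * (1.0 - support))

# Six external factors, each at 0.99 chance of being correct.
six_factors = combined_reliability([0.99] * 6)
print(f"six factors at 0.99: {six_factors:.2f}")

# Morphological prior arbitrarily set at 0.95, combined with molecular
# support values lying between 0.50 and 0.95.
for support in (0.50, 0.70, 0.94):
    print(f"support {support:.2f} -> posterior {posterior(0.95, support):.3f}")
```

With a prior of 0.95, molecular support of exactly 0.50 leaves the posterior at 0.95, and any support above 0.50 raises it, which is the sense in which the prior "ensures" reliability of at least 0.95.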

 

Below is a list of presuppositions (variously discussed in general by, among others, Avise, 1994; Felsenstein, 2004; Huelsenbeck et al., 1994; Jenner, 2004; Kolaczkowski & Thornton, 2004; Lipscomb et al., 2003; Lyons-Weiler & Milinkovitch, 1997; Maddison, 1996; Naylor & Adams, 2003; Philippe et al., 1996; Pickett & Randle, 2005; Rokas et al., 2003; Ronquist, 2004; Ruedas et al., 2000; Sites et al., 1996; Templeton, 1986; Wendel & Doyle, 1998; Wilcox et al., 2002). These can be important but are commonly not factored in, especially in the older literature. Some are obvious and major problems, and some are cryptic to the non-adept, or merely minor, or inapplicable to particular loci.

 

There are doubtless other factors, and each may affect the reliability of a branch arrangement of interest as the product of the confidence interval assigned the internode times the probability that each and every particular assumption is correct. It is doubtless possible to assign particular probabilities to at least some if not many of the assumptions below for particular data sets, but that task is beyond the scope of this paper. Commonly unaccounted (unfactored) assumptions or problems that could require reduction in branch arrangement reliability are included in the following several categories:

 

1. alternative alignments of DNA sequence data, including alignment by eye or computerized optimization for best fit; mistakes in assignment of homology of morphological characters (Hickson et al., 2000; Page, 2004; Wheeler, 1994, 1999);

 

2. avoiding using introns or especially emphasizing them for sometimes conflicting technical reasons (Pons et al., 2004; Engstrom et al., 2004);

 

3. BPP not lowered in a second study when reliability values in a previous study of less than 0.50 for the same lineages could be used as priors;

 

4. hybridization or reticulate evolution, unbalanced gene flow during introgression, gene conversion, chloroplast capture, paralogy or gene duplication (occasionally between organelles), conflation with orthology, recombination, heteroplasmy, haplotype polymorphism (Doyle et al., 2004; Holder et al., 2001; Jackson, 2005; Mason-Gamer, 2004; Popp & Oxelman, 2004; van Oppen et al., 2001; Wolfe & Randle, 2004);

 

5. clade probabilities not equal a priori (Pickett & Randle, 2005; Randle et al., 2005);

 

6. clocklike behavior or lack thereof, use of optimal model parameters in likelihood ratio test of molecular clock, use of nonparametric rate smoothing (Bromham & Woolfit, 2004; Sanderson, 1997);

 

7. concerted evolution (Nei et al., 2000; Popp & Oxelman, 2004);

 

8. convergence due to environmental selection on morphology or exons, assumed “neutral” mutations influenced by evolutionary pressures (Caporale, 2003; Doebley & Lukens, 1998; Rodríguez-Trelles et al., 2004; Zhang & Kumar, 1997);

 

9. differences between consensus trees, best trees and true trees (Pagel et al., 2004);

 

10. differences between results of total evidence (combined data sets) and repeatability of results using separate gene and morphology evaluations, novel clades, using or not using different gene data at different levels in the tree; accepting data sets as compatible if non-corresponding clades lack a BP value >50% for each of two data sets; random generation of traits shared by two sister lineages that is difficult to distinguish statistically from similar parallelism between each of the two sister lineages and the nearest neighbor lineage (Ané & Sanderson, 2005; Benton, 1999; Buckley et al., 2002; Chen et al., 2003; Eernisse & Kluge, 1993; Engstrom et al., 2004; Johnson & Soltis, 1998; Nixon & Carpenter, 1996; Nylander et al., 2004; Olmstead & Scotland, 2005; Scotland et al., 2003);

 

11. different results from different iterations, generations and replications of analysis processes, including Dollo or transversion parsimony, and ordered or unordered states, insufficient mixing and convergence of MCMC chains (Randle et al., 2005);

 

12. different results from parsimony, neighbor joining, maximum likelihood and Bayesian methods, or from the many different phylogenetic analytic software packages commonly used in the past 20 years, including ability to find shortest trees or proper trees in 0.95 credible interval, limited available selections of models or weighting (Douady et al., 2003; Felsenstein, 1978; Mindell & Thacker, 1996; Randle et al., 2005; Sober, 2004);

 

13. differential lineage sorting, i.e., different gene histories (Doyle, 1992, 1993; Hudson, 1992);

 

14. effect of uncertainty contributed by more or fewer taxa included in the data set, or the use of exemplar taxa to represent larger taxonomic units with presumably insignificant variation in traits among taxa, or the effect of inclusion or exclusion of problematic taxa, or selection of different or multiple outgroup(s); data per taxon sample size (Funk et al., 2004; Graybeal, 1998);

 

15. effect of under- and over-credibility of Bayesian analysis, Bayesian priors over-determining results with small data sets, extended lineages that are represented in a cladogram simply by an outer node may render the analysis imprecise because these sequences are unknown (Alfaro et al., 2003; Bininda-Emonds et al., 1998; Bollback, 2002; Churchill et al., 1992; Lewis et al., 2005);

 

16. genomic problems including differences between nucleotide- and amino acid-based analyses; codon bias in exons; reversal of asymmetric mutational constraints of strand nucleotide composition bias in mtDNA; possible strong selection pressure on strongly conserved non-coding sequences and persistent pseudogenes; limitations on congruence of orthologues; re-expression of pseudogenes; regulator- or promoter-switched deep homology masquerading as homoplasy (convergence); endogenous retroviruses causing portions of genome to appear to have a different evolutionary history (Bapteste et al., 2005; Barbulescu et al., 2001; Christianson, 2005; Collin & Cipriani, 2003; Hall, 2003; Hassanin et al., 2005; Hollyoake et al., 2005; Inagaki et al., 2004; Lockwood & Fleagle, 1999; Rohwer & Rudolph, 2005; Rokas et al., 2003);

 

17. heterogeneity of models among sites, heterogeneous evolutionary processes over phylogenetic history, nucleotide composition not constant over time (Goldman, 1993; Kolaczkowski & Thornton, 2004; Pagel & Meade, 2004; Tuffley & Steel, 1997);

 

18. inclusion or exclusion of fossil evidence (Smith & Turner, 2005);

 

19. incongruence, sometimes well supported, between mitochondrial, chloroplast and nuclear data sets (Cronn et al., 2002; Des Marais & Mishler, 2002; Sang & Zhong, 2000; Shaw, 2002; Steppan et al., 2004; Wendel & Doyle, 1998);

 

20. inconsistent method leading to high bootstrap support for an incorrect clade (Cummings et al., 2003);

 

21. method of incorporation of indels and the effect on arrangements of interest, different gap costs (e.g., Pons et al., 2004; Simmons & Ochoterena, 2000);

 

22. model selection choice type and procedures, including amino acid and secondary structure, homogeneous versus heterogeneous models, choice between Bayesian or Akaike information criteria, too few data to ensure accuracy of likelihood ratio test, i.e. likelihood curve not shaped like a normal distribution, using 0.95 as significant for LRTs (Bollback, 2002; Buckley, 2002; Buckley et al., 2002; Pol, 2004; Posada & Buckley, 2004; Randle et al., 2005);

 

23. selecting and reusing data from taxa previously grouped by a bootstrap or posterior probability that is rather high by chance; multiple test problems, e.g., one branch arrangement contrary to tradition among 20 arrangements each at 0.95 probability (Felsenstein, 2004);

 

24. possibility of horizontal gene transfer (Davis & Wurdack, 2004; Nickrent et al., 2004);

 

25. rates other than gamma-distributed (Felsenstein, 2004; Pagel et al., 2004);

 

26. reliability values differing by method or only comparable between similarly sized clades (Pickett & Randle, 2005; Sanjuán & Wróbel, 2005);

 

27. results affected by inclusion or exclusion of 3rd nucleotide position, high evolutionary rates making sequences unreliable, saturation, compositional heterogeneities, among-lineage and among-site heterogeneities, invariant sites, covariation, non-independence of characters, self-correction of flawed DNA; AFLP markers limited by unequal gain-loss probability, possible lack of independence, possible lack of homology (Engstrom et al., 2004; Ho & Jermiin, 2004; Koopman, 2005; Steppan et al., 2004; Sullivan & Swofford, 1997);

 

28. sample error, including misidentifications, uncertainty due to lack of vouchers, reagent contaminants, unreliable primers, laboratory mistakes, capture of data, software bugs, confirming DNA sequences by analysis of both forward and reverse strands or two different reactions from same individual; confirmation bias (the tendency to selectively notice and focus on evidence that supports a theory rather than on facts that might disprove it) (Bridge et al., 2003; Engstrom et al., 2004; Funk et al., 2005; Popp & Oxelman, 2004; Steppan et al., 2004; Vilgalys, 2003);

 

29. sample size of DNA sites;

 

30. serial extinctions of sister groups or strong anagenetic change modifying ancestral characters, variation in speed of molecular evolution or speciation versus variation in generation times;

 

31. uncertainty contributed by conflicting morphological results, statistical rejection of morphological alternative topologies by the molecular and vice versa (Collard & Wood, 2000; Kirchoff et al., 2004; Steppan et al., 2004);

 

32. uncertainty introduced by choice of ACCTRAN and DELTRAN with PAUP* or rejection of both with MacClade (Donoghue & Ackerly, 1996; Maddison & Maddison, 1992; Swofford, 1998);

 

33. under- or overspecification or parameterization of the model, limitation of Metropolis coupling (Alfaro et al., 2003; Eriksson et al., 2003; Huelsenbeck & Rannala, 2004; Pagel et al., 2004);

 

34. unexpected stochastic effects, such as bad luck in exemplar choice, long-branch attraction, unusual noise (Hillis, 1991);

 

35. weighting inappropriately or variously, doubt in any rescaling or re-optimization, mistakes in use of statistics, use or non-use of “weeded” parsimony; trees not derived independently of the data sets used for testing (Engstrom et al., 2004; Goldman et al., 2000; Koopman, 2005; Milinkovitch et al., 1996).
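The multiple test problem mentioned in item 23 (20 arrangements, each at 0.95 probability) can be made concrete with a short calculation. This is a minimal sketch that assumes the 20 arrangements are statistically independent, which real branches on one tree generally are not; the variable names are illustrative only.

```python
def probability_all_correct(n_arrangements, support):
    """Chance that every one of n independent arrangements is correct."""
    return support ** n_arrangements

n_arrangements = 20
support = 0.95

p_all = probability_all_correct(n_arrangements, support)
p_at_least_one_wrong = 1.0 - p_all
expected_wrong = n_arrangements * (1.0 - support)

print(f"P(all {n_arrangements} correct)   = {p_all:.3f}")
print(f"P(at least one wrong) = {p_at_least_one_wrong:.3f}")
print(f"expected number wrong = {expected_wrong:.1f}")
```

Under this independence assumption, one wrong arrangement among the 20 is the expected outcome, so a single arrangement that contradicts tradition is not by itself surprising.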

 

 

Only a proportion of these assumptions affect any one study, yet even one problem can contribute significantly to uncertainty in any molecular analysis. For instance, a sequence alignment that is only 0.95 correct may affect a branch arrangement of interest if wrong. If that is the case, then the probability of the branch arrangement of interest being right (determined by likelihood/Bayesian analysis) must be multiplied by the probability that the sequence is correct; the branch arrangement has then 0.95 times 0.95 or 0.90 probability of being correct.
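The discounting described above can be sketched as a small function. It is a hedged illustration only: it assumes each relevant assumption can be given a probability of being correct and that these probabilities are independent of one another and of the branch support. The name `discounted_support` and the dictionary keys are hypothetical.

```python
def discounted_support(branch_support, assumption_probs):
    """Multiply the reported support for a branch arrangement by the
    probability that each relevant assumption is correct."""
    result = branch_support
    for name, prob in assumption_probs.items():
        if not 0.0 <= prob <= 1.0:
            raise ValueError(f"probability for {name!r} must lie in [0, 1]")
        result *= prob
    return result

# The example from the text: 0.95 support, alignment 0.95 likely correct.
one_problem = discounted_support(0.95, {"sequence alignment": 0.95})
print(f"alignment only: {one_problem:.2f}")

# A second, purely illustrative assumption lowers the figure further.
two_problems = discounted_support(
    0.95, {"sequence alignment": 0.95, "model selection": 0.98}
)
print(f"alignment and model: {two_problems:.2f}")
```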

 

The user taxonomist should determine, to the extent possible, which assumptions are relevant, and how robust the published results are to each assumption, i.e., whether there is no change in branch arrangements of interest or, if there is, whether the change is at a probability high enough to make the published arrangement unreliable. Commonly, insufficient data are provided in the original paper to even begin to do this adequately. One can use a general correction factor as a way around this problem, such as a 0.01 penalty on each confidence or credible interval.
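One way to apply such a general correction factor is sketched below. The text does not say whether the 0.01 penalty is subtracted or applied as a multiplier of 0.99; this sketch assumes a flat subtraction, and the branch labels and support values are invented for illustration.

```python
def apply_flat_penalty(supports, penalty=0.01):
    """Subtract a fixed penalty from each reported support value,
    never letting a value fall below zero."""
    return {
        branch: max(0.0, round(value - penalty, 10))
        for branch, value in supports.items()
    }

# Hypothetical published support values for three branch arrangements.
published = {"A+B": 0.95, "C+D": 0.99, "E+F": 0.60}
adjusted = apply_flat_penalty(published)

for branch, value in adjusted.items():
    print(f"{branch}: {published[branch]:.2f} -> {value:.2f}")
```

Under this reading, an arrangement published at exactly 0.95 no longer meets a 0.95 threshold once the penalty is applied.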

 

BIBLIOGRAPHY

Alfaro, M. E., S. Zoller & F. Lutzoni. 2003. Bayes or bootstrap? A simulation study comparing the performance of Bayesian Markov chain Monte Carlo sampling and bootstrapping in assessing phylogenetic confidence. Mol. Biol. Evol. 20: 255--266.

Ané, C. & M. J. Sanderson. 2005. Missing the forest for the trees: phylogenetic compression and its implications for inferring complex evolutionary histories. Syst. Biol. 54: 145--157.

Avise, J. C. 1994. Molecular Markers, Natural History and Evolution. Chapman & Hall, New York.

Bapteste, E., E. Susko, J. Leigh, D. MacLeod, R. L. Charlebois & W. F. Doolittle. 2005. Do orthologous gene phylogenies really support tree-thinking? BMC Evol. Biol. 5: 33.

Barbulescu, M., G. Turner, Mei Su, R. Kim, M. I. Jensen-Seaman, A. S. Deinard, K. K. Kidd & J. Lenz. 2001. A HERV-K provirus in chimpanzees, bonobos and gorillas, but not humans. Current Biology 11: 779--783.

Benton, M. J. 1999. Early origins of modern birds and mammals: molecules vs. morphology. BioEssays 21: 1043--1051.

Bininda-Emonds, O. R. P., H. N. Bryant & A. P. Russell. 1998. Supraspecific taxa as terminals in cladistic analysis: implicit assumptions of monophyly and a comparison of methods. Biol. J. Linnean Soc. 64: 101--133.

Bollback, J. P. 2002. Bayesian model adequacy and choice in phylogenetics. Mol. Biol. Evol. 19: 1171--1180.

Bridge, P. D., P. J. Roberts, B. M. Spooner & G. Panchal. 2003. On the unreliability of published DNA sequences. New Phytologist 160: 43--48.

Bromham, L. & M. Woolfit. 2004. Explosive radiations and the reliability of molecular clocks: Island endemic radiations as a test case. Syst. Biol. 53: 758--766.

Buckley, T. R. 2002. Model misspecification and probabilistic tests of topology: evidence from empirical data sets. Syst. Biol. 51: 509--523.

Buckley, T. R., P. Arensburger, C. Simon & G. K. Chambers. 2002. Combined data, Bayesian phylogenetics, and the origin of the New Zealand cicada genera. Syst. Biol. 51: 4--18.

Caporale, L. H. 2003. Natural selection and the emergence of a mutation phenotype: an update of the evolutionary synthesis considering mechanisms that affect genome variation. Ann. Rev. Microbiol. 57: 467--485.

Chen, Wei-Jen, C. Bonillo & G. Lecointre. 2003. Repeatability of clades as a criterion of reliability: a case study for molecular phylogeny of Acanthomorpha (Teleostei) with larger number of taxa. Mol. Phylog. Evol. 26: 262--288.

Churchill, G. A., A. von Haeseler & W. C. Navidi. 1992. Sample size for a phylogenetic inference. Mol. Biol. Evol. 9: 753--769.

Christianson, M. L. 2005. Codon usage patterns distort phylogenies from or of DNA sequences. Amer. J. Bot. 92: 1221--1233.

Collard, M. & B. Wood. 2000. How reliable are human phylogenetic hypotheses? Proc. Natl. Acad. Sci. 97: 5003--5006.

Collin, R. & R. Cipriani. 2003. Dollo's law and the re-evolution of shell coiling. Proc. R. Soc. Lond. B 270: 2551--2555.

Cronn, R. C., R. L. Small, T. Haselkorn & J. F. Wendel. 2002. Rapid diversification of the cotton genus (Gossypium: Malvaceae) revealed by analysis of sixteen nuclear and chloroplast genes. Amer. J. Bot. 89: 707--725.  

Cummings, M. P., S. A. Handley, D. S. Myers, D. L. Reed, A. Rokas & K. Winka. 2003. Comparing bootstrap and posterior probability values in the four-taxon case. Syst. Biol. 477--487. 

Davis, C. C. & K. J. Wurdack. 2004. Host-to-parasite gene transfer in flowering plants: phylogenetic evidence from Malpighiales. Sciencexpress http://www.sciencexpress.org. 15 July 2004.

Des Marais, D. & B. D. Mishler. 2002. Phylogeography of the moss genus Timmiella (Pottiaceae, Musci). Abstract. Botany 2002. http://www.botany2002.org/section12/abstracts/251.shtml

Doebley, J. & L. Lukens. 1998. Transcriptional regulators and the evolution of plant form. Plant Cell 10: 1075--1082.

Donoghue, M. J. & D. D. Ackerly. 1996. Phylogenetic uncertainty and sensitivity analyses in comparative biology. Phil. Trans. Roy. Soc. B 351: 1241--1249.

Douady, C. J., F. Delsuc, Y. Boucher, W. F. Doolittle & E. J. Douzery. 2003. Comparison of Bayesian and maximum likelihood bootstrap measures of phylogenetic reliability. Mol. Biol. Evol. 20: 248--254.

Doyle, J. J. 1992. Gene trees and species trees: Molecular systematics as one-character taxonomy. Syst. Bot. 17: 144--163.

Doyle, J. J. 1993. DNA, phylogeny, and the flowering of plant systematics. BioScience 43: 380--389. 

Doyle, J. J., J. L. Doyle, J. T. Rauscher & A. H. D. Brown. 2004. Diploid and polyploid reticulate evolution throughout the history of the perennial soybeans (Glycine subgenus Glycine). New Phytologist 161: 121--132.

Eernisse, D. J. & A. G. Kluge. 1993. Taxonomic congruence versus total evidence, and amniote phylogeny inferred from fossils, molecules, and morphology. Mol. Biol. Evol. 10: 1170--1195.

Engstrom, T. N., H. B. Shaffer & W. P. McCord. 2004. Multiple data sets, high homoplasy, and the phylogeny of softshell turtles. Syst. Biol. 53: 693--710.

Eriksson, P., B. Svennblad, T. Britton & B. Oxelman. 2003. The reliability of Bayesian posterior probabilities and bootstrap frequencies in phylogenetics. Syst. Biol. 52: 665--673.

Felsenstein, J. 1978. Cases in which parsimony or compatibility methods will be positively misleading. Syst. Zool. 27: 401--410.

Felsenstein, J. 2004. Inferring Phylogenies. Sinauer Associates, Inc., Sunderland, Massachusetts.

Funk, V. A., P. C. Hock, L. A. Prather & W. L. Wagner. 2005. The importance of vouchers. Taxon 54: 127--129.

Funk, V. A., R. Chan & S. C. Keeley. 2004. Insights into the evolution of the tribe Arctoteae (Compositae: subfamily Chichorideae s.s.) using trnL-F, ndhF, and ITS. Taxon 53: 637--655.

Goldman, N. 1993. Statistical tests of models of DNA substitution. J. Mol. Evol. 36: 725--736.

Goldman, N., J. P. Anderson & A. G. Rodrigo. 2000. Likelihood-based tests of topologies in phylogenetics. Syst. Biol. 49: 652--670.

Graybeal, A. 1998. Is it better to add taxa or characters to a difficult phylogenetic problem? Syst. Biol. 47: 9--17.

Hall, B. K. 2003. Descent with modification: the unity underlying homology and homoplasy as seen through an analysis of development and evolution. Biol. Rev. 78: 409--433.

Hassanin, A., N. Léger & J. Deutsch. 2005. Evidence for multiple reversals of asymmetric mutational constraints during the evolution of the mitochondrial genome of Metazoa, and consequences for phylogenetic inferences. Syst. Biol. 54: 277--29

Hickson, R. E., C. Simon & S. W. Perry. 2000. The performance of several multiple-sequence alignment programs in relation to secondary-structure features for an rRNA sequence. Mol. Biol. Evol. 17: 530--539.

Ho, S. Y. W. & L. S. Jermiin. 2004. Tracing the decay of the historical signal in biological sequence data. Syst. Biol. 53: 623--637.

Holder, M. T., J. A. Anderson & A. K. Holloway. 2001. Difficulties in detecting hybridization. Syst. Biol. 50: 978--982.

Hollyoake, M., R. D. Campbell & B. Aguado. 2005. NKp30 (NCR3) is a pseudogene in 12 inbred and wild mouse strains, but an expressed gene in Mus caroli. Mol. Biol. Evol. 22: 1661--1672.

Hillis, D. M. 1991. Discriminating between phylogenetic signal and random noise in DNA sequences. Pp. 278--294 in M. M. Miyamoto & J. Cracraft (editors), Phylogenetic Analysis of DNA Sequences. Oxford University Press, New York, N.Y.

Hudson, R. R. 1992. Gene trees, species trees and the segregation of ancestral alleles. Genetics 131: 509--512.

Huelsenbeck, J. P. & B. Rannala. 2004. Frequentist properties of Bayesian posterior probabilities of phylogenetic trees under simple and complex substitution models. Syst. Biol. 53: 904--913.

Huelsenbeck, J. P., D. L. Swofford, C. W. Cunningham, J. J. Bull & P. W. Waddell. 1994. Is character weighting a panacea for the problem of data heterogeneity in phylogenetic analysis? Syst. Biol. 43: 288--291.

Inagaki, Y., A. G. B. Simpson, J. B. Dacks & A. J. Roger. 2004. Phylogenetic artifacts can be caused by leucine, serine, and arginine codon usage heterogeneity: dinoflagellate plastid origins as a case study. Syst. Biol. 53: 582--593.

Jackson, A. P. 2005. The effect of paralogous lineages on the application of reconciliation analysis by cophylogeny mapping. Syst. Biol. 54: 127--145.

Jenner, R. A. 2004. Accepting partnership by submission? Morphological phylogenetics in a molecular millennium. Syst. Biol. 53: 333--342.

Johnson, L. A. & D. E. Soltis. 1998. Assessing congruence: empirical examples from molecular data. Pp. 297--348 in D. E. Soltis, P. S. Soltis & J. J. Doyle. (editors), Molecular Systematics of Plants II. DNA Sequencing. Kluwer Academic Publishers, Boston.

Kirchoff, B. K., S. J. Richter, D. L. Remington & E. Wisniewski. 2004. Complex data produce better characters. Syst. Biol. 1--17.

Kolaczkowski, B. & J. W. Thornton. 2004. Performance of maximum parsimony and likelihood phylogenetics when evolution is heterogeneous. Nature 431: 980--984.

Koopman, W. J. M. 2005. Phylogenetic signal in AFLP data sets. Syst. Biol. 54: 197--217.

Lewis, P. O., M. T. Holder & K. E. Holsinger. 2005. Polytomies and Bayesian phylogenetic inference. Syst. Biol. 54: 241--253.

Lipscomb, D., N. Platnick & Q. Wheeler. 2003. The intellectual content of taxonomy. Trends in Ecology and Evolution 18: 65--66.

Lockwood, C. A. & J. G. Fleagle. 1999. The recognition and evaluation of homoplasy in primate and human evolution. Amer. J. Physical Anthrop. 110: 189--232.

Lyons-Weiler, J. & M. C. Milinkovitch. 1997. A phylogenetic approach to the problem of differential lineage sorting. Mol. Biol. Evol. 14: 968--975.

Maddison, W. P. 1996. Molecular approaches and the growth of phylogenetic theory. Pp. 47--63 in J. D. Ferraris & S. R. Palumbi (editors), Molecular zoology: advances, strategies, and protocols. Wiley-Liss, New York.

Maddison, W. P. & D. R. Maddison. 1992. MacClade 3: Analysis of phylogeny and character evolution. Sinauer Associates, Sunderland, Massachusetts.

Mason-Gamer, R. J. 2004. Reticulate evolution, introgression, and intertribal gene capture in an allohexaploid grass. Syst. Biol. 53: 25--37.

Milinkovitch, M. C., R. G. LeDuc, J. Adachi, F. Farnir, M. Georges & M. Hasegawa. 1996. Effects of character weighting and species sampling on phylogeny reconstruction: a case study based on DNA sequence data in cetaceans. Genetics 144: 1817--1833.

Mindell, D. P. & C. E. Thacker. 1996. Rates of molecular evolution: phylogenetic issues and applications. Annu. Rev. Ecol. Syst. 27: 279--303.

Naylor, G. J. P. & D. C. Adams. 2003 [2004]. Total evidence versus relevant evidence: A response to O'Leary et al. (2003). Syst. Biol. 52: 864--865.

Nei, M., I. B. Rogozin & H. Piontkivska. 2000. Purifying selection and birth-and-death evolution in the ubiquitin gene family. Proc. Nat. Acad. Sci. USA 97: 10866--10871.

Nei, M., S. Kumar & K. Takahashi. 1998. The optimization principle in phylogenetic analysis tends to give incorrect topologies when the number of nucleotides or amino acids is small. Proc. Nat. Acad. Sci. USA 95: 12390--12397.

Nickrent, D. L., A. Blarer, Yin-Long Qiu, R. Vidal-Russell & F. E. Anderson. 2004. Phylogenetic inference in Rafflesiales: the influence of rate heterogeneity and horizontal gene transfer. BMC Evol. Biol. 4: 40.

Nixon, K. C. & J. M. Carpenter. 1996. On simultaneous analysis. Cladistics 12: 221--241.

Nylander, J. A., F. Ronquist, J. P. Huelsenbeck & J. L. Nieves-Aldrey. 2004. Bayesian phylogenetic analysis of combined data. Syst. Biol. 53: 47--67.

Olmstead, R. G. & R. W. Scotland. 2005. Molecular and morphological datasets. Taxon 54: 7--8.

Page, R. D. 2004. On the dangers of aligning RNA sequences using “Conserved” motifs. Technical Reports in Taxonomy 00-01. Div. Environ. Evol. Biol., Inst. Biomed. Life Sci., Univ. Glasgow. http://taxonomy.zoology.gla.ac.uk/publications/tech-reports/00-01.pdf.

Pagel, M. & A. Meade. 2004. A phylogenetic mixture model for detecting pattern-heterogeneity in gene sequence or character-state data. Syst. Biol. 53: 571--581.

Pagel, M., A. Meade & D. Barker. 2004. Bayesian estimation of ancestral character states on phylogenies. Syst. Biol. 53: 673--684.

Philippe, H., G. Lecointre, L. Hoc Lanh Vân Lê & H. Le Guyader. 1996. A critical study of homoplasy in molecular data with the use of a morphologically based cladogram, and its consequences for character weighting. Mol. Biol. Evol. 13: 1174--1186.

Pickett, K. M. & C. P. Randle. 2005. Strange Bayes indeed: uniform topological priors imply non-uniform clade priors. Mol. Phylog. Evol. 34: 203--211.

Pol, D. 2004. Empirical problems of the hierarchical likelihood ratio test for model selection. Syst. Biol. 53: 949--962.

Pons, J., T. G. Barraclough, K. Theodorides, A. Cardoso & A. P. Vogler. 2004. Using exon and intron sequences of the gene Mp20 to resolve basal relationships in Cicindela (Coleoptera: Cicindelidae). Syst. Biol. 53: 554--570.

Popp, M. & B. Oxelman. 2004. Evolution of a RNA polymerase gene family in Silene (Caryophyllaceae)---incomplete concerted evolution and topological congruence among paralogues. Syst. Biol. 53: 914--932.

Posada, D. & T. R. Buckley. 2004. Model selection and model averaging in phylogenetics: advantages of Akaike Information Criterion and Bayesian approaches over likelihood ratio tests. Syst. Biol. 53: 793--808.

Randle, C. P., M. E. Mort & D. J. Crawford. 2005. Bayesian inference of phylogenetics revisited: developments and concerns. Taxon 54: 9--15.

Rodríguez-Trelles, F., R. Tarrio & F. J. Ayala. 2004. Molecular clocks: whence and whither? Pp. 5--26 in P. C. J. Donoghue & M. P. Smith (editors), Telling the Evolutionary Time: Molecular Clocks and the Fossil Record. CRC Press, Boca Raton, Florida.

Rohwer, J. G. & B. Rudolph. 2005. Jumping genera: the phylogenetic positions of Cassytha, Hypodaphnis, and Neocinnamomum (Lauraceae) based on different analyses of trnK intron sequences. Ann. Missouri Bot. Gard. 93: 153--178.

Rokas, A., N. King, J. Finnerty & S. B. Carroll. 2003. Conflicting phylogenetic signals at the base of the metazoan tree. Evol. Devel. 5: 346--359.

Ronquist, F. 2004. Bayesian inference of character evolution. Trends Ecol. Evol. 19: 475--481.

Ruedas, L. A., J. Salazar-Bravo, J. W. Dragoo & T. L. Yates. 2000. The importance of being earnest: What, if anything, constitutes a “specimen examined?” Mol. Phylog. Evol. 17: 129--132.

Sanderson, M. J. 1997. A nonparametric approach to estimating divergence times in the absence of rate constancy. Mol. Biol. Evol. 14: 1218--1231.

Sang, T. & Y. Zhong. 2000. Testing hybridization hypotheses based on incongruent gene trees. Syst. Biol. 49: 422--434.

Sanjuán, R. & B. Wróbel. 2005. Weighted least-squares likelihood ratio test for branch testing in phylogenies reconstructed from distance measures. Syst. Biol. 54: 218--229.

Scotland, R. W., R. G. Olmstead & J. R. Bennett. 2003. Phylogeny reconstruction: The role of morphology. Syst. Biol. 52: 539--548.

Shaw, K. L. 2002. Conflict between nuclear and mitochondrial DNA phylogenies of a recent species radiation: What mtDNA reveals and conceals about modes of speciation in Hawaiian crickets. Proc. Nat. Acad. Sci. USA 99: 16122--16127.

Simmons, M. P. & H. Ochoterena. 2000. Gaps as characters in sequence-based phylogenetic analyses. Syst. Biol. 49: 369--381.

Sites, J. W., Jr., S. K. Davis, T. Guerra, J. B. Iverson & H. L. Snell. 1996. Character congruence and phylogenetic signal in molecular and morphological data sets: a case study in the living iguanas (Squamata, Iguanidae). Mol. Biol. Evol. 13: 1087--1105.

Smith, N. D. & A. H. Turner. 2005. Morphology's role in phylogeny reconstruction: perspectives from paleontology. Syst. Biol. 54: 166--173.

Sober, E. 2004. The contest between parsimony and likelihood. Syst. Biol. 53: 644--653.

Steppan, S. J., R. M. Adkins & J. Anderson. 2004. Phylogeny and divergence-date estimates of rapid radiations in muroid rodents based on multiple nuclear genes. Syst. Biol. 53: 533--553.

Sullivan, J. & D. L. Swofford. 1997. Are guinea pigs rodents? The importance of adequate models in molecular phylogenetics. J. Mammal. Evol. 4: 77--86.

Swofford, D. L. 1998. PAUP*. Phylogenetic Analysis Using Parsimony (and Other Methods). Ver. 4. Sinauer Associates, Sunderland, Massachusetts.

Templeton, A. 1986. Relation of humans to African apes: A statistical appraisal of diverse types of data. Pp. 365--388 in S. Karlin & E. Nevo (editors), Evolutionary Processes and Theory. Academic Press, New York.

Tuffley, C. & M. Steel. 1997. Modeling the covarion hypothesis of nucleotide substitution. Math. Biosci. 147: 63--91.

van Oppen, M. J. H., B. J. McDonald, B. Willis & D. J. Miller. 2001. The evolutionary history of the coral genus Acropora (Scleractinia, Cnidaria) based on a mitochondrial and a nuclear marker: reticulation, incomplete lineage sorting, or morphological convergence? Mol. Biol. Evol. 18: 1315--1329.

Vilgalys, R. 2003. Taxonomic misidentification in public DNA databases. New Phytol. 160: 4--5.

Wendel, J. F. & J. J. Doyle. 1998. Phylogenetic incongruence: window into genome history and molecular evolution. Pp. 265--296 in D. E. Soltis, P. S. Soltis & J. J. Doyle (editors), Molecular Systematics of Plants II: DNA sequencing. Kluwer Academic Publishers, Boston.

Wheeler, W. C. 1994. Sources of ambiguity in nucleic acid sequence alignment. Pp. 323--352 in B. Schierwater, B. Streit, G. P. Wagner & R. DeSalle (editors), Molecular Ecology and Evolution. Birkhäuser Verlag, Basel.

Wheeler, W. C. 1999. Fixed character states and the optimization of molecular sequence data. Cladistics 15: 379--385.

Wilcox, T. P., D. J. Zwickl, T. Heath & D. M. Hillis. 2002. Phylogenetic relationship of the dwarf boas and a comparison of Bayesian and bootstrap measures of phylogenetic support. Mol. Phylog. Evol. 25: 361--371.

Winkler, R. L. 1972. An Introduction to Bayesian Inference and Decision. Holt, Rinehart and Winston, Inc., New York.

Wolfe, A. D. & C. P. Randle. 2004. Recombination, heteroplasmy, haplotype polymorphism, and paralogy in plastid genes: implications for plant molecular systematics. Syst. Bot. 29: 1011--1020.

Zhang, Jianzhi & S. Kumar. 1997. Detection of convergent and parallel evolution at the amino acid sequence level. Mol. Biol. Evol. 14: 527--536.

 

[This is an extract with some modification of a larger paper just submitted.]