DECONSTRUCTING RECONSTRUCTION:
MAIN POINTS AND ADDITIONAL COMMENTS
Richard H. Zander
Buffalo Museum of Science
1020 Humboldt Pkwy
Buffalo, NY 14211

February 12, 2001

Bryology Seminar, Missouri Botanical Garden
Richard H. Zander, 19 Feb. 1999

SUMMARY: Selecting one phylogenetic hypothesis from several or many reasonable alternatives as "best" and presenting it as a reconstruction cannot provide a probabilistic or dependable basis for action. This is because there may be one or more reasonable alternative hypotheses, or because the sum of the probabilities of all reasonable alternative hypotheses may be large (Zander 1998).
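
As a hedged illustration of the second point, the sketch below uses made-up posterior probabilities for five candidate topologies. The numbers and the helper name summarize_best are assumptions chosen for illustration, not results from any analysis. It shows how an optimal tree can still be less probable than its alternatives taken together.

```python
# Hypothetical posterior probabilities for five candidate topologies.
# Illustrative numbers only; they are not taken from any published study.
posterior = {
    "((A,B),(C,D))": 0.40,
    "(((A,B),C),D)": 0.25,
    "(((A,C),B),D)": 0.20,
    "(((A,B),D),C)": 0.10,
    "((A,D),(B,C))": 0.05,
}


def summarize_best(posterior):
    """Return the optimal tree, its probability, and the summed
    probability of every other tree in the pool."""
    best_tree = max(posterior, key=posterior.get)
    p_best = posterior[best_tree]
    p_alternatives = sum(p for tree, p in posterior.items() if tree != best_tree)
    return best_tree, p_best, p_alternatives


best_tree, p_best, p_alt = summarize_best(posterior)
print(f"optimal tree: {best_tree}  p = {p_best:.2f}")
print(f"all alternatives combined: p = {p_alt:.2f}")
```

With these assumed values the "best" tree is more probable than any single rival, yet acting on it means acting against the larger combined probability of the alternatives.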

  1. PHILOSOPHY:
    1. According to postmodern philosophy, reality is granulated. This is like having multiple Kuhnian paradigms scattered through the world where all thought and action are justified by unique arguments that are essentially cultural. When society is asked for financial support, however, societal common sense and practicality are required.
    2. Reconstruction of a historical event means having no reasonable alternative hypothesis also supported by the data. This common sense guide for action is ingrained in our judicial system: act only when there is no reasonable doubt. In phylogenetics this has been provided by high Bremer support and high posterior probability. One must check, however, for differential gene lineage sorting.
    3. A phylogenetic analysis is valuable only to the extent it is better than previous analyses. Very well supported phylogenetic reconstructions (given the assumptions) further predictive classification and guide biogeography, but reconstructions that have reasonable alternative hypotheses cannot (to the extent that the alternatives are contradictory).
    4. With the mathematization of systematics, the quasi-religious wars in philosophy and statistics must claim one’s attention (Gigerenzer et al. 1989). Phylogeny, because it cannot be directly tested, is attractive to devotees of dubious methodologies.
    5. Justification for optimality criteria in retrodiction involves adoption of essentialism (= realism or idealism) over nominalism (= antirealism or verificationism) (Hendry 1998). Essentialism (Popper 1957) encourages reliance on optimality alone to determine the tree (or on support based on optimality), the one best explanation being considered sufficient. But "approximating" or "converging on" the true tree logically includes all reasonable trees supported by the same data set (i.e., as a confidence or credible interval).
    6. Philosophers often give criteria for "selecting," "choosing" or "accepting" particular hypotheses, usually based on "maximum explanatory power" (e.g. Salmon 1971), but they seldom discuss possible losses upon action. [Note added April 20, 2009: see Robbins 1968 for a nice refutation of Salmon’s 1967 Foundations of Scientific Inference, Univ. Pittsburgh Press.]
    7. One requires data that is both sufficient and unambiguous. Sufficient means achieving resolution of a tree in an analysis. Unambiguous means that the resolution is unique, not contradicted by a reasonable alternative.
  2. METHODS:
    1. Optimality analysis involves minimum falsifiability, maximum likelihood and maximum posterior probability. It should stop with a pool of trees that are all reasonable in the light of evolution, but usually only one optimal or a set of equally optimal trees is offered. Although the homogeneous reference class is certainly made smaller by using optimality criteria, a second criterion or support measure is needed to identify one hypothesis sufficiently dependable to act upon.
    2. Parsimony and statistical approaches are more impressive than clustering techniques in that they model (albeit simplistically) evolution.
    3. The regularity assumptions made for parsimony and statistical studies (e.g., homology, monothetic taxonomy, correct alignment, uninformative prior weighting, independent and uniform distributions, stationarity of the process, correct evolutionary models, absence of lineage sorting) are problematic (Avise 1994) but are arguable among scientists, and progress is possible with further study.
    4. Evaluation of results requires some measure of degree of support. It may be that we have insufficient empirical knowledge of evolution (e.g. prevalence of character convergence) to independently gauge adequate support for present evaluations of the phylogeny of problematic groups.
    5. There are two sources of data: morphological and molecular; three major methods of analysis: parsimony, likelihood and Bayesian; and two much-used but problematic measures of support: Bremer support (= decay index) and posterior probability.
    6. Published reconstructions with good support have been rare or unilluminating because good support is usually restricted to small subclades, and these often match "uncontested groups."
    7. For new progress in phylogeny and classification, one might ask for subclades to be as well supported as any "uncontested groups" (sensu Milinkovitch et al. 1996) also appearing in a cladogram (assuming such groups have good support).
  3. PARSIMONY:
    1. Nature is parsimonious, but not optimally so. A corollary to Occam's Razor pertaining especially to historical reconstruction is that explanations must remain multiple when no one of them is probabilistically adequate.
    2. Parsimony algorithms used with morphological and molecular data sets eliminate unreasonable trees as being those with too much character convergence in light of evolution, while Bayesian calculations (and perhaps maximum likelihood) with molecular data sets do the same with unreasonably improbable trees.
    3. Parsimony is used for both morphological and molecular data sets, whereas Bayesian analysis and maximum likelihood are used only for molecular data sets.
    4. CORROBORATION is increasing support for a particular tree, or at least maintaining a very high probability of a tree with additional data. CONGRUENCE is two or more data sets showing the same level of support both for and against a hypothesis. CONSILIENCE is congruence of data sets produced by somewhat different natural processes, such as morphological and molecular data. Q: If two consilient data sets produce the same shortest tree, even though that one tree is poorly supported, surely that one tree cannot be rejected as random variation or sampling error? A: Congruence supports all reasonable trees. The same two data sets can together support two or more different hypotheses, and these can be totally contradictory. Bootstrapping (and other subsampling) ducks the same problem: only the shortest trees are involved in bootstrap analysis.
  4. STATISTICAL PHYLOGENETICS:
    1. Maximum likelihood gives the correct solution with plentiful data as long as the patterns of nucleotide substitution are the same in the data as in the model used. The actual pattern is complex and apparently changes with time, because different species have different nucleotide frequencies and codon usages (Nei et al. 1998). Likelihood ratio tests cannot be used in maximum likelihood analysis to compare different trees, because the trees are already optimized (Nei 1987; Yang 1996).
    2. Markov chain Monte Carlo analysis simulates a chain of trees where the long run relative frequency of hitting any particular topology is proportional to its marginal posterior probability (Mau et al. 1997). Various models of molecular evolution are used. Although the results are presented as summing to "probability 1," these are relative probabilities: all probabilities too small to be calculated with modern computers are ignored, yet together they may add up to a large fraction of the total probability (Yang & Rannala 1997).
    3. The probability of Type I errors (accepting a false phylogenetic hypothesis as true) can be lessened either by increasing the data or by making the credible region larger (say, decay of 3 instead of 2 in morphological studies or 99% instead of 95% in molecular analysis); enlarging the region, however, increases Type II errors. There is, moreover, twice the chance of a Type I error as of a Type II error for any internal branch, given that only one of the results of nearest neighbor interchange is correct.
  5. CONCLUSION: A Sustainable Society requires an extensive living library of genetic variants for biotechnological development of future renewable resources in energy, food, medicine and materials. We ignore fundamental systematic study of biological diversity at our peril. Phylogenetic analysis, either by parsimony or statistics, appears to have greater potential than previous methods in aiding predictive classification, biogeographic study and similar analyses, but only if a second criterion of "no reasonable alternative hypothesis" (given the assumptions) is added to that of optimality.
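
The second criterion of "no reasonable alternative hypothesis" can be sketched against a sample of topologies such as a Markov chain Monte Carlo run would yield. This is a minimal sketch under stated assumptions, not the procedure of any cited program. The sample counts are invented, each tree is represented simply as the set of clades it contains, and the function names credible_set, clade_frequency and in_every_tree are hypothetical.

```python
from collections import Counter


def clade(*taxa):
    """A clade is represented as the set of taxa it contains."""
    return frozenset(taxa)


def tree(*clades):
    """A topology is represented as the set of its non-trivial clades."""
    return frozenset(clades)


def credible_set(samples, level=0.95):
    """Most frequent topologies, taken in descending order of frequency,
    until their combined share of the sample reaches `level`."""
    counts = Counter(samples)
    total = sum(counts.values())
    chosen, covered = [], 0
    for topology, n in counts.most_common():
        chosen.append(topology)
        covered += n
        if covered >= level * total:
            break
    return chosen, covered / total


def clade_frequency(samples, target):
    """Share of sampled topologies that contain the target clade."""
    return sum(1 for t in samples if target in t) / len(samples)


def in_every_tree(trees, target):
    """True only if no tree in the set contradicts the target clade."""
    return all(target in t for t in trees)


# Invented sample of 100 topologies for taxa A-D (assumption, for illustration).
t1 = tree(clade("A", "B"), clade("C", "D"))
t2 = tree(clade("A", "B"), clade("A", "B", "C"))
t3 = tree(clade("A", "C"), clade("A", "B", "C"))
t4 = tree(clade("A", "B"), clade("A", "B", "D"))
t5 = tree(clade("A", "D"), clade("B", "C"))
samples = [t1] * 55 + [t2] * 25 + [t3] * 12 + [t4] * 5 + [t5] * 3

chosen, coverage = credible_set(samples, level=0.95)
ab = clade("A", "B")
print("trees in credible set:", len(chosen), "coverage:", coverage)
print("frequency of clade AB:", clade_frequency(samples, ab))
print("AB uncontradicted in credible set:", in_every_tree(chosen, ab))
```

In this invented sample the clade AB appears in the optimal tree and in most sampled trees, but one tree inside the 95% set lacks it, so under the second criterion it would not yet count as reconstructed.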

SOME PERTINENT LITERATURE

Avise, J. C. 1994. Molecular Markers, Natural History and Evolution. Chapman & Hall, N.Y.

Baum, D. A., R. L. Small & J. F. Wendel. 1998. Biogeography and floral evolution of Baobabs (Adansonia, Bombacaceae) as inferred from multiple data sets. Syst. Biol. 47: 181-207.

Bremer, K. 1988. The limits of amino acid sequence data in angiosperm phylogenetic reconstruction. Evolution 42: 796-803.

Bernardo, J. M. & A. F. M. Smith. 1994. Bayesian Theory. John Wiley & Sons, New York.

Doyle, J. J. 1992. Gene trees and species trees: molecular systematics as one-character taxonomy. Syst. Bot. 17: 144-163.

Edwards, A.W.F. 1972. Likelihood. Cambridge Univ. Press, Cambridge.

Felsenstein, J. & E. Sober. 1986. Parsimony and likelihood: an exchange. Syst. Zool. 35: 617-626.

Games, P. A. & G. R. Klare. 1967. Elementary Statistics: Data Analysis for the Behavioral Sciences. McGraw-Hill, New York.

Hempel, C. G. 1965. Aspects of Scientific Explanation. Free Press, New York.

Hendry, R. 1998. Scientific Realism and Scientific Antirealism. Dec. 18, 1998. http://www.dur.ac.uk/~dfl0www/modules/philsci/H-OUT12.HTM

Gigerenzer, G., Z. Swijtink, T. Porter, L. Daston, J. Beatty & L. Krüger. 1989. The Empire of Chance. Cambridge Univ. Press, Cambridge.

Goldman, N. 1990. Maximum likelihood inference of phylogenetic trees, with special reference to a Poisson Process Model of DNA substitution and to parsimony analyses. Syst. Zool. 39: 345-361.

Kluge, A. G. 1997. Testability and the refutation and corroboration of cladistic hypotheses. Cladistics 13: 81-96.

Knox, E. B. & J. D. Palmer. 1998. Chloroplast DNA evidence on the origin and radiation of the giant lobelias in eastern Africa. Syst. Bot. 23: 109-149.

Kuhn, T. S. 1970. The Structure of Scientific Revolutions. 2nd Ed. Univ. Chicago Press, Chicago.

Lapointe, F.-J. & P. Legendre. 1991. The generation of random ultrametric matrices representing dendrograms. J. Classification 8: 177-200.

Lyons-Weiler, J., G. A. Hoelzer & R. J. Tausch. 1996. Relative apparent synapomorphy analysis (RASA) I: The statistical measurement of phylogenetic signal. Mol. Biol. Evol. 13: 749-757.

Lyons-Weiler, J. & M. C. Milinkovitch. 1997. A phylogenetic approach to the problem of differential lineage sorting. Mol. Biol. Evol. 14: 968-975.

Mau, B., M. A. Newton & B. Larget. 1997. Bayesian phylogenetic inference via Markov chain Monte Carlo methods. Mol. Biol. Evol. 14: 717-724.

Milinkovitch, M. C., R. G. LeDuc, J. Adachi, F. Farnir, M. Georges & M. Hasegawa. 1996. Effects of character weighting and species sampling on phylogeny reconstruction: a case study based on DNA sequence data in cetaceans. Genetics 144: 1817–1833.

Nei, M. 1987. Molecular Evolutionary Genetics. Columbia Univ. Press, New York.

Nei, M., S. Kumar & K. Takahashi. 1998. The optimization principle in phylogenetic analysis tends to give incorrect topologies when the number of nucleotides or amino acids used is small. Proc. Nat. Acad. Sci. 95: 12390-12397.

Oxelman, B., M. Backlund & B. Bremer. 1999. Relationships of the Buddlejaceae s. l. investigated using parsimony, jackknife and branch support analysis of chloroplast ndhF and rbcL sequence data. Syst. Bot. 24: 164–182.

Pamilo, P. & M. Nei. 1988. Relationships between gene trees and species trees. Mol. Biol. Evol. 5: 568-583.

Pap, A. 1962. An Introduction to the Philosophy of Science. Macmillan Co., N.Y.

Popper, K. R. 1957. The Poverty of Historicism. Harper Torchbooks, Harper & Row, N.Y.

Popper, K. R. 1962. Conjectures and Refutations: The Growth of Scientific Knowledge. Harper Torchbooks, Harper & Row, New York. 1965 Edition.

Rannala, B. & Z. Yang. 1996. Probability distribution of molecular evolutionary trees: a new method of phylogenetic inference. J. Mol. Evol. 43: 304-311.

Rice, K. A., M. J. Donoghue & R. G. Olmstead. 1997. Analyzing large data sets: rbcL 500 revisited. Syst. Biol. 46: 554–563.

Salmon, W. C. 1971. Statistical Explanation and Statistical Relevance. Univ. Pittsburgh Press, Pittsburgh.

Sanderson, M. J. 1995. Objections to bootstrapping phylogenies: a critique. Syst. Biol. 44: 299-320.

Sneath, P. H. A. & R. R. Sokal. 1973. Numerical Taxonomy: the Principles and Practice of Numerical Classification. W. H. Freeman & Co., San Francisco.

Sober, E. 1975. Simplicity. Clarendon Press, Oxford.

Swensen, S. M., J. N. Luthi & L. H. Rieseberg. 1998. Datiscaceae revisited: monophyly and the sequence of breeding system evolution. Syst. Bot. 23: 157-169.

Swofford, D. L. & J. Olsen. 1990. Phylogenetic reconstruction. Pp. 411-501, In: D. M. Hillis and C. Moritz, eds., Molecular Systematics. Sinauer Associates, Sunderland, Massachusetts.

Wiley, E. O., D. Siegel-Causey, D. R. Brooks & V. A. Funk. 1991. The compleat cladist: a primer of phylogenetic procedures. University of Kansas Museum of Natural History, Special Publication 19.

Wiley, E. O. 1981. Phylogenetics: The Theory and Practice of Phylogenetic Systematics. John Wiley and Sons, New York.

Wittgenstein, L. 1961. Tractatus Logico-Philosophicus. Transl. by D. F. Pears & B. F. McGuinness. Routledge & Kegan Paul, Ltd., London. (Paperback ed. 1974. Humanities Press, Atlantic Highlands, N.J.)

Yang, Z. 1996. Maximum-likelihood models for combined analyses of multiple sequence data. J. Mol. Evol. 42: 587-596.

Yang, Z. 1997. Phylogenetic Analysis by Maximum Likelihood (PAML). Ver. 1.3. Dept. of Integrative Biology, University of California at Berkeley.

Yang, Z. & B. Rannala. 1997. Bayesian phylogenetic inference using DNA sequences: a Markov chain Monte Carlo method. Mol. Biol. Evol. 14: 717-724.

Yee, M. S. Y. 2000. Tree robustness and clade significance. Syst. Biol. 49: 829–836.

Zander, R. H. 1998. Phylogenetic reconstruction, a critique. Taxon 47: 681-693.

Zander, R. H. 2001. A conditional probability of reconstruction measure for internal cladogram branches. Syst. Biol. 50(3): in press. [Note: reprint now available.]

 

 

 
