DECONSTRUCTING RECONSTRUCTION: MAIN POINTS AND ADDITIONAL COMMENTS
Bryology Seminar, Missouri Botanical Garden
Richard H. Zander, 19 Feb. 1999
SUMMARY: Selecting one phylogenetic hypothesis from several or many
reasonable alternatives as "best" and presenting it as a
reconstruction cannot provide a probabilistic or dependable basis for action.
This is because one or more reasonable alternative hypotheses may remain,
or because the summed probability of all reasonable alternative hypotheses
may be large (Zander 1998).
- PHILOSOPHY:
- According to
postmodern philosophy, reality is granulated. This is like having
multiple Kuhnian paradigms scattered through the world where all
thought and action is justified by unique arguments that are
essentially cultural. When society is asked for financial support,
however, societal common sense and practicality are required.
- Reconstruction of a
historical event means having no reasonable alternative hypothesis also
supported by the data. This common sense guide for action is ingrained
in our judicial system: act only when there is no reasonable doubt. In
phylogenetics this has been provided by high Bremer support and high
posterior probability. One must check, however, for differential gene
lineage sorting.
- A phylogenetic
analysis is valuable only to the extent it is better than previous
analyses. Very well supported phylogenetic reconstructions (given the
assumptions) further predictive classification and guide biogeography,
but reconstructions that have reasonable alternative hypotheses cannot
(to the extent that the alternatives are contradictory).
- With the
mathematization of systematics, the quasi-religious wars in philosophy
and statistics must claim one’s attention (Gigerenzer et al. 1989).
Phylogeny, because it cannot be directly tested, is attractive to
devotees of dubious methodologies.
- Justification for
optimality criteria in retrodiction involves adoption of essentialism
(= realism or idealism) over nominalism (= antirealism or
verificationism) (Hendry 1998). Essentialism (Popper 1957) encourages reliance
on optimality alone to determine the tree (or on support based on
optimality), the one best explanation being considered sufficient. But
"approximating" or "converging on" the true tree
logically includes all reasonable trees supported by the same data set
(i.e., as a confidence or credible interval).
- Philosophers often
give criteria for "selecting," "choosing" or
"accepting" particular hypotheses, usually based on
"maximum explanatory power" (e.g. Salmon 1971) but they
seldom discuss possible losses upon action.
- One requires data
that are both sufficient and unambiguous. Sufficient means achieving
resolution of a tree in an analysis. Unambiguous means that the
resolution is unique, not contradicted by a reasonable alternative.
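The idea that "converging on" the true tree includes every reasonable tree in a credible interval can be sketched in a few lines. This is only an illustration: the function names, the 95% level and the posterior probabilities below are assumptions made for the example, not values from any analysis.

```python
def credible_set(posteriors, level=0.95):
    """Return the smallest set of trees whose summed probability reaches `level`.

    posteriors: dict mapping a tree label to its posterior probability.
    """
    ranked = sorted(posteriors.items(), key=lambda kv: kv[1], reverse=True)
    chosen, total = [], 0.0
    for tree, p in ranked:
        chosen.append(tree)
        total += p
        if total >= level:
            break
    return chosen, total


def is_unambiguous(posteriors, level=0.95):
    """True only when a single tree fills the credible set by itself."""
    trees, _ = credible_set(posteriors, level)
    return len(trees) == 1


# Hypothetical posteriors for the three resolutions of one internal branch.
example = {
    "((A,B),(C,D))": 0.60,
    "((A,C),(B,D))": 0.30,
    "((A,D),(B,C))": 0.10,
}

trees, mass = credible_set(example)
print(trees, round(mass, 2))
print(is_unambiguous(example))
```

Here the optimal tree has probability 0.60, yet all three trees are needed to reach the 95% level, so the data are sufficient (a best tree exists) but not unambiguous.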
- METHODS:
- Optimality analysis
involves minimum falsifiability, maximum likelihood and maximum
posterior probability. It should stop with a pool of trees that are all
reasonable in the light of evolution, but usually only one optimal or a
set of equally optimal trees is offered. Although the homogeneous
reference class is certainly made smaller by using optimality criteria,
a second criterion or support measure is needed to identify one
hypothesis sufficiently dependable to act upon.
- Parsimony and
statistical approaches are more impressive than clustering techniques
in that they model (albeit simplistically) evolution.
- The regularity
assumptions made for parsimony and statistical studies (e.g., homology,
monothetic taxonomy, correct alignment, uninformative prior weighting,
independent and uniform distributions, stationarity of the process,
correct evolutionary models, absence of lineage sorting) are
problematic (Avise 1994) but are arguable among scientists, and
progress is possible with further study.
- Evaluation of
results requires some measure of degree of support. It may be that we
have insufficient empirical knowledge of evolution (e.g. prevalence of
character convergence) to independently gauge adequate support for
present evaluations of the phylogeny of problematic groups.
- There are two
sources of data (morphological and molecular), three major methods of
analysis (parsimony, likelihood and Bayesian), and two much-used but
problematic measures of support: Bremer support (= decay index) and
posterior probability.
- Published reconstructions
with good support have been rare or unilluminating because good support
is usually restricted to small subclades, and these often match
"uncontested groups."
- For new progress in
phylogeny and classification, one might ask for subclades to be as well
supported as any "uncontested groups" (sensu Milinkovitch et
al. 1996) also appearing in a cladogram (assuming such groups have good
support).
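Bremer support (the decay index) for a clade is the number of extra steps needed before a tree lacking that clade is found. A minimal sketch follows, assuming a pool of candidate trees has already been scored; the data layout, function name and tree lengths are hypothetical.

```python
def bremer_support(trees, clade):
    """Decay index of `clade` within a pool of scored trees.

    trees: list of (length, clades) pairs, where `clades` is a set of
           frozensets of taxon names found in that tree.
    Returns the length of the shortest tree lacking the clade minus the
    length of the shortest tree overall, or None if every tree in the
    pool contains the clade (the pool is then too small to say).
    """
    clade = frozenset(clade)
    shortest = min(length for length, _ in trees)
    without = [length for length, clades in trees if clade not in clades]
    if not without:
        return None
    return min(without) - shortest


# Hypothetical pool of rooted trees on taxa A-D with made-up lengths.
AB, AC, BC, CD = (frozenset(x) for x in ("AB", "AC", "BC", "CD"))
ABC = frozenset("ABC")
pool = [
    (100, {AB, ABC}),
    (102, {AC, ABC}),
    (103, {BC, ABC}),
    (105, {AB, CD}),
]

print(bremer_support(pool, "AB"))   # 2
print(bremer_support(pool, "ABC"))  # 5
```

In this made-up pool, clade AB decays after two extra steps while clade ABC survives until five, which is the kind of difference between a contested subclade and an "uncontested group" that the points above turn on.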
- PARSIMONY:
- Nature is
parsimonious, but not optimally so. A corollary to Occam's Razor
pertaining especially to historical reconstruction is that explanations
must remain multiple when no one of them is probabilistically adequate.
- Parsimony algorithms
used with morphological and molecular data sets eliminate unreasonable
trees as being those with too much character convergence in light of
evolution, while Bayesian calculations (and perhaps maximum likelihood)
with molecular data sets do the same with unreasonably improbable
trees.
- Parsimony is used
for both morphological and molecular data sets, whereas Bayesian analysis
and maximum likelihood are used only for molecular data sets.
- CORROBORATION is
increasing support for a particular tree, or at least maintaining a
very high probability of a tree with additional data. CONGRUENCE is two
or more data sets with the same level of support both for and against a
hypothesis. CONSILIENCE is congruence of data sets produced with
somewhat different natural processes, such as morphology and molecular
analysis. Q: If two consilient data sets produce the same shortest
tree, even though that one tree is poorly supported, surely that one
tree cannot be rejected as random variation or sampling error? A:
Congruence supports all reasonable trees. The same two different data
sets can together support two or more different hypotheses, and these
can be totally contradictory. Bootstrapping (and other subsampling)
ducks the same problem: only the shortest trees are involved in
bootstrap analysis.
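How a parsimony count ranks trees, and how close the runner-up can be, may be illustrated with Fitch's counting procedure for unordered characters. The handout does not name a particular algorithm, so the choice of Fitch counting is an assumption made here, and the four-taxon matrix is invented.

```python
def fitch_steps(tree, states):
    """Minimum number of state changes for one character on a rooted
    binary tree. `tree` is nested 2-tuples with taxon names at the tips;
    `states` maps each taxon name to its character state."""
    steps = 0

    def visit(node):
        nonlocal steps
        if isinstance(node, str):
            return {states[node]}
        left, right = (visit(child) for child in node)
        common = left & right
        if common:
            return common
        steps += 1  # no shared state: one change is required here
        return left | right

    visit(tree)
    return steps


def tree_length(tree, matrix):
    """Sum of Fitch steps over all characters. `matrix` maps each taxon
    name to a string with one state symbol per character."""
    n_chars = len(next(iter(matrix.values())))
    return sum(
        fitch_steps(tree, {taxon: row[i] for taxon, row in matrix.items()})
        for i in range(n_chars)
    )


# Invented matrix: two characters group A with B, one groups A with C.
matrix = {"A": "111", "B": "110", "C": "001", "D": "000"}
candidates = [
    (("A", "B"), ("C", "D")),
    (("A", "C"), ("B", "D")),
    (("A", "D"), ("B", "C")),
]
for tree in candidates:
    print(tree, tree_length(tree, matrix))
```

The three trees come out at 4, 5 and 6 steps. The shortest tree is preferred, but the second is only one step longer and contradicts it, so on these invented data it remains a reasonable alternative.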
- STATISTICAL PHYLOGENETICS:
- Maximum likelihood
gives the correct solution with plentiful data as long as the patterns
of nucleotide substitution are the same in the data as in the model
used. The actual pattern is complex and apparently changes with time
because different species apparently have different nucleotide
frequencies and codon usages (Nei et al. 1998). Likelihood ratio tests
cannot be used in maximum likelihood analysis to compare different
trees, because the trees are already optimized (Nei, 1987; Yang, 1996).
- Markov chain Monte
Carlo analysis simulates a chain of trees where the long run relative
frequency of hitting any particular topology is proportional to its
marginal posterior probability (Mau et al. 1997). Various models of
molecular evolution are used. Although the results are presented as
summing to "probability 1," these are relative probabilities
in that all probabilities too small to be calculated with modern
computers are ignored, but may add to a large fraction of the total
probability (Yang & Rannala 1997).
- The probability of
Type I errors (accepting a false phylogenetic hypothesis as true) can
be lessened by making the credible region larger (say, decay of 3
instead of 2 in morphological studies or 99% instead of 95% in
molecular analysis) which, however, increases Type II errors, or by
increasing data. For any internal branch, however, a Type I error is
twice as likely as a Type II error, given that only one of the three
possible arrangements around the branch (the original and its two
nearest neighbor interchanges) can be correct.
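The Markov chain Monte Carlo point above, that the long-run frequency of visiting a topology is proportional to its posterior probability, can be sketched with a toy Metropolis sampler. This is not the method of Mau et al. or of Yang & Rannala: the three topologies, their unnormalized weights and the function name are assumptions for illustration, and real analyses also sample branch lengths and model parameters.

```python
import random
from collections import Counter


def sample_topologies(weights, n_steps, seed=0):
    """Metropolis sampling over a small fixed set of topologies.

    weights: dict mapping a topology label to an unnormalized posterior
             weight (likelihood times prior), assumed positive.
    Returns the fraction of steps spent on each topology.
    """
    rng = random.Random(seed)
    names = list(weights)
    current = names[0]
    visits = Counter()
    for _ in range(n_steps):
        # Symmetric proposal: pick one of the other topologies uniformly.
        proposal = rng.choice([n for n in names if n != current])
        accept = min(1.0, weights[proposal] / weights[current])
        if rng.random() < accept:
            current = proposal
        visits[current] += 1
    return {n: visits[n] / n_steps for n in names}


# Hypothetical unnormalized weights for three rooted topologies.
weights = {"((A,B),C)": 6.0, "((A,C),B)": 3.0, "((B,C),A)": 1.0}
freqs = sample_topologies(weights, n_steps=50_000)
for name, freq in freqs.items():
    print(name, round(freq, 3))
```

The visit frequencies approach 0.6, 0.3 and 0.1, the weights rescaled to sum to 1. That rescaling is the sense in which the reported probabilities are relative: any topology left out of the sampled set contributes nothing to the total, however much probability it might hold.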
- CONCLUSION: A
Sustainable Society requires an extensive living library of genetic
variants for biotechnological development of future renewable resources
in energy, food, medicine and materials. We ignore fundamental
systematic study of biological diversity at our peril. Phylogenetic
analysis, either by parsimony or statistics, appears to have greater
potential than previous methods in aiding predictive classification,
biogeographic study and similar analyses, but only if a second criterion
of "no reasonable alternative hypothesis" (given the
assumptions) is added to that of optimality.
SOME PERTINENT LITERATURE
Avise, J. C. 1994. Molecular Markers,
Natural History and Evolution. Chapman & Hall, N.Y.
Baum, D. A., R. L. Small & J. F.
Wendel. 1998. Biogeography and floral evolution of Baobabs (Adansonia,
Bombacaceae) as inferred from multiple data sets. Syst. Biol. 47: 181-207.
Bremer, K. 1988. The limits of amino acid
sequence data in angiosperm phylogenetic reconstruction. Evolution 42:
796-803.
Bernardo, J. M. & A. F. M. Smith. 1994.
Bayesian Theory. John Wiley & Sons, New York.
Doyle, J. J. 1992. Gene trees and species
trees: molecular systematics as one-character taxonomy. Syst. Bot. 17:
144-163.
Edwards, A.W.F. 1972. Likelihood. Cambridge
Univ. Press, Cambridge.
Felsenstein, J. & E. Sober. 1986.
Parsimony and likelihood: an exchange. Syst. Zool. 35: 617-626.
Games, P. A. & G. R. Klare. 1967.
Elementary Statistics: Data Analysis for the Behavioral Sciences.
McGraw-Hill, New York.
Hempel, C. G. 1965. Aspects of Scientific
Explanation. Free Press, New York.
Hendry, R. 1998. Scientific Realism and
Scientific Antirealism. Dec. 18, 1998.
http://www.dur.ac.uk/~dfl0www/modules/philsci/H-OUT12.HTM
Gigerenzer, G., Z. Swijtink, T. Porter, L.
Daston, J. Beatty & L. Krüger. 1989. The Empire of Chance. Cambridge
Univ. Press, Cambridge.
Goldman, N. 1990. Maximum likelihood
inference of phylogenetic trees, with special reference to a Poisson Process
Model of DNA substitution and to parsimony analyses. Syst. Zool. 39: 345-361.
Kluge, A. G. 1997. Testability and the
refutation and corroboration of cladistic hypotheses. Cladistics 13: 81-96.
Knox, E. B. & J. D. Palmer. 1998.
Chloroplast DNA evidence on the origin and radiation of the giant lobelias in
eastern Africa. Syst. Bot. 23: 109-149.
Kuhn, T. S. 1970. The Structure of Scientific
Revolutions. 2nd Ed. Univ. Chicago Press, Chicago.
Lapointe, F.-J. & P. Legendre. 1991.
The generation of random ultrametric matrices representing dendrograms. J.
Classification 8: 177-200.
Lyons-Weiler, J., G. A. Hoelzer & R. J.
Tausch. 1996. Relative apparent synapomorphy analysis (RASA) I: The
statistical measurement of phylogenetic signal. Mol. Biol. Evol. 13: 749-757.
Lyons-Weiler, J. & M. C. Milinkovitch.
1997. A phylogenetic approach to the problem of differential lineage sorting.
Mol. Biol. Evol. 14: 968-975.
Mau, B., M. A. Newton
& B. Larget. 1997. Bayesian phylogenetic inference via Markov chain Monte
Carlo methods. Mol. Biol. Evol. 14: 717-724.
Milinkovitch, M. C., R. G. LeDuc, J.
Adachi, F. Farnir, M. Georges & M. Hasegawa. 1996. Effects of character
weighting and species sampling on phylogeny reconstruction: a case study
based on DNA sequence data in cetaceans. Genetics 144: 1817–1833.
Nei, M. 1987. Molecular evolutionary
genetics. Columbia University Press, New York.
Nei, M., S. Kumar & K. Takahashi. 1998.
The optimization principle in phylogenetic analysis tends to give incorrect
topologies when the number of nucleotides or amino acids used is small. Proc.
Nat. Acad. Sci. 95: 12390-12397.
Oxelman, B, M. Backlund & B. Bremer.
1999. Relationships of the Buddlejaceae s. l. investigated using parsimony,
jackknife and branch support analysis of chloroplast ndhF and rbcL sequence
data. Syst. Bot. 24: 164–182.
Pamilo, P. & M. Nei. 1988.
Relationships between gene trees and species trees. Molecular Biology and
Evolution 5: 568-583.
Pap, A. 1962. An Introduction to the
Philosophy of Science. Macmillan Co., N.Y.
Popper, K. R. 1957. The Poverty of
Historicism. Harper Torchbooks, Harper & Row, N.Y.
Popper, K. R. 1962. Conjectures and
Refutations: The Growth of Scientific Knowledge. Harper Torchbooks, Harper
& Row, New York. 1965 Edition.
Rannala, B. & Z. Yang. 1996.
Probability distribution of molecular evolutionary trees: a new method of
phylogenetic inference. J. Mol. Evol. 43: 304-311.
Rice, K. A., M. J. Donoghue & R. G. Olmstead.
1997. Analyzing large data sets: rbcL 500 revisited. Syst. Biol.
46: 554–563.
Salmon, W. C. 1971. Statistical Explanation
and Statistical Relevance. Univ. Pittsburgh Press, Pittsburgh.
Sanderson, M. J. 1995. Objections to
bootstrapping phylogenies: a critique. Syst. Biol. 44(3):299-320.
Sneath, P. H. A. & R. R. Sokal. 1973.
Numerical Taxonomy: the Principles and Practice of Numerical Classification.
W. H. Freeman & Co., San Francisco.
Sober, E. 1975. Simplicity. Clarendon
Press, Oxford.
Swensen, S. M., J. N. Luthi & L. H.
Rieseberg. 1998. Datiscaceae revisited: monophyly and the sequence of
breeding system evolution. Syst. Bot. 23: 157-169.
Swofford, D. L. & J. Olsen. 1990.
Phylogenetic reconstruction. Pp. 411-501, In: D. M. Hillis and C. Moritz,
eds., Molecular Systematics. Sinauer Associates, Sunderland, Massachusetts.
Wiley, E. O., D. Siegel-Causey, D. R.
Brooks & V. A. Funk. 1991. The compleat cladist: a primer of phylogenetic
procedures. University of Kansas Museum of Natural History, Special
Publication 19.
Wiley, E. O. 1981. Phylogenetics: The
Theory and Practice of Phylogenetic Systematics. John Wiley and Sons, New
York.
Wittgenstein, L. 1961. Tractatus
Logico-Philosophicus. Transl. by D. F. Pears & B. F. McGuinness. Routledge
& Kegan Paul, Ltd., London. (Paperback ed. 1974. Humanities Press,
Atlantic Highlands, N.J.)
Yang, Z. 1996. Maximum-likelihood models for
combined analyses of multiple sequence data. J. Mol. Evol. 42: 587-596.
Yang, Z. 1997. Phylogenetic Analysis by
Maximum Likelihood (PAML). Ver. 1.3. Dept. of Integrative Biology, University
of California at Berkeley.
Yang, Z. & B. Rannala. 1997. Bayesian
phylogenetic inference using DNA sequences: a Markov chain Monte Carlo
method. Mol. Biol. Evol. 14: 717-724.
Lee, M. S. Y. 2000. Tree robustness and
clade significance. Syst. Biol. 49: 829–836.
Zander, R. H. 1998. Phylogenetic
reconstruction, a critique. Taxon 47: 681-693.
Zander, R. H. 2001. A conditional probability
of reconstruction measure for internal cladogram branches. Syst. Biol. 50(3):
in press. [Note: reprint now available.]