Minimal Values for Reliability of Bootstrap and Jackknife Proportions, Decay Index, and Bayesian Posterior Probability: Supplement to Poster Session at Molecular Systematics of Bryophytes: Progress, Problems and Perspectives, MBG, September 6, 2003
Richard H. Zander
Missouri Botanical Garden, PO Box 299, St. Louis, MO 63166-0299 USA; September 6, 2003
Nonparametric bootstrap and jackknife proportions (BP and JK), the Decay Index (DI), and Bayesian posterior probabilities (BPP) were obtained from artificial 4-taxon data sets predetermined, through an exact binomial calculation, to have .95 confidence limits. The binomial confidence interval (CI) is 1 minus the chance of the data occurring randomly. The null hypothesis is a star (unresolved) tree, and the alternative hypothesis is shared ancestry as an explanation of the optimal branch arrangement. AB is the support in steps for the tree ((AB)C,D), which here is always the optimal tree; AC is the support for ((AC)B,D), and BC is the support for ((BC)A,D). Once the optimal tree is known, the steps represented by AC and BC are assumed to have been generated randomly by parallelism, because only one arrangement can be supported by shared ancestry. The common measures of reliability all respond to variation in the AC:BC ratio, with higher ratios yielding lower reliability values, but the binomial CI does not vary with the AC:BC ratio. Therefore, only a nearly equal AC:BC ratio will unambiguously signal the minimum BP, JK, DI, and BPP values that correspond to a .95 binomial CI.
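The binomial CI described above can be sketched in Python. This is one reading consistent with the tables, not necessarily the exact calculation used here: it assumes that under the star-tree null each step is equally likely to fall on any of the three possible resolutions (p = 1/3), and that the CI is 1 minus the exact probability of at least AB of the AB+AC+BC steps falling on AB by chance. Under that assumption the result depends only on AB and the total number of steps, not on how the contrary steps are split between AC and BC, and the sketch reproduces the maximum ratios for the rows checked (lengths 3, 4, 5 and 10 in the .95 table, length 5 in the 90% table, length 10 in the 99% table). All function names are hypothetical.

```python
from fractions import Fraction
from math import comb
from typing import Optional, Tuple

# Assumption: under the star-tree null, a step is equally likely to support
# any of the three possible resolutions of four taxa, so p = 1/3.
P_NULL = Fraction(1, 3)


def upper_tail(k: int, n: int, p: Fraction = P_NULL) -> Fraction:
    """Exact P(X >= k) for X ~ Binomial(n, p), using rational arithmetic."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


def binomial_ci(ab: int, ac: int, bc: int) -> Fraction:
    """1 minus the chance that at least `ab` of all steps fall on AB at random."""
    return 1 - upper_tail(ab, ab + ac + bc)


def max_total_steps(ab: int, level: Fraction) -> Optional[int]:
    """Largest AB+AC+BC whose binomial CI still reaches `level` (None if AB alone fails)."""
    if 1 - upper_tail(ab, ab) < level:
        return None
    n = ab
    # The upper tail grows with n, so stop at the first n that falls short.
    while 1 - upper_tail(ab, n + 1) >= level:
        n += 1
    return n


def max_ratio(ab: int, level: Fraction) -> Optional[Tuple[int, int, int]]:
    """Split the allowed contrary steps as evenly as possible, with AC >= BC."""
    n = max_total_steps(ab, level)
    if n is None:
        return None
    rest = n - ab
    return ab, (rest + 1) // 2, rest // 2


level = Fraction(95, 100)
for ab in (3, 4, 5, 10):
    ratio = max_ratio(ab, level)
    print(ab, ratio, round(float(binomial_ci(*ratio)), 4))
```

Exact fractions are used so that rows sitting right at the .95 boundary are not misclassified by floating-point rounding. The loop should print the ratios 3:0:0, 4:1:0, 5:1:1 and 10:4:4, matching the first rows of the .95 table.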
Table of optimal branch lengths and the minimum values of some common reliability measures needed to unambiguously attain a binomial confidence interval of .95. Interpolation may be needed.
| Length of optimal branch | Max. AB:AC:BC needed for .95 CI | Min. BP | Min. JK | Min. DI | Min. BPP |
|---|---|---|---|---|---|
| 3 | 3:0:0 | 1.00 | 1.00 | 3 | .99 |
| 4 | 4:1:0 | .95 | .79 | 3 | .99 |
| 5 | 5:1:1 | .95 | .90 | 4 | 1.00 |
| 10 | 10:4:4 | .91 | .92 | 6 | 1.00 |
| 15 | 15:8:7 | .91 | .91 | 7 | .99 |
| 20 | 20:12:11 | .89 | .89 | 8 | .98 |
| 25 | 25:16:15 | .89 | .89 | 9 | .98 |
| 30 | 30:20:19 | .89 | .89 | 10 | .97 |
| 35 | 35:24:23 | .88 | .89 | 11 | .95 |
| 40 | 40:28:27 | .89 | .89 | 12 | .96 |
| 45 | 45:32:32 | .87 | .88 | 13 | .95 |
| 50 | 50:37:36 | .87 | .87 | 13 | .93 |
| 55 | 55:41:40 | .87 | .87 | 14 | .92 |
| 60 | 60:45:45 | .88 | .88 | 15 | .91 |
Thus, if you know both the branch length and the reliability value for an internode, you can gauge the binomial CI. If any weighting of steps genuinely reflects the expected likelihood of individual evolutionary events, then this approach should also work for molecularly based cladograms.
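As a rough aid to reading the .95 table, the sketch below looks up the tabulated minimum BP for a branch length, interpolating between rows, and checks whether an observed BP reaches it. Linear interpolation is an assumption (the text says only that interpolation may be needed), the tabulated values are rounded and not strictly monotone, and the names `min_bp_at` and `reaches_95` are hypothetical.

```python
from bisect import bisect_left

# Branch lengths and minimum BP values transcribed from the .95 table.
LENGTHS = [3, 4, 5, 10, 15, 20, 25, 30, 35, 40, 45, 50, 55, 60]
MIN_BP_95 = [1.00, .95, .95, .91, .91, .89, .89, .89, .88, .89, .87, .87, .87, .88]


def min_bp_at(length: float) -> float:
    """Tabulated minimum BP for a branch length, linearly interpolated between rows."""
    if not LENGTHS[0] <= length <= LENGTHS[-1]:
        raise ValueError("branch length outside the tabulated range (3 to 60)")
    i = bisect_left(LENGTHS, length)
    if LENGTHS[i] == length:
        return MIN_BP_95[i]
    # Here LENGTHS[i - 1] < length < LENGTHS[i]; interpolate along that segment.
    x0, x1 = LENGTHS[i - 1], LENGTHS[i]
    y0, y1 = MIN_BP_95[i - 1], MIN_BP_95[i]
    return y0 + (y1 - y0) * (length - x0) / (x1 - x0)


def reaches_95(length: float, bp: float) -> bool:
    """True if an observed BP is at or above the (interpolated) tabulated minimum."""
    return bp >= min_bp_at(length)


print(round(min_bp_at(7), 3), reaches_95(20, 0.90), reaches_95(20, 0.85))
```

The same lookup could be pointed at the JK, DI, or BPP column instead; BP is used here only as an example.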
Addendum: For those interested in binomial CIs of .90 and .99, tables for BP and BPP are given below. The local DIs are easy to calculate as the difference between AB and AC.
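The local DI calculation just described is a single subtraction; the sketch below applies it to a few ratios from the 90% table. It takes the larger of AC and BC as the best rival so that it does not depend on column order (an assumption; in these tables AC is never smaller than BC, so the result equals AB minus AC). The function name `local_decay_index` is hypothetical.

```python
def local_decay_index(ab: int, ac: int, bc: int) -> int:
    """Steps on the optimal arrangement (AB) minus steps on the best rival."""
    # Assumption: the best rival is whichever of AC and BC is larger.
    return ab - max(ac, bc)


# A few AB:AC:BC rows copied from the 90% table.
rows_90 = [(5, 2, 1), (10, 5, 5), (15, 9, 9), (60, 48, 48)]
for ab, ac, bc in rows_90:
    print(f"{ab}:{ac}:{bc}  DI = {local_decay_index(ab, ac, bc)}")
```

Applied to the .95 table, the same subtraction reproduces its Min. DI column (for example 10:4:4 gives 6 and 50:37:36 gives 13).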
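90% binomial CI

| Length | AB:AC:BC | BP | BPP |
|---|---|---|---|
| 5 | 5:2:1 | .87 | .98 |
| 10 | 10:5:5 | .85 | .98 |
| 15 | 15:9:9 | .82 | .96 |
| 20 | 20:13:13 | .82 | .95 |
| 25 | 25:17:17 | .82 | .94 |
| 30 | 30:22:21 | .82 | .94 |
| 35 | 35:26:25 | .80 | .91 |
| 40 | 40:30:30 | .80 | .90 |
| 45 | 45:35:34 | .80 | .88 |
| 50 | 50:39:39 | .79 | .86 |
| 55 | 55:44:43 | .80 | .84 |
| 60 | 60:48:48 | .79 | .83 |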
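99% binomial CI

| Length | AB:AC:BC | BP | BPP |
|---|---|---|---|
| 5 | n.a. | n.a. | 1.00 |
| 10 | 10:3:2 | .98 | 1.00 |
| 15 | 15:6:5 | 1.00 | 1.00 |
| 20 | 20:9:9 | .97 | 1.00 |
| 25 | 25:13:12 | .97 | 1.00 |
| 30 | 30:16:16 | .97 | 1.00 |
| 35 | 35:20:20 | .96 | 1.00 |
| 40 | 40:24:24 | .97 | .99 |
| 45 | 45:28:28 | .95 | .99 |
| 50 | 50:32:32 | .96 | .99 |
| 55 | 55:36:36 | .96 | .99 |
| 60 | 60:41:40 | .95 | .98 |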