The Phylomatic Project

run it!
phylocom
webb
donoghue

AND

Background

As part of ongoing research into the phylogenetic structure of plant communities, we (Cam Webb and Michael Donoghue) began assembling published phylogenies into a 'mega-tree' of all the taxa in communities that we are interested in. Now all angiosperm families are included. This tree of trees is not a true supertree (e.g., Sanderson, Purvis and Henze 1998), in that it is being assembled 'by hand,' rather than by an automated supertree algorithm, and conflicting branching patterns are being resolved 'subjectively.' It is, however, intended to represent a good current approximation of the true tree of higher plants (minus reticulations!). This is obviously something of a rough-and-ready process, conducted for pragmatic expediency. To be absolutely clear:

*** This a constantly-changing, working hypothesis. Use at your own risk! ***

This work complements rather than competes with more comprehensive collections of trees, such as the Tree of Life project, and TreeBASE.

As a service to the larger biological community, we are making our mega-trees available as we put them together. We supply online software (Phylomatic) for users to rapidly assemble their own community trees from our mega-tree. The main limitation for users is that the taxa you are interested in must appear in the mega-tree (at genus or family level). You can help us in this growing project by requesting that taxa be added (and suggesting relevant phylogenies). Comments on the website and on the process are valued.


Mega-tree construction rules

Since Peter Stevens has put his (dynamic) hypothesis for the phylogenetic relationship among the APG orders (on APweb), we have been using this as the backbone of our mega-tree, and are extremely grateful for his effort. We had previously been compiling our tree from the most recent 'whole angiosperm' study, e.g. Soltis et al. (2000). Peter has extensively documented his sources for tree construction down to the family level.

Our 'resolved' trees (name prefixed by 'R') use the complete resolution that Peter has decided on on APweb. Our 'conservative' trees (name prefixed by 'C') remove any branches that Peter has annotated as having bootstrap support less that 80%, or when a whole tree is indicated as having 'weak support.'

To the APweb tree, we are adding family-level phylogenies from published studies. Our method for deciding which tree to include is:

  1. Where a single most parsimonious tree is presented, this topology is used for our 'resolved' tree. The 'conservative' tree is based on the strict consensus tree given, with branches of support less than 80% removed.
  2. Where parsimony and maximum likelihood analyses are both presented, the former result is used, not as a statement about the relative merit of the two approaches, but for consistency.
  3. The results from a more focused study (on particular nodes at any taxonomic level) over-rides conflicting results of a broader study (for the ingroup only).
  4. Studies using a greater number of informative characters are given dominance over studies with fewer characters.
  5. Where no molecular or morphological phylogenetic analyses have been published, sub-familial classification and authors' published 'intuitions' about relationships are used in the 'resolved' tree.


The Current megatrees

Because the tree has grown so large, I am maintaining it as a series of plain text newick files. The current compiled trees are:


The Davies et al tree

Thanks again to Jonathan for sharing this. See Davies et al. (2004) for info on branch support. A number of terminals in the paper were composites of several families, and I substituted these families in as monophyletic polytomies in the consensus tree, but collapsed them back to the last node in the dated tree. Note that the branch lengths from the terminals (family names) represent MAXIMUM ages for those clades - use with care!

Megatree archives

We intend to make all significant revisions of the tree available. Thus users can cite the tree version number in publications:

  • Tree L20011010 (or R20011010). Available as a Newick file, and a Nexus file. The numbers refer to notes and references used in the construction of this tree. Family names are attached as terminals, with a '-' before the name, to the interior node that corresponds to the family.
  • Tree L20020928 (or R20020928). Nexus.
  • Tree L20030804 (or R20030804). Newick.
  • Tree L20031115 (or R20031115). Newick.
  • Tree R20031202 Newick, and Tree C20031202 Newick.


Phylomatic

The online software takes your list of taxa, and first tries to match them by genus name to the megatree. Failing that, they are attached by family name. If all the genera appear in the megatree, then that family appears resolved. If even one genus is missing from the megatree, the returned phylogeny portrays a polytomy of genera. Currently, species are not included in the megatree, and species within a genus are always returned as polytomies.


'To do' list

  • Branch lengths Using fossil dates for important nodes, and rate smoothing to make the tree ultrametric, we hope to soon offer output trees with meaningful branch lengths. See Bladj
  • Add an option to add 'unfound' genera to the base of resolved genera, rather than polymomizing all genera.
  • Add another option to drop out unresolved genera from the input list


Bibliography

APweb contains an enormous and up-to-date bibliography. Additional articles and sources used in the construction of the phylomatic R and C trees can be found here.

Last revised: $Date: 2005/01/06 01:18:11 $