I ran into this problem vairous times when downloading a time-calibrated phylogeny off a repository and expecting it to be ultrametric. ape’s is.ultrametric()
kicks back a FALSE
and for some downstream analyses, functions won’t accept it. This drove me crazy because the trees look pretty damn ultrametric. It took me a while to figure out that either just one or two terminal branches are protruding a tiny, tiny bit, or that the distance from root to tip vaires between tips by the tiniest decimal.
My solution to this problem was to just clip/truncate or elongate the terminal branches so that the root-to-tip distance is the same for all tips.
This works pretty well for me and only needs the ape and adephylo package.
Here is an example:
library(ape)
library(adephylo)
## Loading required package: ade4
## Registered S3 method overwritten by 'spdep':
## method from
## plot.mst ape
library(phytools)
## Loading required package: maps
## make a random ultrametric tree
set.seed(12345)
tree<-rcoal(50)
par(mfrow=c(1,2))
plot(tree, show.tip.label = F, main="random ultrametric tree")
## lets change some branch lengths by a tiny amount that you can't even see:
tree$edge.length[10:13]<-tree$edge.length[10:13]+0.0095
tree$edge.length[29:31]<-tree$edge.length[29:31]-0.004
plot(tree, show.tip.label = F, main="non-ultrametric tree")
is.ultrametric(tree)
## [1] FALSE
### get the distance from the root to each tip
tip.heights<-distRoot(tree)
### see which is the most common tip height and how much to adjust tips to all be the same
(heights.summary<-table(tip.heights))
## tip.heights
## 1.89899310718981 1.90299310718981 1.92199310718981 1.93149310718981
## 3 44 2 1
options(digits=22) # set to maximum allowed digits
real.tree.height<-as.numeric(names(which.max(heights.summary)))
over.under<-tip.heights-real.tree.height
## we can now paint the branches that were problematic using phytools()
painted.tree<-paintBranches(tree,which(round(over.under,5)!=0),"2") # here I am rounding to 5 decimal places... pretty arbitrary choice
plotSimmap(painted.tree,lwd=4)
## no colors provided. using the following legend:
## 1 2
## "black" "red"
## extract all terminal edges for tips that do not have the final height we want:
tip.ids <- tree$edge[, 2] <= Ntip(tree)
terminal.edges <- tree$edge.length[tip.ids]
## add/subtract the extra length from the terminal branches
corrected.terminal.edges<-terminal.edges-over.under
## change the termnial edges in the phylo object
tree$edge.length[tip.ids]<-corrected.terminal.edges
plot(tree, show.tip.label = F, main="its ultrametric!!")
is.ultrametric(tree)
## [1] TRUE
For convenience, we can now just throw it all together into a function and apply it to any tree we like. I have found that using do.call()
on distRoot()
shaves off a few precious seconds, and I have also included a parallel version using mcapply()
, but how much faster this is (particularly on smaller trees) I have not yet tested extensively.
library(ape)
library(adephylo)
library(parallel)
library(phytools)
be.ultrametric<-function(phy, parallel=F, ncores=2, paint.tree=T) {
if(parallel){
run.parallel<-function(phy,tips) do.call(distRoot, args=list(x=phy,tips=tips))
phy=phy
tips<-phy$tip.label
n.tips<-length(tips)
tip.heights<-mclapply(FUN=run.parallel,
X=1:n.tips,
phy=phy,
mc.cores = ncores)
tip.heights<-unlist(tip.heights)
}
else {
tip.heights<-do.call(distRoot, args=list(x=phy,tips=phy$tip.label))
}
heights.summary<-table(tip.heights)
options(digits=22)
real.tree.height<-as.numeric(names(which.max(heights.summary)))
over.under<-tip.heights-real.tree.height
tip.ids <- phy$edge[, 2] <= Ntip(phy)
terminal.edges <- phy$edge.length[tip.ids]
corrected.terminal.edges<-terminal.edges-over.under
phy$edge.length[tip.ids]<-corrected.terminal.edges
if(paint.tree){
paint.these<-which(round(over.under,6)!=0)
if(length(paint.these)>0){
phy<-paintBranches(phy,paint.these,"2")
}
}
return(phy)
}
I ran into this problem when I tried to use amphibian trees from the Pyron 2014 paper.
amph.tree<-read.tree("https://datadryad.org/bitstream/handle/10255/dryad.63062/amph_shl_dates_frogs.tre?sequence=1")
# is it ultrametric?
is.ultrametric(amph.tree)
## [1] FALSE
# strange... it looks pretty damn ultrametric
plot.phylo(amph.tree, show.tip.label = F)
# lets make it ultrametric then and test user times for running this in serial and parallel
### serial
start_time <- Sys.time()
amph.tree.serial<-be.ultrametric(phy=amph.tree)
## Warning in checkTree(object): Labels are not unique.
## Warning in checkTree(object): Labels are not unique.
end_time <- Sys.time()
(serial.elapse.time<-end_time - start_time)
## Time difference of 5.186108481884002863183 mins
### parallel with 8 cores
start_time <- Sys.time()
amph.tree.parallel<-be.ultrametric(phy=amph.tree, parallel=T, ncores=8)
end_time <- Sys.time()
(parallel.elapse.time<-end_time - start_time)
## Time difference of 3.140811336040497003808 mins
### parallel version therefore runs a little faster!
# Boom! ultrametric!
plot.phylo(amph.tree.serial, show.tip.label = F)
is.ultrametric(amph.tree.serial)
## [1] TRUE
is.ultrametric(amph.tree.parallel)
## [1] TRUE
For large trees this code is still pretty slow even though I have tried to implement parallelization. Most of the heavy-lifting is done by adephylo’s distRoot()
function. I wonder if there is a better way to do this similar to the inner workings of ape’s is.ultrametric()
function. Drop me a line by starting a github issue related to this repository.