Be ultrametric, damnit!

I ran into this problem vairous times when downloading a time-calibrated phylogeny off a repository and expecting it to be ultrametric. ape’s is.ultrametric() kicks back a FALSE and for some downstream analyses, functions won’t accept it. This drove me crazy because the trees look pretty damn ultrametric. It took me a while to figure out that either just one or two terminal branches are protruding a tiny, tiny bit, or that the distance from root to tip vaires between tips by the tiniest decimal.

My solution to this problem was to just clip/truncate or elongate the terminal branches so that the root-to-tip distance is the same for all tips.

This works pretty well for me and only needs the ape and adephylo package.

Here is an example:

Make a random ultrametric tree, and then break it (for demonstrative purposes)

library(ape)
library(adephylo)
## Loading required package: ade4
## Registered S3 method overwritten by 'spdep':
##   method   from
##   plot.mst ape
library(phytools)
## Loading required package: maps
## make a random ultrametric tree
set.seed(12345)
tree<-rcoal(50)
par(mfrow=c(1,2))
plot(tree, show.tip.label = F, main="random ultrametric tree")

## lets change some branch lengths by a tiny amount that you can't even see:
tree$edge.length[10:13]<-tree$edge.length[10:13]+0.0095
tree$edge.length[29:31]<-tree$edge.length[29:31]-0.004
plot(tree, show.tip.label = F, main="non-ultrametric tree")

is.ultrametric(tree)
## [1] FALSE

Make it ultrametric

### get the distance from the root to each tip
tip.heights<-distRoot(tree)


### see which is the most common tip height and how much to adjust tips to all be the same
(heights.summary<-table(tip.heights))
## tip.heights
## 1.89899310718981 1.90299310718981 1.92199310718981 1.93149310718981 
##                3               44                2                1
options(digits=22) # set to maximum allowed digits
real.tree.height<-as.numeric(names(which.max(heights.summary)))
over.under<-tip.heights-real.tree.height

## we can now paint the branches that were problematic using phytools()
painted.tree<-paintBranches(tree,which(round(over.under,5)!=0),"2") # here I am rounding to 5 decimal places... pretty arbitrary choice
plotSimmap(painted.tree,lwd=4)
## no colors provided. using the following legend:
##       1       2 
## "black"   "red"

## extract all terminal edges for tips that do not have the final height we want:
tip.ids <- tree$edge[, 2] <= Ntip(tree)
terminal.edges <- tree$edge.length[tip.ids]


## add/subtract the extra length from the terminal branches
corrected.terminal.edges<-terminal.edges-over.under

## change the termnial edges in the phylo object
tree$edge.length[tip.ids]<-corrected.terminal.edges

Final test

plot(tree, show.tip.label = F, main="its ultrametric!!")

is.ultrametric(tree)
## [1] TRUE

Make it a function!

For convenience, we can now just throw it all together into a function and apply it to any tree we like. I have found that using do.call() on distRoot() shaves off a few precious seconds, and I have also included a parallel version using mcapply(), but how much faster this is (particularly on smaller trees) I have not yet tested extensively.

library(ape)
library(adephylo)
library(parallel)
library(phytools)

be.ultrametric<-function(phy, parallel=F, ncores=2, paint.tree=T) {
  
  if(parallel){
    
    run.parallel<-function(phy,tips) do.call(distRoot, args=list(x=phy,tips=tips))
    phy=phy
    tips<-phy$tip.label
    n.tips<-length(tips)
    tip.heights<-mclapply(FUN=run.parallel,
                          X=1:n.tips,
                          phy=phy,
                          mc.cores = ncores)
    tip.heights<-unlist(tip.heights)
  }
  else {
    tip.heights<-do.call(distRoot, args=list(x=phy,tips=phy$tip.label))
  }
  
  heights.summary<-table(tip.heights)
  options(digits=22)
  real.tree.height<-as.numeric(names(which.max(heights.summary)))
  over.under<-tip.heights-real.tree.height
  tip.ids <- phy$edge[, 2] <= Ntip(phy)
  terminal.edges <- phy$edge.length[tip.ids]
  corrected.terminal.edges<-terminal.edges-over.under
  phy$edge.length[tip.ids]<-corrected.terminal.edges
  
  if(paint.tree){
    paint.these<-which(round(over.under,6)!=0)
    if(length(paint.these)>0){
      phy<-paintBranches(phy,paint.these,"2")  
    }
  }
  
  return(phy)
}

Worked example

I ran into this problem when I tried to use amphibian trees from the Pyron 2014 paper.

amph.tree<-read.tree("https://datadryad.org/bitstream/handle/10255/dryad.63062/amph_shl_dates_frogs.tre?sequence=1")

# is it ultrametric?
is.ultrametric(amph.tree)
## [1] FALSE
# strange... it looks pretty damn ultrametric
plot.phylo(amph.tree, show.tip.label = F)

# lets make it ultrametric then and test user times for running this in serial and parallel
### serial
start_time <- Sys.time()
amph.tree.serial<-be.ultrametric(phy=amph.tree)
## Warning in checkTree(object): Labels are not unique.

## Warning in checkTree(object): Labels are not unique.
end_time <- Sys.time()
(serial.elapse.time<-end_time - start_time)
## Time difference of 5.186108481884002863183 mins
### parallel with 8 cores
start_time <- Sys.time()
amph.tree.parallel<-be.ultrametric(phy=amph.tree, parallel=T, ncores=8)
end_time <- Sys.time()
(parallel.elapse.time<-end_time - start_time)
## Time difference of 3.140811336040497003808 mins
### parallel version therefore runs a little faster!

# Boom! ultrametric!
plot.phylo(amph.tree.serial, show.tip.label = F)

is.ultrametric(amph.tree.serial)
## [1] TRUE
is.ultrametric(amph.tree.parallel)
## [1] TRUE

Thoughts…

For large trees this code is still pretty slow even though I have tried to implement parallelization. Most of the heavy-lifting is done by adephylo’s distRoot() function. I wonder if there is a better way to do this similar to the inner workings of ape’s is.ultrametric() function. Drop me a line by starting a github issue related to this repository.