[Bioperl-l] (TreeFunctionsI) merge_lineage method very slow on large trees

Sendu Bala bix at sendu.me.uk
Fri Aug 8 07:50:50 UTC 2008

Chris Fields wrote:
> On Aug 7, 2008, at 5:20 PM, Sendu Bala wrote:
>> Tristan Lefebure wrote:
>>> I'm using a script very similar to bp_taxonomy2tree.pl distributed 
>>> with BioPerl (with the only difference that I'm using taxids instead 
>>> of taxon names). Basically, the script generates a taxonomic tree 
>>> given a list of taxids using the NCBI taxonomy db. For each taxon, it 
>>> generates a taxon object and then merges it into a tree object 
>>> that keeps growing. It runs very well with a small number of taxa, 
>>> but with many taxa (>1000), it is very very very slow (about a week 
>>> for 3000 taxa).
>>> The slowness is due to the function merge_lineage (line 65), which 
>>> merges the existing tree object with a new taxon object. I guess that 
>>> the difficulty with a big tree (i.e. more than 1000 leaves) is to find 
>>> the nodes in common between the tree and the new taxon object...
>>> Would you have any idea of how to get around the problem? Should I 
>>> look under the hood of merge_lineage to try to improve it for large 
>>> trees?
>> Yes, please do. It might have been me that wrote that, in which case I 
>> didn't do anything fancy or consider the above problem.
> Actually I thought that was fixed;

Oh yeah. Looks like I did something related to 'speedup for 
merge_lineage()' on the 18th Dec 2006. Tristan, check out 
Bio/Tree/TreeFunctionsI.pm from svn and see if that solves your problem.
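If the cost really is in finding the nodes shared between the tree and each new lineage, as Tristan suspects, one way around it is to index the tree's nodes by taxid so each merge only touches the new lineage. Below is a minimal sketch in Python. It is not BioPerl's merge_lineage code or API; the LineageTree class, its method, and the made-up taxids are hypothetical. It assumes each lineage arrives as a root-to-leaf list of taxids and that the taxonomy is consistent (any taxid already in the tree has all its ancestors there too).

```python
from typing import Dict, List, Optional


class LineageTree:
    """Taxonomy tree with nodes indexed by taxid for O(1) lookup.

    Hypothetical illustration only; not Bio::Tree::Tree or its API.
    """

    def __init__(self) -> None:
        # taxid -> parent taxid (None for the root)
        self.parent: Dict[int, Optional[int]] = {}
        # taxid -> list of child taxids
        self.children: Dict[int, List[int]] = {}

    def merge_lineage(self, lineage: List[int]) -> int:
        """Merge a root-to-leaf list of taxids into the tree.

        Returns the number of nodes added. The work done is proportional
        to the lineage length, not to the number of nodes in the tree.
        """
        if not lineage:
            return 0
        # Scan from the leaf end toward the root for the deepest taxid
        # already present; each membership test is a single dict lookup.
        # Assumes a consistent taxonomy, so everything above it is present.
        split = len(lineage)
        while split > 0 and lineage[split - 1] not in self.parent:
            split -= 1
        if split == 0 and self.parent:
            raise ValueError("lineage does not share a root with the tree")
        # Attach every node below the deepest shared ancestor.
        added = 0
        for i in range(split, len(lineage)):
            taxid = lineage[i]
            parent = lineage[i - 1] if i > 0 else None
            self.parent[taxid] = parent
            self.children.setdefault(taxid, [])
            if parent is not None:
                self.children[parent].append(taxid)
            added += 1
        return added


if __name__ == "__main__":
    tree = LineageTree()
    print(tree.merge_lineage([1, 10, 100, 1000]))  # prints 4
    print(tree.merge_lineage([1, 10, 200]))        # prints 1: only 200 is new
```

With this layout, merging 3000 lineages costs roughly 3000 times the average lineage depth in dictionary operations, instead of a search over an ever-growing tree for each new taxon.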
