[Bioperl-l] (TreeFunctionsI) merge_lineage method very slow on large trees

Chris Fields cjfields at illinois.edu
Fri Aug 8 00:42:16 UTC 2008


On Aug 7, 2008, at 5:20 PM, Sendu Bala wrote:

> Tristan Lefebure wrote:
>> I'm using a script very similar to bp_taxonomy2tree.pl distributed  
>> with BioPerl (with the only difference that I'm using taxids  
>> instead of taxon names). Basically, the script generates a  
>> taxonomic tree given a list of taxids using the NCBI taxonomy db.  
>> For each taxon, it generates a taxon object, and then merges this
>> object into a tree object that keeps growing. It runs very well with
>> a small number of taxa, but with many taxa (>1000), it is extremely
>> slow (about a week for 3000 taxa).
>> The slowness is due to the function merge_lineage (line 65), which
>> merges the existing tree object with a new taxon object. I guess
>> that the difficulty with a big tree (i.e. more than 1000 leaves) is
>> finding the nodes in common between the tree and the new taxon
>> object...
>> Would you have any idea of how to get around the problem? Should I  
>> look under the hood of merge_lineage to try to improve it for large  
>> trees?
>
> Yes, please do. It might have been me that wrote that, in which case  
> I didn't do anything fancy or consider the above problem.

Actually I thought that was fixed; wasn't some caching added for that  
script at one point?

chris
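
For illustration only, the kind of caching asked about above could look
like the Python sketch below. This is not BioPerl's merge_lineage code,
and it does not show what was or was not added to the script. The names
Node, IndexedTree and by_id are hypothetical, and the taxids are made-up
small integers. The idea is to keep a hash of node id to node, so that
finding the nodes a new lineage shares with the tree is one lookup per
ancestor instead of a search through the whole tree.

```python
class Node:
    """One taxon in the tree (hypothetical structure, not a BioPerl class)."""

    def __init__(self, taxid):
        self.taxid = taxid
        self.parent = None
        self.children = []


class IndexedTree:
    """Tree plus an id -> node index so merges avoid scanning the tree."""

    def __init__(self):
        self.root = None
        self.by_id = {}  # cache: taxid -> Node already in the tree

    def merge_lineage(self, lineage):
        """Merge one lineage, given as a list of taxids from root to leaf."""
        if not lineage:
            return
        if self.root is None:
            self.root = Node(lineage[0])
            self.by_id[lineage[0]] = self.root
        elif lineage[0] != self.root.taxid:
            raise ValueError("lineage does not share the tree's root")
        parent = self.root
        for taxid in lineage[1:]:
            node = self.by_id.get(taxid)  # constant-time lookup on average
            if node is None:
                # First time this taxon is seen: attach it below its parent.
                node = Node(taxid)
                node.parent = parent
                parent.children.append(node)
                self.by_id[taxid] = node
            parent = node


# Made-up lineages: 1 is the root, 3, 4 and 5 are leaves.
tree = IndexedTree()
for lineage in ([1, 2, 3], [1, 2, 4], [1, 5]):
    tree.merge_lineage(lineage)
```

With an index like this, each merge costs time proportional to the
depth of the lineage, not to the size of the tree, at the price of one
extra hash entry per node.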


