[Bioperl-l] (TreeFunctionsI) merge_lineage method very slow on large trees
Chris Fields
cjfields at illinois.edu
Fri Aug 8 00:42:16 UTC 2008
On Aug 7, 2008, at 5:20 PM, Sendu Bala wrote:
> Tristan Lefebure wrote:
>> I'm using a script very similar to bp_taxonomy2tree.pl distributed
>> with BioPerl (with the only difference that I'm using taxids
>> instead of taxon names). Basically, the script generates a
>> taxonomic tree given a list of taxids using the NCBI taxonomy db.
>> For each taxon, it generates a taxon object, and then merges this
>> object into a tree object that keeps growing. It runs very well with
>> a small number of taxa, but with many taxa (>1000), it is very very
>> very slow (about a week for 3000 taxa).
>> The slowness is due to the function merge_lineage (line 65), which
>> merges the existing tree object with a new taxon object. I guess
>> that the difficulty with a big tree (i.e. more than 1000 leaves) is
>> to find the nodes in common between the tree and the new taxon
>> object...
>> Would you have any idea of how to get around the problem? Should I
>> look under the hood of merge_lineage to try to improve it for large
>> trees?
>
> Yes, please do. It might have been me that wrote that, in which case
> I didn't do anything fancy or consider the above problem.
Actually I thought that was fixed; wasn't some caching added for that
script at one point?
chris
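
For illustration only, here is one way the caching idea could look. This
is a minimal standalone sketch in Python, not BioPerl's merge_lineage and
not the code in bp_taxonomy2tree.pl. The names Node, IndexedTree and by_id
are hypothetical, and the taxon ids and names are illustrative. It assumes
each lineage arrives as a root-to-leaf list of (taxid, name) pairs, and
that all lineages share the same root. Keeping a dictionary from taxid to
node means the nodes shared between the tree and a new lineage are found
by lookup rather than by searching the growing tree.

```python
class Node:
    """One taxon in the tree (hypothetical helper, not a BioPerl class)."""

    def __init__(self, taxid, name, parent=None):
        self.taxid = taxid
        self.name = name
        self.parent = parent
        self.children = []


class IndexedTree:
    """Tree that caches every node by taxid for constant-time lookup."""

    def __init__(self):
        self.root = None
        self.by_id = {}  # taxid -> Node

    def merge_lineage(self, lineage):
        """Merge a root-to-leaf list of (taxid, name) pairs into the tree."""
        parent = None
        for taxid, name in lineage:
            node = self.by_id.get(taxid)
            if node is None:
                # Taxon not seen before: create it and attach it.
                node = Node(taxid, name, parent)
                self.by_id[taxid] = node
                if parent is None:
                    if self.root is not None:
                        raise ValueError("lineage does not share the tree's root")
                    self.root = node
                else:
                    parent.children.append(node)
            # Either way, this node is the parent of the next lineage entry.
            parent = node

    def leaves(self):
        """Return the names of all nodes without children, sorted."""
        return sorted(n.name for n in self.by_id.values() if not n.children)


# Illustrative lineages; the ids and names are example data only.
tree = IndexedTree()
tree.merge_lineage([(1, "root"), (2, "Bacteria"), (1301, "Streptococcus"),
                    (1311, "Streptococcus agalactiae")])
tree.merge_lineage([(1, "root"), (2, "Bacteria"), (1301, "Streptococcus"),
                    (1314, "Streptococcus pyogenes")])
```

With this layout the cost of each merge depends on the length of the
lineage being merged, not on the number of taxa already in the tree.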