[BioRuby] GSOC PhyloXML profiling, bottleneck is Bio::Tree#children

Diana Jaunzeikare rozziite at gmail.com
Fri Aug 7 17:31:45 UTC 2009


Hi all,

Here is an update on the Google Summer of Code BioRuby PhyloXML project. I
have been profiling and refactoring the BioRuby PhyloXML Parser code and got
a 67% speed increase.

Profiling the PhyloXML Writer is a different story. It takes 24 minutes to
write the 1.5MB mollusca taxonomy tree, and forever for larger files.
Again the bottleneck is bfs_shortest_path, which is called from the
Tree#children method. It takes forever just to iterate over all the child
nodes.

To solve this I propose to store an array of each node's children within
my PhyloXML::Node class (which corresponds to a clade). This would also
ensure that when a phyloxml file is parsed and then written back, the clades
are in the same order in the input and output files.
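
A rough sketch of the idea, not the actual BioRuby code: SketchNode,
add_child and each_clade are hypothetical names used only for illustration,
and the taxon names are made-up example data. Each node keeps its own array
of children, filled in at parse time, so the writer can walk the tree
without any path search, and the children come back in the order they were
added.

```ruby
# Hypothetical stand-in for PhyloXML::Node; not the real BioRuby class.
class SketchNode
  attr_reader :name, :children

  def initialize(name)
    @name = name
    @children = [] # child clades, kept in the order they were added
  end

  # Record a child when it is parsed, so no tree search is needed later.
  def add_child(node)
    @children << node
    node
  end
end

# Visit every clade depth-first, reading each node's stored child array.
def each_clade(node, depth = 0, &block)
  block.call(node, depth)
  node.children.each { |child| each_clade(child, depth + 1, &block) }
end

root = SketchNode.new("Mollusca")
gastropoda = root.add_child(SketchNode.new("Gastropoda"))
root.add_child(SketchNode.new("Bivalvia"))
gastropoda.add_child(SketchNode.new("Patellogastropoda"))

visited = []
each_clade(root) { |node, depth| visited << ("  " * depth + node.name) }
puts visited
```

With the children stored on the node, a full traversal touches each clade
once, instead of running a shortest-path search for every node visited.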


Have a good weekend,

Diana


