[BioRuby] BioRuby Phyloxml update

Naohisa GOTO ngoto at gen-info.osaka-u.ac.jp
Tue Nov 17 16:27:46 UTC 2009


Hi,

I've just committed a speed-up of Bio::Tree#children to my repository.
It keeps compatibility. As a trade-off for the speed-up, memory
consumption is slightly larger than with the previous code.
http://github.com/ngoto/bioruby

To benchmark reading and writing large PhyloXML data, I added a new
sample script, sample/test_phyloxml_big.rb, based on Diana's
test_phyloxml_big.rb.

Running the new sample/test_phyloxml_big.rb on a machine
(Pentium D 3.40GHz, memory 4GB, running Debian GNU/Linux)
with http://github.com/ngoto/bioruby:
47.52user 0.93system 0:50.09elapsed 96%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+141424outputs (0major+167550minor)pagefaults 0swaps

with http://github.com/latvianlinuxgirl/bioruby/tree/tree_class
43.55user 1.00system 0:46.59elapsed 95%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+141424outputs (0major+165151minor)pagefaults 0swaps

Although my new code is still roughly 10% slower than Diana's new code,
I think that is acceptable because my code keeps compatibility.
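
For reference, the relative slowdown can be recomputed from the two
`time` outputs above. This is only a small arithmetic sketch; the helper
name percent_slower is made up for illustration and is not part of BioRuby.

```ruby
# Hypothetical helper (not part of BioRuby): how much slower `slow` is
# than `fast`, expressed as a percentage of `fast`.
def percent_slower(slow, fast)
  (slow - fast) / fast * 100.0
end

# User-time and elapsed-time figures copied from the two runs above.
user_pct    = percent_slower(47.52, 43.55)
elapsed_pct = percent_slower(50.09, 46.59)

puts format("user time:    %.1f%% slower", user_pct)     # user time:    9.1% slower
puts format("elapsed time: %.1f%% slower", elapsed_pct)  # elapsed time: 7.5% slower
```

Both figures come from a single run on each branch, so they should be
read as a rough indication rather than a precise measurement.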

I wrote Bio::Tree because I wanted to manipulate trees flexibly,
e.g. merging and splitting trees or changing the root of a tree.
For that purpose, I chose not to store parent/children references
in each node.
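
The design idea can be illustrated with a minimal sketch. This is not the
Bio::Tree code; EdgeTree and all of its methods are hypothetical names
used only for this example. The tree is kept as undirected edges, and
parent/children are derived on demand relative to a root, so changing the
root does not require rewriting any node.

```ruby
# Hypothetical minimal class, NOT Bio::Tree itself: the tree is stored as an
# undirected adjacency list, and parent/child roles are derived from a root.
class EdgeTree
  attr_accessor :root

  def initialize(root)
    @root = root
    @adj = Hash.new { |hash, key| hash[key] = [] }
  end

  # Add an undirected edge between two nodes; returns self for chaining.
  def add_edge(a, b)
    @adj[a] << b
    @adj[b] << a
    self
  end

  # Parent of node relative to root (nil for the root itself or if the
  # node cannot be reached). Found by breadth-first search from the root.
  def parent(node, root = @root)
    return nil if node == root
    seen = { root => true }
    queue = [root]
    until queue.empty?
      current = queue.shift
      @adj[current].each do |neighbour|
        next if seen[neighbour]
        return current if neighbour == node
        seen[neighbour] = true
        queue << neighbour
      end
    end
    nil
  end

  # Children are the neighbours of node other than its parent.
  def children(node, root = @root)
    @adj[node] - [parent(node, root)]
  end
end

tree = EdgeTree.new(:a)
tree.add_edge(:a, :b).add_edge(:a, :c).add_edge(:b, :d)

p tree.children(:a)  # [:b, :c]
tree.root = :b       # re-rooting only changes one reference
p tree.children(:b)  # [:a, :d]
p tree.children(:a)  # [:c]
```

The cost of this flexibility is visible in the sketch: every call to
children has to search for the parent first, which is the kind of work
that caching can speed up at the price of extra memory.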

I also think the current Bio::Tree is not ideal. One of its
weak points is that it is relatively heavy. The flexibility may
not be needed for parsers that only represent a fixed data structure.
A new class seems attractive for uses that cannot be covered by
the current Bio::Tree implementation.

Thanks,

Naohisa Goto
ngoto at gen-info.osaka-u.ac.jp / ng at bioruby.org

On Tue, 17 Nov 2009 09:52:59 -0500
Diana Jaunzeikare <djaunzei at smith.edu> wrote:

> Thanks for the discussion. I see Naohisa's point that it is difficult
> to keep consistency when copying a tree.
> 
> Right now the PhyloXML class inherits from the Bio::Tree class.
> Instead, I could write a new general Bio::FamilyTree class (per
> Pjotr's suggestion), which would be strictly a tree (I believe that
> Bio::Tree allows a node to have 2 parents) and would have
> parent/child information. Thus it would not need an underlying
> general graph implementation, making the implementation simpler than
> that of Bio::Tree. Then PhyloXML::Tree would inherit from
> Bio::FamilyTree. This way the PhyloXML writer would probably be even
> faster because it would not need to update the Bio::Pathway structure
> (which is under Bio::Tree) every time a node or edge is added.
> Additionally, I think BioRuby would benefit from a general
> Bio::FamilyTree class. I recently heard a talk by a researcher who
> did a phylogenetic analysis of musical rhythms.
> 
> Also, I will write a method to convert from Newick to PhyloXML.
> 
> What do you think?
> 
> Cheers,
> Diana


