[BioRuby] BioRuby Phyloxml update

Mon Nov 16 10:11:24 UTC 2009

All,

I think we should make a good effort to merge Diana's code into the
bioruby codebase. Even though I'm not completely familiar with
bioruby's phylo implementation, an effort like hers should be welcomed
with open arms.

If her code speeds things up so immensely, why don't we start a new
branch that will lead to bioruby 2.0? Let bioruby 2.0 break things.
With a major new release, things are allowed to break free from the
legacy code.

We definitely don't want Diana's efforts to be in vain.

jan.

2009/11/8 Naohisa Goto <ngoto at gen-info.osaka-u.ac.jp>:
> Hi Diana,
>
> I'm sorry that the changes cannot be accepted, because the
> modification of existing Bio::Tree methods breaks things.
> Bio::Tree does not want to have children/parent information
> in nodes. One of the reasons is that it is difficult to keep
> them consistent when copying a tree. Nodes can be shared between
> two or more trees when a tree is copied with the "dup" or "clone"
> method.
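
The consistency problem described above can be sketched in plain Ruby. This is only an illustration under stated assumptions: Node, Tree, edges and reparent are hypothetical stand-ins written for this example, not the real Bio::Tree API. The copy gets its own edge table but keeps pointing at the same node objects, so a parent reference stored in a node can only agree with one of the two trees.

```ruby
# Hypothetical stand-ins for illustration; not the real Bio::Tree classes.
class Node
  attr_reader :name
  attr_accessor :parent # the back-reference under discussion

  def initialize(name)
    @name = name
    @parent = nil
  end
end

class Tree
  attr_reader :edges # parent node => array of child nodes

  def initialize
    @edges = {}
  end

  # Called by dup/clone: copy the edge table, but share the Node objects.
  def initialize_copy(other)
    super
    @edges = {}
    other.edges.each { |parent, kids| @edges[parent] = kids.dup }
  end

  def add_edge(parent, child)
    (@edges[parent] ||= []) << child
    child.parent = parent
  end

  # Move a child under a new parent within this tree only.
  def reparent(child, new_parent)
    @edges.each_value { |kids| kids.delete(child) }
    add_edge(new_parent, child)
  end
end

root = Node.new("root")
a = Node.new("A")
b = Node.new("B")

original = Tree.new
original.add_edge(root, a)
original.add_edge(root, b)

copy = original.dup
copy.reparent(b, a) # change the copy only

puts original.edges[root].map(&:name).inspect # ["A", "B"]
puts b.parent.name                            # A
```

The original tree still lists B under root, yet the shared node B now reports A as its parent, because the copy overwrote the single reference stored in the node.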
>
> Normally, tests for existing classes should not be modified
> except when the specification changes or the test itself has a bug,
> because they guarantee the specification of the class. Adding new
> tests is OK.
>
> If you really want nodes to have parent/children information
> in each node, please do so only in the PhyloXML classes (though
> I'm negative about it). In this case, the problem is that reading
> PhyloXML data and writing it back again seems fine, but there is
> currently no way to convert Bio::Tree to PhyloXML. For now, it seems
> hard to convert Newick data to PhyloXML.
>
> Now, to prepare to include your PhyloXML code in BioRuby, I'm working
> on my branch. Some API changes will be made.
> http://github.com/ngoto/bioruby/tree/incoming
>
> Note that in your test code, the argument order of assert_equal is wrong.
> I've already fixed it in my branch.
> http://github.com/ngoto/bioruby/commit/a291af62ef262ee04f3a0e1b6415d4e256c56a94
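
As a small illustration of why the order matters: Test::Unit's assert_equal takes the expected value first and the actual value second. The sketch below uses a hand-written stand-in with that same argument order so it runs without any test framework. The message format, leaf_count and failure_message are assumptions made up for this example.

```ruby
# Stand-in with the same (expected, actual) order as Test::Unit's
# assert_equal; the message format here is only an approximation.
def assert_equal(expected, actual)
  return true if expected == actual

  raise "<#{expected.inspect}> expected but was <#{actual.inspect}>"
end

# Hypothetical code under test: it returns 2 here, while the test wants 3.
def leaf_count(leaves)
  leaves.size
end

# Helper that captures the failure message instead of aborting.
def failure_message
  yield
  nil
rescue RuntimeError => e
  e.message
end

right = failure_message { assert_equal(3, leaf_count(%w[A B])) }
wrong = failure_message { assert_equal(leaf_count(%w[A B]), 3) }

puts right # <3> expected but was <2>
puts wrong # <2> expected but was <3>
```

Both calls fail, as they should, but the swapped call reports the wrong value as the expectation, which makes a failing test misleading to read.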
>
>> * Updated  Tree::add_edge, Tree::clear_edge, Tree::remove_edge,
>> Tree::remove_edge_if, Tree::remove_nonsense_nodes in order to keep
>> track of Node::parent and Node::children nodes correctly.  Have I
>> forgotten anything?
>
> Changing the root with tree.root=().
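
To see why rerooting matters for stored parent references, here is a minimal sketch, assuming the tree is held as an undirected graph and parenthood is derived from whichever node is the root. The adjacency hash and parents_from are hypothetical names for this example, not part of Bio::Tree.

```ruby
# Undirected tree as an adjacency list (hypothetical example data):
#   A - B - C
#       |
#       D
adjacency = {
  "A" => ["B"],
  "B" => ["A", "C", "D"],
  "C" => ["B"],
  "D" => ["B"]
}

# Derive every node's parent by walking outward from the chosen root.
def parents_from(root, adjacency)
  parents = { root => nil }
  queue = [root]
  until queue.empty?
    node = queue.shift
    adjacency[node].each do |neighbor|
      next if parents.key?(neighbor)

      parents[neighbor] = node
      queue << neighbor
    end
  end
  parents
end

rooted_at_a = parents_from("A", adjacency)
rooted_at_c = parents_from("C", adjacency)

puts rooted_at_a["B"].inspect # "A"
puts rooted_at_c["B"].inspect # "C"
puts rooted_at_c["A"].inspect # "B"
```

Moving the root from A to C flips the parent/child direction along the path between them, so any parent or children references cached in the nodes on that path would have to be rewritten when the root changes.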
>
> --
> Naohisa Goto
> ngoto at gen-info.osaka-u.ac.jp / ng at bioruby.org
>
>
>> Hi all,
>>
>> So finally I have updated Bio::Tree and Bio::Node classes to improve
>> the phyloxml writer speed.
>>
>> * Added Bio::Node::parent and Bio::Node::children (array of nodes) in
>> order to avoid calling Tree::parent(node) or Tree::children(node),
>> because those methods run a breadth-first search on the underlying
>> graph, which makes the PhyloXML writer and parser incredibly slow. In
>> contrast, Bio::Node::parent and Bio::Node::children keep references
>> to the respective nodes.
>> * Updated  Tree::add_edge, Tree::clear_edge, Tree::remove_edge,
>> Tree::remove_edge_if, Tree::remove_nonsense_nodes in order to keep
>> track of Node::parent and Node::children nodes correctly.  Have I
>> forgotten anything?
>> * The PhyloXML writer now takes less than 1 second instead of
>> ~20 minutes to write the 1.5 MB ncbi_taxonomy_mollusca.xml file.
>> * Writing the tree of life taxonomy file (~46 MB) takes 10 seconds
>> (on a 2.4 GHz machine with 2.9 GB RAM, running Ubuntu).
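
The trade-off in the list above can be sketched as follows. This is a simplified illustration, assuming a tree stored as an undirected adjacency table. CachedNode and SketchTree are hypothetical stand-ins, not the actual Bio::Tree or Bio::Node code. Looking up a parent through the tree costs a breadth-first search per call, while the cached reference is a constant-time read that add_edge and remove_edge must keep up to date.

```ruby
# Hypothetical stand-ins for illustration; not the real Bio::Tree API.
class CachedNode
  attr_reader :name, :children
  attr_accessor :parent

  def initialize(name)
    @name = name
    @parent = nil
    @children = []
  end
end

class SketchTree
  def initialize(root)
    @root = root
    @adjacency = { root => [] } # undirected: node => neighbors
  end

  # Record the edge and keep the cached references in sync.
  def add_edge(parent, child)
    (@adjacency[parent] ||= []) << child
    (@adjacency[child] ||= []) << parent
    child.parent = parent
    parent.children << child
  end

  # Drop the edge and clear the cached references again.
  def remove_edge(parent, child)
    @adjacency[parent].delete(child)
    @adjacency[child].delete(parent)
    child.parent = nil
    parent.children.delete(child)
  end

  # Slow path: breadth-first search from the root on every call.
  def parent(node)
    visited = { @root => nil }
    queue = [@root]
    until queue.empty?
      current = queue.shift
      return visited[current] if current.equal?(node)

      @adjacency.fetch(current, []).each do |neighbor|
        next if visited.key?(neighbor)

        visited[neighbor] = current
        queue << neighbor
      end
    end
    nil # node is not reachable from the root
  end
end

root = CachedNode.new("root")
a = CachedNode.new("A")
b = CachedNode.new("B")
c = CachedNode.new("C")

tree = SketchTree.new(root)
tree.add_edge(root, a)
tree.add_edge(root, b)
tree.add_edge(a, c)

puts tree.parent(c).name # A (found by searching from the root)
puts c.parent.name       # A (read directly from the node)

tree.remove_edge(a, c)
puts c.parent.inspect    # nil
```

A writer that asks for the parent or children of every node pays for one search per node on the slow path, which is consistent with the large speed difference reported above. The price of the fast path is that every method that changes edges has to update the cached references.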
>>
>> The code is in http://github.com/latvianlinuxgirl/bioruby/tree/tree_class
>>
>> I wrote unit tests for my changes and made sure my changes don't break
>> anything else. However, does anybody have code lying around that uses
>> the Tree::parent and Tree::children methods so that I can test it more
>> thoroughly?
>>
>> Cheers,
>> Diana
>> _______________________________________________
>> BioRuby mailing list
>> BioRuby at lists.open-bio.org
>> http://lists.open-bio.org/mailman/listinfo/bioruby