[BioRuby] Update on phyloXML support for BioRuby project

Diana Jaunzeikare rozziite at gmail.com
Tue May 19 21:07:59 UTC 2009


Hi all,

I'd like to update you on my thoughts about this project, and I also have
some questions.

So, I think we have reached consensus that the best choice is the SAX-based
XML parser in libxml2-ruby.

Since BioRuby has a Tree class (http://bioruby.org/rdoc/), it seems logical
that the parser should return a Tree object. Using a SAX parser avoids
holding the whole XML file in memory, but phylogenetic trees can still be
very large, and storing an entire tree as a Tree object might take too much
memory. This could be partly remedied by a next_tree (or next_phylogeny)
method that reads one tree at a time when a phyloXML file contains several
of them (this is similar to the BioPerl implementation). I don't think child
nodes can be handled the same way. Because SAX parses sequentially, returning
the next node (a child one level down) requires parsing that child's whole
subtree, since the node is only complete once the parser reports its end
tag, so we would lose speed. Any thoughts on this?

Also, the Tree class should be extended with an output_phyloXML method,
alongside its existing output_newick and output_nhx methods.
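A rough sketch of what such an output method could emit, assuming a
hypothetical minimal Node struct rather than BioRuby's Tree class, and using
the illustrative standalone name to_phyloxml. The element names (phyloxml,
phylogeny, clade, name, branch_length) and the rooted attribute come from the
phyloXML schema; everything else here is an assumption.

```ruby
# Hypothetical minimal node type (not Bio::Tree), used only to show the
# shape of the phyloXML output.
Node = Struct.new(:name, :branch_length, :children)

XML_ESCAPES = { "&" => "&amp;", "<" => "&lt;", ">" => "&gt;", '"' => "&quot;" }.freeze

def escape_xml(text)
  text.to_s.gsub(/[&<>"]/, XML_ESCAPES)
end

# Recursively writes one <clade> element per node, indented by depth.
def clade_xml(node, indent)
  pad = "  " * indent
  out = "#{pad}<clade>\n"
  out << "#{pad}  <name>#{escape_xml(node.name)}</name>\n" if node.name
  if node.branch_length
    out << "#{pad}  <branch_length>#{node.branch_length}</branch_length>\n"
  end
  (node.children || []).each { |child| out << clade_xml(child, indent + 1) }
  out << "#{pad}</clade>\n"
end

# Hypothetical counterpart to output_newick / output_nhx.
def to_phyloxml(root)
  header = %(<?xml version="1.0" encoding="UTF-8"?>\n) +
           %(<phyloxml xmlns="http://www.phyloxml.org">\n) +
           %(  <phylogeny rooted="true">\n)
  header + clade_xml(root, 2) + "  </phylogeny>\n</phyloxml>\n"
end
```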

To understand what the parser should return, I think it would help to know
how people use phylogenetic tree data. Here are the uses I could come up
with:
* visualize / print
* calculate total branch length of a tree
* query info about specific nodes
* create consensus trees
Any others?
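Two of the uses above, total branch length and querying specific nodes,
reduce to simple recursive traversals. The sketch below assumes a
hypothetical nested TreeNode struct (not Bio::Tree) in which each node stores
the length of the branch leading to it; the method names are illustrative
only.

```ruby
# Hypothetical nested tree (not Bio::Tree): each node stores the length of
# the branch leading to it, and its children as an Array.
TreeNode = Struct.new(:name, :branch_length, :children) do
  # Sum of all branch lengths in this subtree; a missing length (typically
  # at the root) counts as zero.
  def total_branch_length
    (branch_length || 0.0) + children.sum(&:total_branch_length)
  end

  # Depth-first search for the first node with the given name, or nil.
  def find_node(target)
    return self if name == target

    children.each do |child|
      hit = child.find_node(target)
      return hit if hit
    end
    nil
  end
end
```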

I am a little confused about the require statements in BioRuby classes. It
looks like bio/tree.rb should hold the general class, yet it requires
bio/db/newick.rb, which in turn requires bio/tree.rb.
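For what it's worth, Ruby's require tolerates such a cycle: a file that is
still being loaded is not loaded again, so the inner require returns false
(and, when warnings are enabled, Ruby warns about a circular require). The
sketch below demonstrates this with two hypothetical throwaway files,
tree_part.rb and newick_part.rb, standing in for bio/tree.rb and
bio/db/newick.rb; it does not reproduce BioRuby's actual files. The practical
consequence is that the second file only sees what the first one defined
before its require line.

```ruby
require "tmpdir"

# Records the order in which the two files run.
LOAD_LOG = []

Dir.mktmpdir do |dir|
  tree_path = File.join(dir, "tree_part")
  newick_path = File.join(dir, "newick_part")

  # Hypothetical stand-in for bio/tree.rb: requires the "newick" file.
  File.write("#{tree_path}.rb", <<~RUBY)
    LOAD_LOG << :tree_start
    require #{newick_path.inspect}
    LOAD_LOG << :tree_end
  RUBY

  # Hypothetical stand-in for bio/db/newick.rb: requires the "tree" file
  # back. tree_part.rb is still mid-load at this point, so Ruby skips it
  # and this require returns false instead of recursing.
  File.write("#{newick_path}.rb", <<~RUBY)
    LOAD_LOG << :newick_start
    LOAD_LOG << [:nested_require, require(#{tree_path.inspect})]
    LOAD_LOG << :newick_end
  RUBY

  LOAD_LOG << [:top_require, require(tree_path)]
end

p LOAD_LOG
```

The log should show newick_part.rb running in the middle of tree_part.rb, with
the nested require returning false.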

Thanks,

Diana

Project Page:
https://www.nescent.org/wg_phyloinformatics/PhyloSoC:PhyloXML_support_in_BioRuby
