[BioRuby] GSOC: Bioruby PhyloXML update 12

Diana Jaunzeikare rozziite at gmail.com
Mon Aug 10 19:54:55 UTC 2009


Hi all,

What was done last week:

* Coding. Made the changes needed so that the code is now completely
compatible with phyloXML schema 1.10.

* Testing. Added more unit tests (the writer now has 9 tests, 26 assertions;
the parser has 40 tests, 134 assertions).

* Profiling. I discovered that the writer is really slow. The reason is the
implementation of the Tree#children method, which runs the bfs_shortest_path
algorithm. I had the idea of tracking each node's children inside the node
class as an array, but Naohisa Goto pointed out that I would then also have
to deal with node and edge addition, removal, etc. So the better solution
seems to be to leave it as it is for now and first improve the Bio::Tree
class. I am planning to do that after GSOC, since there is only one week left.

* Refactored the parser class and got around a 3-fold speed increase. It can
now parse the 33MB Metazoa taxonomy file in ~14 seconds (Ubuntu 9.04, ruby
1.8.7 [i486-linux], Intel Core 2 Duo P8600 @2.4GHz).
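
As an illustration of the Tree#children point above, here is a minimal sketch
of one possible approach: build a parent-to-children index with a single
breadth-first pass from the root, and drop that index whenever the tree
changes. This is not Bio::Tree code. The ToyTree class and all of its method
names are hypothetical stand-ins, and the simple "throw the cache away on any
edit" rule is an assumption that sidesteps, rather than solves, the node and
edge bookkeeping mentioned above.

```ruby
# Hypothetical stand-in for an undirected tree stored as an adjacency list.
# This is NOT the Bio::Tree API; every name here is for illustration only.
class ToyTree
  def initialize
    @adjacency = Hash.new { |hash, key| hash[key] = [] }
    @children_index = nil
  end

  def add_edge(a, b)
    @adjacency[a] << b
    @adjacency[b] << a
    @children_index = nil # any structural change invalidates the cache
  end

  # Children of +node+ when the tree is rooted at +root+.
  # The index for a given root is built once and reused until the next edit.
  def children(node, root)
    @children_index ||= {}
    index = (@children_index[root] ||= build_index(root))
    index.fetch(node, [])
  end

  private

  # One breadth-first pass from the root records every node's children.
  def build_index(root)
    index = { root => [] }
    queue = [root]
    until queue.empty?
      current = queue.shift
      @adjacency[current].each do |neighbor|
        next if index.key?(neighbor) # already visited, so it is the parent
        index[neighbor] = []
        index[current] << neighbor
        queue << neighbor
      end
    end
    index
  end
end

tree = ToyTree.new
tree.add_edge(:root, :a)
tree.add_edge(:root, :b)
tree.add_edge(:a, :c)
root_children = tree.children(:root, :root)
a_children = tree.children(:a, :root)
```

With an index like this, writing out a whole tree would cost one traversal
plus a hash lookup per node, instead of one path search per node.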

Next week:

* Create a HOWTO wiki page with code examples and usage.
* Do more testing (does anybody have more phyloXML files for me to test,
other than those on phyloxml.org?)
* Any other suggestions from you?

Questions/issues:

* Where should the HOWTO and code example documentation go? It seems
reasonable for it to go here:
http://bioruby.open-bio.org/wiki/HOWTO:Trees and/or
http://bioruby.open-bio.org/wiki/Phyloxml_tree_format (which is linked from
the previous link).

* How does integration into the master branch work? Is a pull request on
GitHub all I have to do?

* I have implemented PhyloXML::Sequence#to_biosequence, but it returns
incomplete data: the information for Bio::Sequence#classification,
Bio::Sequence#species and Bio::Sequence#division would come from the
PhyloXML::Taxonomy class, which is not accessible from the Sequence class.
Should there be a PhyloXML::Node#to_biosequence method that gathers
information from both PhyloXML::Sequence and PhyloXML::Taxonomy? Or maybe
Bio::Sequence should not hold taxonomic information?
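
To make the Node#to_biosequence option concrete, here is a rough sketch of a
node-level conversion that merges sequence and taxonomy data. It uses plain
Struct stand-ins rather than the real PhyloXML or Bio::Sequence classes. Every
class, field and method name in it (ToyNode, lineage, to_biosequences, and so
on) is an assumption made for illustration, as is the choice to take the
node's first taxonomy.

```ruby
# Hypothetical stand-ins; the real PhyloXML and Bio::Sequence classes have
# more fields and different constructors.
ToySequence    = Struct.new(:name, :mol_seq)
ToyTaxonomy    = Struct.new(:scientific_name, :lineage)
ToyBioSequence = Struct.new(:seq, :entry_id, :species, :classification)

class ToyNode
  attr_reader :sequences, :taxonomies

  def initialize(sequences, taxonomies)
    @sequences = sequences
    @taxonomies = taxonomies
  end

  # Builds one ToyBioSequence per ToySequence, filling the taxonomic fields
  # from the node's first taxonomy when one is present.
  def to_biosequences
    taxonomy = taxonomies.first
    sequences.map do |sequence|
      ToyBioSequence.new(
        sequence.mol_seq,
        sequence.name,
        taxonomy && taxonomy.scientific_name,
        taxonomy ? taxonomy.lineage : []
      )
    end
  end
end

node = ToyNode.new(
  [ToySequence.new("adh1", "MSTAGKVIK")],
  [ToyTaxonomy.new("Homo sapiens", ["Eukaryota", "Metazoa"])]
)
bio = node.to_biosequences.first
```

The node is the only object in this sketch that sees both the sequences and
the taxonomies, which is why the conversion sits there and not on the
sequence class.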

You are all welcome to test my code. It is available at
http://github.com/latvianlinuxgirl/bioruby/tree/dev

Thanks,

Diana


