[BioRuby] GSOC: Bioruby PhyloXML update 12

Diana Jaunzeikare rozziite at gmail.com
Thu Aug 13 21:50:01 UTC 2009


Hi all,

I have added a HOWTO for the BioRuby PhyloXML implementation here:

https://www.nescent.org/wg_phyloinformatics/BioRuby_PhyloXML_HowTo_documentation

Let me know what you think.

Diana

On Mon, Aug 10, 2009 at 3:54 PM, Diana Jaunzeikare <rozziite at gmail.com> wrote:

> Hi all,
>
> What was done last week:
>
> * Coding. Made changes so that the implementation is now fully compatible
> with PhyloXML schema 1.10.
>
> * Testing. Added more unit tests (the writer now has 9 tests and 26
> assertions; the parser has 40 tests and 134 assertions).
>
> * Profiling. I discovered that the writer is really slow. The reason is the
> implementation of the Tree#children method, which runs the
> bfs_shortest_path algorithm. I had the idea of tracking each node's children
> inside the node class as an array, but Naohisa Goto pointed out that I would
> then also have to handle the addition and removal of nodes and edges, etc.
> So the better solution seems to be to leave it as it is for now and to
> improve the Bio::Tree class first. I am planning to do that after GSOC,
> since there is only one week left.
>
> * Refactored the parser class and got around a 3-fold speed increase. It can
> now parse the 33 MB Metazoa taxonomy file in ~14 seconds (Ubuntu 9.04, ruby
> 1.8.7 [i486-linux], Intel Core 2 Duo P8600 @2.4GHz).
>
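
To illustrate why a children lookup built on a breadth-first path search gets
slow, here is a minimal sketch in plain Ruby. It is not the Bio::Tree source:
the TinyTree class and its methods (add_edge, bfs_path, children) are
hypothetical names made up for this example, and they only mimic the general
idea of finding the parent via a search from the root and then removing it
from the node's neighbours.

```ruby
# Illustration only: TinyTree is a hypothetical stand-in, not Bio::Tree.
class TinyTree
  def initialize(root)
    @root = root
    @adjacent = Hash.new { |hash, key| hash[key] = [] }
  end

  # Store an undirected edge, as a graph-backed tree would.
  def add_edge(a, b)
    @adjacent[a] << b
    @adjacent[b] << a
  end

  # Breadth-first search from the root; returns the path root..target.
  def bfs_path(target)
    previous = { @root => nil }
    queue = [@root]
    until queue.empty?
      node = queue.shift
      break if node == target
      @adjacent[node].each do |neighbor|
        next if previous.key?(neighbor)
        previous[neighbor] = node
        queue << neighbor
      end
    end
    # Walk back from the target to the root to rebuild the path.
    path = []
    node = target
    while node
      path.unshift(node)
      node = previous[node]
    end
    path
  end

  # Children are the neighbours minus the parent; finding the parent
  # costs one breadth-first search on every call.
  def children(node)
    parent = bfs_path(node)[-2]
    @adjacent[node].reject { |n| n == parent }
  end
end

tree = TinyTree.new(:root)
tree.add_edge(:root, :a)
tree.add_edge(:root, :b)
tree.add_edge(:a, :c)
tree.add_edge(:a, :d)

p tree.children(:root) # => [:a, :b]
p tree.children(:a)    # => [:c, :d]
```

In this sketch, a writer that asks for the children of every node performs
one search per node, so the total work grows roughly quadratically with the
size of the tree.
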
> Next week:
>
> * Create a HOWTO wiki page with code examples and usage.
> * Do more testing. (Does anybody have more PhyloXML files for me to test
> with, other than those on phyloxml.org?)
> * Any other suggestions from you?
>
> Questions/issues:
>
> * Where should the HOWTO and code example documentation go? It seems
> reasonable for it to go here:
>  http://bioruby.open-bio.org/wiki/HOWTO:Trees and/or
> http://bioruby.open-bio.org/wiki/Phyloxml_tree_format (which is linked
> from the previous page).
>
> * How does integration into the master branch work? Is a pull request on
> GitHub all I have to do?
>
> * I have implemented PhyloXML::Sequence#to_biosequence, but it returns
> incomplete data: the information for Bio::Sequence#classification,
> Bio::Sequence#species and Bio::Sequence#division would come from the
> PhyloXML::Taxonomy class, which is not accessible from the Sequence class.
> Should there be a PhyloXML::Node#to_biosequence method that gathers
> information from both PhyloXML::Sequence and PhyloXML::Taxonomy? Or maybe
> Bio::Sequence should not hold taxonomic information at all?
>
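
The node-level option in the last question could look roughly like the sketch
below. It uses plain Ruby Structs as stand-ins, so every name in it
(SketchBioSequence, SketchSequence, SketchTaxonomy, SketchNode and their
fields) is hypothetical and not the real PhyloXML or Bio::Sequence API, and
the sequence data is invented for the example. Only the species field is
shown; classification and division would be filled in the same way.

```ruby
# Illustration only: these Structs are hypothetical stand-ins for
# Bio::Sequence, PhyloXML::Sequence, PhyloXML::Taxonomy and PhyloXML::Node.
SketchBioSequence = Struct.new(:seq, :entry_id, :species)
SketchSequence = Struct.new(:mol_seq, :name)
SketchTaxonomy = Struct.new(:scientific_name)

SketchNode = Struct.new(:sequence, :taxonomy) do
  # The node can see both objects, so it can fill in the taxonomic
  # field that the sequence alone cannot provide.
  def to_biosequence
    result = SketchBioSequence.new(sequence.mol_seq, sequence.name)
    result.species = taxonomy.scientific_name if taxonomy
    result
  end
end

# Made-up example data.
node = SketchNode.new(
  SketchSequence.new("MKVLA", "example_seq"),
  SketchTaxonomy.new("Homo sapiens")
)
bioseq = node.to_biosequence
puts bioseq.species
```

The conversion lives on the node here because, in this sketch, the node is
the only object that holds both the sequence and the taxonomy.
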
> You are all welcome to test my code. It is available on
> http://github.com/latvianlinuxgirl/bioruby/tree/dev
>
> Thanks,
>
> Diana
>


