[BioRuby] BioRuby: newick parser

Naohisa GOTO ngoto at gen-info.osaka-u.ac.jp
Mon Mar 12 06:18:48 UTC 2012


Hi Pjotr,

The tree-related components can be divided into several parts.

1. Newick/NHX parser and writer: 

1-1. Implementation: I think its quality is good enough. The complexity
of the implementation is due to the Newick specification (e.g. escaping
of special characters) and some undocumented conventions (e.g. bootstrap
values). Refactoring it with Racc (a parser generator for Ruby) seems
like a good idea, but it is low priority.
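To illustrate where that complexity comes from, here is a minimal recursive-descent Newick reader in plain Ruby. It is a hypothetical sketch, not Bio::Newick, and it does not use Racc; the names NewickSketch, NewickSketch.parse and NewickSketch::Node are made up for this example. It handles quoted labels (a doubled single quote '' inside quotes stands for a literal quote), underscores in unquoted labels (read as blanks), branch lengths, and labels on internal nodes, which is where bootstrap values are commonly, but only by convention, stored. Bracketed comments and NHX tags are deliberately not handled.

```ruby
# Minimal recursive-descent Newick reader (hypothetical sketch; not Bio::Newick).
# Supported: nested groups, quoted labels ('' is a literal quote),
# '_' read as a blank in unquoted labels, ":length", and labels on
# internal nodes (the usual, informal place for bootstrap values).
# Not supported: [comments] and NHX [&&NHX:...] tags.
module NewickSketch
  Node = Struct.new(:name, :length, :children)

  # Characters that end an unquoted label.
  DELIMITERS = "()[]':;,"

  class Parser
    def initialize(str)
      @s = str.strip
      @i = 0
    end

    def parse
      node = parse_node
      skip_ws
      raise ArgumentError, "expected ';' at #{@i}" unless @s[@i] == ";"
      node
    end

    private

    def skip_ws
      @i += 1 while @i < @s.size && @s[@i].match?(/\s/)
    end

    # node := ["(" node ("," node)* ")"] [label] [":" number]
    def parse_node
      skip_ws
      children = []
      if @s[@i] == "("
        @i += 1
        loop do
          children << parse_node
          skip_ws
          case @s[@i]
          when ","
            @i += 1
          when ")"
            @i += 1
            break
          else
            raise ArgumentError, "unexpected #{@s[@i].inspect} at #{@i}"
          end
        end
      end
      name = parse_label
      skip_ws
      length = nil
      if @s[@i] == ":"
        @i += 1
        skip_ws
        start = @i
        @i += 1 while @i < @s.size && @s[@i].match?(/[-+0-9.eE]/)
        length = Float(@s[start...@i]) # raises ArgumentError if empty/invalid
      end
      Node.new(name, length, children)
    end

    def parse_label
      skip_ws
      return parse_quoted if @s[@i] == "'"
      start = @i
      while @i < @s.size && !DELIMITERS.include?(@s[@i]) && !@s[@i].match?(/\s/)
        @i += 1
      end
      label = @s[start...@i].tr("_", " ") # unquoted '_' means a blank
      label.empty? ? nil : label
    end

    def parse_quoted
      @i += 1 # skip the opening quote
      buf = +""
      loop do
        raise ArgumentError, "unterminated quoted label" if @i >= @s.size
        if @s[@i] == "'" && @s[@i + 1] == "'"
          buf << "'" # doubled quote => literal quote
          @i += 2
        elsif @s[@i] == "'"
          @i += 1 # closing quote
          return buf
        else
          buf << @s[@i]
          @i += 1
        end
      end
    end
  end

  def self.parse(str)
    Parser.new(str).parse
  end
end
```

For example, parsing "((A:0.1,'B''s leaf':0.2)95:0.05,C_d:0.3);" gives an internal node named "95" (the bootstrap-style label) whose second child is named "B's leaf", and a root-level leaf named "C d". Whether "95" is a node name or a support value is exactly the kind of undocumented convention a real parser has to decide on.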

1-2. Parser API: Parsing a string is simple. Reading from files depends
on the Bio::FlatFile system, which is sufficient for most cases.

1-3. Writer API: It depends on the Bio::Tree API.

2. Nexus parser and writer:

2-1. Implementation: I don't know the details of its current status,
but for trees it only passes the data to the Bio::Newick class.
Please ask Christian for details.

2-2. API: The Nexus parser API is complicated because the Nexus
specification is very complex.
It seems that a Nexus writer is missing.

3. PhyloXML parser and writer:

3-1. Parser implementation and API: Good enough quality. Its
complexity is mainly due to the on-demand partial reading of
XML files, which saves memory for large tree files.

3-2. Writer implementation and API: Not good enough. It can only
write PhyloXML data, and it is very hard to output a Bio::Tree
in PhyloXML format.

3-3. Other topics: It uses libxml-ruby, but Nokogiri now seems to
be the de facto standard XML parser for Ruby, so I think it may be
rewritten using Nokogiri some day.

4. Bio::Tree data structure:

4-1. Implementation: It is based on BioRuby's internal graph
library. It could be changed to use another graph library.

4-2. API: The API design is based on the tree APIs of other
open-bio projects and on generic graph library APIs.

While writing a HOWTO based on the BioPerl HOWTO:Trees
(http://bioruby.open-bio.org/wiki/HOWTO:Trees, still incomplete),
I am thinking of adding or modifying some API methods for specifying
nodes/edges.
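As a rough illustration of a tree API layered on a generic graph, the following hypothetical sketch stores a tree as an undirected graph with branch lengths on the edges and derives rooted operations (parent, children, descendants, leaves) plus path and distance by walking away from a chosen root. GraphTree and all of its method names are made up for this example; they are only loosely modeled on the kind of operations a graph-based tree API offers and are not the actual Bio::Tree signatures.

```ruby
# Hypothetical graph-backed tree (a sketch, not Bio::Tree's real API).
# The tree is stored as an undirected graph with branch lengths on the
# edges; rooted notions (parent, children, leaves) are derived by walking
# away from whichever node is currently set as the root.
class GraphTree
  attr_accessor :root

  def initialize
    @adj = Hash.new { |h, k| h[k] = {} } # node => { neighbour => length }
  end

  def add_edge(a, b, length = nil)
    @adj[a][b] = length
    @adj[b][a] = length
    self
  end

  def nodes
    @adj.keys
  end

  def neighbours(node)
    @adj.key?(node) ? @adj[node].keys : []
  end

  # Nodes on the path from +from+ to +to+ (breadth-first search), or nil.
  def path(from, to)
    prev = { from => nil }
    queue = [from]
    until queue.empty?
      cur = queue.shift
      break if cur == to
      neighbours(cur).each do |n|
        next if prev.key?(n)
        prev[n] = cur
        queue << n
      end
    end
    return nil unless prev.key?(to)
    result = [to]
    result.unshift(prev[result.first]) until prev[result.first].nil?
    result
  end

  # The parent is the node just before +node+ on the path from the root.
  def parent(node)
    route = path(root, node)
    route && route.size >= 2 ? route[-2] : nil
  end

  def children(node)
    neighbours(node) - [parent(node)]
  end

  def descendants(node)
    children(node).flat_map { |c| [c] + descendants(c) }
  end

  def leaves
    nodes.select { |n| n != root && children(n).empty? }
  end

  # Sum of branch lengths along the path (missing lengths count as 0).
  def distance(a, b)
    route = path(a, b)
    return nil unless route
    route.each_cons(2).sum { |x, y| @adj[x][y] || 0 }
  end
end

# Usage: the tree ((A:0.1,B:0.2)x:0.05,C:0.3)root;
t = GraphTree.new
t.add_edge("root", "x", 0.05).add_edge("x", "A", 0.1)
t.add_edge("x", "B", 0.2).add_edge("root", "C", 0.3)
t.root = "root"
t.children("root")   # => ["x", "C"]
t.parent("A")        # => "x"
t.descendants("x")   # => ["A", "B"]
t.leaves             # => ["A", "B", "C"]
t.distance("A", "C") # about 0.45 (0.1 + 0.05 + 0.3, floating point)
```

Deriving the parent with a breadth-first search keeps the storage purely graph-like, so the tree can be re-rooted just by changing root, at the cost of O(n) work per parent lookup; a version meant for large trees would probably cache parent links after rooting.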


> I see Christian has done a lot of work in this area (mostly in Java),
> even to the point of taking standards forward. Maybe I should ask him?

I'd like to hear his advice, too.

Naohisa Goto
ngoto at gen-info.osaka-u.ac.jp / ng at bioruby.org


On Sun, 11 Mar 2012 13:53:08 +0100
Pjotr Prins <pjotr.public14 at thebird.nl> wrote:

> Hi Naohisa and others,
> 
> I am looking at the Newick/Nexus/PhyloXML parsers at the moment. The
> BioRuby ones look rather complete, if not a tad overcomplicated. 
> 
> Are you happy with the state of affairs, or do you think it could be
> improved/simplified?  Also, for walking the tree, is the interface now
> the one you would choose to implement?
> 
> I am asking, because I am looking for the most intuitive way of
> parsing and traversing tree information. I see Christian has done a
> lot of work in this area (mostly in Java), even to the point of
> taking standards forward. Maybe I should ask him? It appears to me we
> have solid parsers and data structures. Walking the trees, however,
> is less straightforward, and documentation somewhat lacking.
> 
> Anyone happy to correct me?
> 
> Pj.
