[Biopython-dev] Support for NeXML and RDF trees in Bio.Phylo

Eric Talevich eric.talevich at gmail.com
Tue Dec 25 07:18:40 UTC 2012


On Mon, Dec 24, 2012 at 8:58 AM, Ben Morris <ben at bendmorris.com> wrote:

> Hi all,
>
> I've implemented support for two new phylogenetic tree formats: NeXML and
> RDF (conforming to the Comparative Data Analysis Ontology).
>
> I noticed that NeXML support was planned, but I didn't see anyone working
> on it on GitHub and the feature request hadn't been updated in about a
> year, so I went ahead and implemented a simple version. At first I tried
> the generateDS.py approach, but the generated writer doesn't give very much
> control over the output, so I ended up writing my own parser/writer using
> ElementTree.
>
> As for the RDF/CDAO format, AFAIK this is not a format that's supported by
> any other phylogenetic libraries, so I'm not sure how useful this is to
> everyone else. It provides a simple, standards-compliant format that can be
> imported to a triple store and supports annotation. We'll be using it at
> NESCent so I wanted to make it available to everyone else as well. The
> parser and writer require the Redland Python bindings.
>
> The code is available in my fork of Biopython,
>
>     https://github.com/bendmorris/biopython
>
> under branches "cdao" and "nexml". I'd love to get everyone's thoughts and
> see if these contributions would be a good fit for the Biopython project.
>


Thanks for letting us know! I'll try it out soonish. Looking at the code on
your nexml branch, I have a few comments:

- The parser uses ElementTree.parse rather than iterparse, so in its
current state it would not be able to parse massive files (those larger
than available RAM). Worth fixing eventually?

- The parser creates Newick.Tree and Newick.Clade objects, which is nearly
correct in my opinion. I would suggest subclassing BaseTree.Tree and
BaseTree.Clade to create NeXML-specific Tree and Clade classes, even if you
don't have any additional attributes to attach to those classes at the
moment. (These would go in a new file NeXML.py, similar to PhyloXML.py and
PhyloXMLIO.py.)

- The 'confidence' or 'confidences' attribute isn't used (e.g. for
bootstrap support values). Does NeXML define it?
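On the first point, a streaming parser could look roughly like the sketch below. It uses the standard library's xml.etree.ElementTree.iterparse and clears each <tree> element once it has been read, so peak memory is governed by the largest single tree rather than the whole file. The function name iter_nexml_trees, the (id, nodes, edges) tuples it yields, and the NeXML namespace URI are assumptions for illustration, not the code on the nexml branch. It also ignores <network> blocks and metadata.

```python
import io
import xml.etree.ElementTree as ET

# Assumed NeXML namespace, in ElementTree's "{uri}tag" form.
NEXML = "{http://www.nexml.org/2009}"


def iter_nexml_trees(source):
    """Yield (tree_id, nodes, edges) for each <tree>, one tree at a time.

    Hypothetical helper: nodes maps node id -> otu id (None for internal
    nodes); edges is a list of (source, target, length) tuples.
    """
    nodes, edges = {}, []
    for _event, elem in ET.iterparse(source, events=("end",)):
        if elem.tag == NEXML + "node":
            nodes[elem.get("id")] = elem.get("otu")
        elif elem.tag == NEXML + "edge":
            length = elem.get("length")
            edges.append((elem.get("source"), elem.get("target"),
                          None if length is None else float(length)))
        elif elem.tag == NEXML + "tree":
            yield elem.get("id"), nodes, edges
            nodes, edges = {}, []
            # Free the finished tree's subtree before reading the next one.
            elem.clear()


# A tiny hand-written NeXML document used only to exercise the sketch.
SAMPLE = b"""<?xml version="1.0"?>
<nex:nexml xmlns:nex="http://www.nexml.org/2009" xmlns="http://www.nexml.org/2009"
           xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" version="0.9">
  <otus id="otus1">
    <otu id="t1" label="A"/>
    <otu id="t2" label="B"/>
  </otus>
  <trees id="trees1" otus="otus1">
    <tree id="tree1" xsi:type="nex:FloatTree">
      <node id="n1" root="true"/>
      <node id="n2" otu="t1"/>
      <node id="n3" otu="t2"/>
      <edge id="e1" source="n1" target="n2" length="0.5"/>
      <edge id="e2" source="n1" target="n3" length="1.5"/>
    </tree>
  </trees>
</nex:nexml>"""

for tree_id, nodes, edges in iter_nexml_trees(io.BytesIO(SAMPLE)):
    print(tree_id, len(nodes), "nodes,", len(edges), "edges")
```

A fuller version would turn each yielded tuple into Tree/Clade objects and would also release elements the document root still holds (such as the <otus> block), which this sketch keeps in memory.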
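The subclassing suggestion in the second point could start as small as the hypothetical NeXML.py sketch below, assuming Biopython's Bio.Phylo.BaseTree module is installed. The classes add no attributes yet. The point is that a parser returning these types can later grow NeXML-specific fields without changing the types its callers receive. Defaulting the root to a NeXML Clade (rather than a BaseTree.Clade) keeps the types consistent, and the rooted=False default is an assumption, not something NeXML prescribes.

```python
from Bio.Phylo import BaseTree


class Tree(BaseTree.Tree):
    """NeXML tree (hypothetical sketch): no NeXML-specific attributes yet."""

    def __init__(self, root=None, rooted=False, id=None, name=None):
        # Default to an empty NeXML Clade rather than a BaseTree.Clade.
        BaseTree.Tree.__init__(
            self, root=root or Clade(), rooted=rooted, id=id, name=name
        )


class Clade(BaseTree.Clade):
    """NeXML clade (hypothetical sketch): inherits BaseTree.Clade unchanged."""


# Usage: a two-leaf tree built from the NeXML-specific classes.
tree = Tree(
    root=Clade(clades=[Clade(branch_length=0.5, name="A"),
                       Clade(branch_length=1.5, name="B")]),
    name="example",
)
print([leaf.name for leaf in tree.get_terminals()])
```

Because both classes inherit from BaseTree, the existing Bio.Phylo traversal and drawing functions should work on them unchanged.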

Best,
Eric