[Biopython-dev] [Biopython - Feature #3217] (In Progress) Bio.Phylo I/O support for the NeXML format

redmine at redmine.open-bio.org
Fri Jul 1 23:29:27 UTC 2011


Issue #3217 has been updated by Eric Talevich.

Status changed from New to In Progress
Assignee changed from Eric Talevich to Biopython Dev Mailing List
% Done changed from 0 to 20

Jaime Huerta-Cepas pointed me to a strategy he's using to support both phyloXML and NeXML in ETE. A separate program called generateDS.py generates parsers automatically from the XSD files defining the specs.

Here's the code:
https://github.com/jhcepas/phylogenetic-XML-python-parsers

I suggest:

1. Copying nexml.py into the Biopython source tree as Bio/Phylo/_nexml_gds.py
2. Writing something basic to convert the essential tree elements into compatible Bio.Phylo object types. Call that NexmlIO, for now? Also write unit tests.
3. As time permits, write more converters to make _nexml_gds.py objects compatible with existing Biopython types. This could include character matrices for AlignIO, and more tree annotations for Phylo.

When generateDS.py is updated, we'll just copy the newly generated nexml.py into _nexml_gds.py manually -- hopefully this won't require many changes in the converters each time.
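
To make step 2 concrete, here is a rough sketch of the conversion itself: turning NeXML's flat node/edge lists into nested clades. It deliberately uses only the standard library rather than the generated nexml.py, whose class names I haven't checked. The Clade dataclass is a hypothetical stand-in for Bio.Phylo.BaseTree.Clade (same name, branch_length and clades attributes), parse_trees is a made-up function name, and the sample document is hand-written and simplified, not schema-valid NeXML.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass, field
from typing import Iterator, List, Optional

# NeXML namespace in ElementTree's "{uri}tag" notation.
NEX = "{http://www.nexml.org/2009}"

# Hand-written, simplified sample (assumption: real files carry more
# attributes, e.g. xsi:type on <tree> and otu references on <node>).
SAMPLE = """<nexml xmlns="http://www.nexml.org/2009" version="0.9">
  <trees id="trees1">
    <tree id="tree1" label="example">
      <node id="n1" root="true"/>
      <node id="n2" label="A"/>
      <node id="n3" label="B"/>
      <edge id="e1" source="n1" target="n2" length="0.5"/>
      <edge id="e2" source="n1" target="n3" length="1.5"/>
    </tree>
  </trees>
</nexml>"""


@dataclass
class Clade:
    """Hypothetical stand-in for Bio.Phylo.BaseTree.Clade."""
    name: Optional[str] = None
    branch_length: Optional[float] = None
    clades: List["Clade"] = field(default_factory=list)


def parse_trees(xml_text: str) -> Iterator[Clade]:
    """Yield the root clade of each <tree> element in a NeXML string."""
    root = ET.fromstring(xml_text)
    for tree_el in root.iter(NEX + "tree"):
        # One clade per <node>, keyed by node id.
        nodes = {
            n.get("id"): Clade(name=n.get("label"))
            for n in tree_el.findall(NEX + "node")
        }
        has_parent = set()
        for edge in tree_el.findall(NEX + "edge"):
            child = nodes[edge.get("target")]
            if edge.get("length") is not None:
                # The branch length belongs to the child end of the edge.
                child.branch_length = float(edge.get("length"))
            nodes[edge.get("source")].clades.append(child)
            has_parent.add(edge.get("target"))
        # The root is the only node that is never an edge target.
        roots = [c for node_id, c in nodes.items() if node_id not in has_parent]
        if len(roots) != 1:
            raise ValueError("expected exactly one root node per tree")
        yield roots[0]


example_trees = list(parse_trees(SAMPLE))
```

In the real NexmlIO the same loop would walk the generateDS objects instead of raw elements and build actual Bio.Phylo clades, but the node/edge-to-nested-clade step should look about the same.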

Timeline: After the 1.58 release.
----------------------------------------
Feature #3217: Bio.Phylo I/O support for the NeXML format
https://redmine.open-bio.org/issues/3217

Author: Eric Talevich
Status: In Progress
Priority: Normal
Assignee: Biopython Dev Mailing List
Category: Main Distribution
Target version: Not Applicable
URL: 


The future data exchange standard is... approaching rapidly. NeXML is going to become the format of choice for TreeBASE, Mesquite and probably MIAPA-targeted tools over the next year or two, and Biopython should be there to support it.

Notes:
* Another Python library, DendroPy, already supports (some of?) the NeXML format. Jeet Sukumaran and Mark Holder changed the license to BSD to allow other projects -- particularly us -- to share their code. So let's start there.
* NeXML was designed so its elements can be treated as RDF triples, so see if RDFLib can help -- either as the underlying parser, or to provide some additional (optional) functionality.
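
A rough illustration of the RDF point, using only the standard library (RDFLib itself is not used here). As I understand the schema, annotations are attached as meta elements: literal ones carry property/content attributes, resource ones carry rel/href. The sketch below reads each one as a (subject, predicate, object) triple about its parent element. The sample document, the iter_triples name, and the fallback of using "#" plus the parent's id when there is no about attribute are all assumptions for illustration.

```python
import xml.etree.ElementTree as ET
from typing import Iterator, Optional, Tuple

# NeXML namespace in ElementTree's "{uri}tag" notation.
NEX = "{http://www.nexml.org/2009}"

# Hand-written, simplified sample; not schema-valid NeXML.
SAMPLE_META = """<nexml xmlns="http://www.nexml.org/2009"
       xmlns:dc="http://purl.org/dc/elements/1.1/">
  <trees id="trees1">
    <tree id="tree1" about="#tree1">
      <meta id="m1" property="dc:title" content="Example tree"/>
      <meta id="m2" rel="dc:source" href="http://example.org/study/1"/>
    </tree>
  </trees>
</nexml>"""

Triple = Tuple[str, str, Optional[str]]


def iter_triples(xml_text: str) -> Iterator[Triple]:
    """Yield (subject, predicate, object) for each <meta> annotation."""
    root = ET.fromstring(xml_text)
    for parent in root.iter():
        # Assumption: the annotated element names itself via "about";
        # fall back to a fragment built from its id otherwise.
        subject = parent.get("about") or "#" + parent.get("id", "")
        for meta in parent.findall(NEX + "meta"):
            if meta.get("property") is not None:
                # Literal annotation: the object is a plain value.
                yield (subject, meta.get("property"), meta.get("content"))
            elif meta.get("rel") is not None:
                # Resource annotation: the object is a link.
                yield (subject, meta.get("rel"), meta.get("href"))


example_triples = list(iter_triples(SAMPLE_META))
```

Predicates are left as prefixed names here; resolving prefixes like dc: to full URIs and building a proper graph is the part where RDFLib could plausibly take over.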

See:
http://nexml.org/
http://packages.python.org/DendroPy/
http://www.rdflib.net/


