[Biopython-dev] [Wg-phyloinformatics] GSoC Weekly Update 10: PhyloXML for Biopython

Christian Zmasek czmasek at burnham.org
Wed Jul 29 21:12:52 UTC 2009


Hi, Eric:

Looks good!

Remarks:

    - Bioperl's phyloXML driver was written for version 1.00 and might hurl if
      given a v1.10 file -- so that's a potential problem if Biopython defaults
      to writing v1.10 files. Should Writer take an option to specify the file
      format version number? Right now it only writes valid phyloXML v1.00.

This is a nice thought, but to be honest, I would not do it, especially since there will likely be more versions in the future (although, hopefully, they will just extend 1.10 rather than remove or change elements).


    - PhyloXMLIO also always writes branch_length as an XML node, not an
      attribute. This validates and will be handled safely by any sane parser,
      and fits better with the idea of an implicit root node in each clade
      object, I think. (The parser still handles an attribute properly.) Any
      objections?

This is fine!


    - Above, I've listed more enhancements than I'll probably be able to finish
      this week. Which should have higher priority? I know merging Bio.Nexus
      and Bio.Tree would be the most useful, but since (1) Biopython
      development still happens on CVS, not Git, and (2) another Tree-based
      GSoC project is expected to land around the same time as mine, I think
      doing the integration right now would be kind of painful. So I can focus
      either on laying the groundwork in Bio.Tree.BaseTree, copying rather than
      moving the relevant Nexus code, or else work mainly on exporting to other
      useful object representations like networkx graphs, or any Biopython
      classes I've missed (e.g. alignments). Suggestions?

Time permitting, I would concentrate on exporting to other useful object representations and on Bio.Tree.BaseTree compatibility with BioSQL's PhyloDB extensions.
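
To make "exporting to other useful object representations" concrete, here is a minimal sketch that flattens a tree into weighted (parent, child, branch_length) edges. The Clade class and clade_edges() are hypothetical stand-ins for illustration, not the Bio.Tree API, and the sketch assumes every node has a unique name. A list in this shape should be acceptable to networkx's Graph.add_weighted_edges_from(), though that step is left out so the example runs without networkx.

```python
class Clade:
    """Hypothetical minimal clade; not the actual Bio.Tree class."""

    def __init__(self, name, branch_length=None, clades=None):
        self.name = name
        self.branch_length = branch_length
        self.clades = clades or []


def clade_edges(root):
    """Return (parent_name, child_name, branch_length) for every edge.

    Assumes node names are unique, since they serve as graph node keys.
    """
    edges = []
    stack = [root]
    while stack:
        parent = stack.pop()
        for child in parent.clades:
            # The branch length of the child is the weight of the edge
            # that connects it to its parent.
            edges.append((parent.name, child.name, child.branch_length))
            stack.append(child)
    return edges


tree = Clade("root", clades=[
    Clade("A", 0.1),
    Clade("B", 0.2, clades=[Clade("C", 0.3), Clade("D", 0.4)]),
])
edges = clade_edges(tree)
```

A tree with n nodes yields n - 1 edges; the root contributes no edge of its own, so any branch length on the root is dropped in this representation.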

Christian





________________________________________
From: wg-phyloinformatics-bounces at nescent.org [wg-phyloinformatics-bounces at nescent.org] On Behalf Of Eric Talevich [eric.talevich at gmail.com]
Sent: Monday, July 27, 2009 10:56 AM
To: Phyloinformatics Group; BioPython-Dev Mailing List
Subject: [Wg-phyloinformatics] GSoC Weekly Update 10: PhyloXML for Biopython

Hi folks,

Previously (July 20-24) I:

    Finished implementing I/O methods, Tree classes and tests for all phyloXML
    elements.

    Changed Writer to preserve node order in the XML; output now validates
    under the phyloXML 1.00 schema (but 1.10 complains)

    Did some drastic code reorganization.
    - Bio.Tree:
        - Moved Clade.find() and PhyloElement.__repr__ methods to BaseTree
          classes
        - Made Clade inherit from BaseTree.Tree in addition to BaseTree.Node,
          and added the corresponding attributes
        - Moved Bio.PhyloXML.Tree to Bio.Tree.PhyloXML

    - Bio.TreeIO:
        - Merged PhyloXML's Parser and Writer into PhyloXMLIO under the new
          Bio.TreeIO module, and updated imports everywhere
        - Added wrappers for Nexus read/write; doesn't return Bio.Tree objects
          yet though

    Added/updated unit tests for all of this.

    Documented the code reorg on the Biopython wiki, adding Tree and TreeIO
    pages and fixing the examples on the PhyloXML page.

    Scrubbed docstrings and enabled epydoc processing.


This week (July 27-31) I will:

    Finish implementing the phyloXML spec:

    - Scan "simple types" for restricted tokens; check strings in constructors
    - Take a stab at phyloXML 1.10 support (need a 'version' arg to Writer?)
    - Clean up and reorganize any code that needs it

    Enhancements (time permitting):

    - Improve the SeqRecord conversion
    - Work on Bio.Tree.BaseTree compatibility with BioSQL's PhyloDB extension
    - Port common methods to Bio.Tree.BaseTree -- see Bio.Nexus.Tree, Bioperl
      node objects, PyCogent, p4-phylogenetics
    - Tree method: build_index (set left_idx, right_idx on all nodes):
        - calculate left/right indexes for nested-set representation
        - see http://www.oreillynet.com/pub/a/network/2002/11/27/bioconf.html

    - Export to networkx (http://networkx.lanl.gov/) -- also get graphviz export
      for free, via networkx.to_agraph()
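
The build_index idea above could be sketched roughly as follows. The Node class here is a hypothetical stand-in, not the real Bio.Tree.BaseTree class; only the left_idx and right_idx names come from the list above. A depth-first walk numbers each node once on the way down and once on the way up, which gives the nested-set representation.

```python
class Node:
    """Hypothetical stand-in for a tree node, for illustration only."""

    def __init__(self, name, children=None):
        self.name = name
        self.children = children or []
        self.left_idx = None
        self.right_idx = None


def build_index(root):
    """Set left_idx and right_idx on all nodes (nested-set numbering)."""
    counter = 0

    def visit(node):
        nonlocal counter
        counter += 1
        node.left_idx = counter      # numbered when first entered
        for child in node.children:
            visit(child)
        counter += 1
        node.right_idx = counter     # numbered again when left

    visit(root)
    return root


def is_ancestor(a, b):
    """True if node a is a proper ancestor of node b."""
    return a.left_idx < b.left_idx and b.right_idx < a.right_idx


leaf_a, leaf_c, leaf_d = Node("A"), Node("C"), Node("D")
inner_b = Node("B", [leaf_c, leaf_d])
root = build_index(Node("root", [leaf_a, inner_b]))
```

With the indexes in place, "is X inside the subtree of Y" becomes two integer comparisons, which is what makes this layout attractive for SQL storage. The recursive walk is fine for a sketch, but very deep trees would need an explicit stack.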


Remarks:

    - Bioperl's phyloXML driver was written for version 1.00 and might hurl if
      given a v1.10 file -- so that's a potential problem if Biopython defaults
      to writing v1.10 files. Should Writer take an option to specify the file
      format version number? Right now it only writes valid phyloXML v1.00.

    - PhyloXMLIO also always writes branch_length as an XML node, not an
      attribute. This validates and will be handled safely by any sane parser,
      and fits better with the idea of an implicit root node in each clade
      object, I think. (The parser still handles an attribute properly.) Any
      objections?

    - Above, I've listed more enhancements than I'll probably be able to finish
      this week. Which should have higher priority? I know merging Bio.Nexus
      and Bio.Tree would be the most useful, but since (1) Biopython
      development still happens on CVS, not Git, and (2) another Tree-based
      GSoC project is expected to land around the same time as mine, I think
      doing the integration right now would be kind of painful. So I can focus
      either on laying the groundwork in Bio.Tree.BaseTree, copying rather than
      moving the relevant Nexus code, or else work mainly on exporting to other
      useful object representations like networkx graphs, or any Biopython
      classes I've missed (e.g. alignments). Suggestions?
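
To illustrate the branch_length remark above: a rough sketch of reading the value from either a child node or an attribute, while always writing it as a child node. This uses the standard library's ElementTree rather than PhyloXMLIO itself; the function names are hypothetical, and the phyloXML namespace is omitted to keep the example short.

```python
import xml.etree.ElementTree as ET


def read_branch_length(clade_elem):
    """Return the branch length as a float, or None if it is absent.

    The child element is checked first, then the attribute.
    """
    text = clade_elem.findtext("branch_length")
    if text is None:
        text = clade_elem.get("branch_length")
    return float(text) if text is not None else None


def write_clade(name, branch_length):
    """Build a <clade> element with branch_length as a child node."""
    clade = ET.Element("clade")
    ET.SubElement(clade, "name").text = name
    ET.SubElement(clade, "branch_length").text = repr(branch_length)
    return clade


as_node = ET.fromstring("<clade><branch_length>0.25</branch_length></clade>")
as_attr = ET.fromstring('<clade branch_length="0.25"/>')
written = write_clade("A", 0.25)
```

Both input forms give the same value on reading, and the written element never carries the attribute, so a round trip normalizes files to the child-node form.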


Cheers,
Eric
http://github.com/etal/biopython/tree/phyloxml/Bio/PhyloXML
https://www.nescent.org/wg_phyloinformatics/PhyloSoC:Biopython_support_for_parsing_and_writing_phyloXML




