[Biopython-dev] Code review request for phyloxml branch

Eric Talevich eric.talevich at gmail.com
Tue Jan 5 00:09:18 UTC 2010


Hi Brad, I hope the holidays treated you well.

On Mon, Jan 4, 2010 at 5:16 AM, Brad Chapman <chapmanb at 50mail.com> wrote:

>
> Are the annotations often used in real life cases or is this more of
> a fringe problem? I'm not as familiar with tree work, but know this
> is a pain in sequence space. A good goal is to capture the most
> common use cases and then integrate the other issues as feasible.
>

The data that TreeIO preserves round-trip are:

 - Branching structure (topology)
 - Branch lengths
 - Clade/taxon names
 - Rooted-ness (for the whole tree)
 - Tree ID

The troublesome parts are:

 - The "confidences" attribute in PhyloXML trees should map onto the
"support" attribute in Nexus trees, but the mapping is tricky. The Nexus
attribute leaves it ambiguous what its numerical value actually means
(relative or absolute support), while PhyloXML uses a list of Confidence
objects, each containing both a numerical value and a "type" string such as
"bootstrap". Currently that information is dropped when converting between
PhyloXML and Nexus/Newick trees.
 - Nexus also has a "comment" attribute for each node, while PhyloXML
doesn't directly support that.
 - The branch length of the root node/clade is None in PhyloXML, but 0.0 in
Nexus. I prefer None because there is no meaningful branch leading to that
node, but there might be a reason 0.0 was chosen for Nexus that I'm not
aware of.
 - The names of unlabeled internal nodes might change from None to "" in
some cases, since None is the PhyloXML default and "" is the Nexus default.
 - PhyloXML supports more structured taxonomic information on each node than
Newick does, so it's possible to have a PhyloXML tree where a Clade has no
name, but instead one or more Taxonomy objects containing the scientific
name, common names, etc. When such a tree is converted to Newick format, the
taxonomy info is lost for those nodes. I could squash the Taxonomy object
into a string for the sake of Nexus labels, but I think it would be safer
(less surprising) to just write a cookbook entry on how to collapse PhyloXML
Taxonomies into Clade names to aid format conversions.
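
A minimal sketch of what such a cookbook entry might look like. The Taxonomy
and Clade classes here are simplified, hypothetical stand-ins rather than the
actual classes in the phyloxml branch, and the function names and the
preference order (scientific name first, then the first common name) are
assumptions made for illustration:

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Taxonomy:
    # Hypothetical stand-in for a PhyloXML Taxonomy object.
    scientific_name: Optional[str] = None
    common_names: List[str] = field(default_factory=list)


@dataclass
class Clade:
    # Hypothetical stand-in for a PhyloXML Clade object.
    name: Optional[str] = None
    taxonomies: List[Taxonomy] = field(default_factory=list)
    clades: List["Clade"] = field(default_factory=list)


def label_from_taxonomy(clade):
    """Return a label from the first taxonomy with a usable name, or None."""
    for tax in clade.taxonomies:
        if tax.scientific_name:
            return tax.scientific_name
        if tax.common_names:
            return tax.common_names[0]
    return None


def collapse_taxonomies(clade):
    """Fill in missing clade names from taxonomy info, recursively, in place.

    Both None and "" count as "unlabeled", so either format's default works.
    Existing names are never overwritten.
    """
    if not clade.name:
        clade.name = label_from_taxonomy(clade)
    for child in clade.clades:
        collapse_taxonomies(child)


# Example: an unnamed root with three children, two of them unnamed.
root = Clade(clades=[
    Clade(taxonomies=[Taxonomy(scientific_name="Homo sapiens")]),
    Clade(name="outgroup", taxonomies=[Taxonomy(common_names=["mouse"])]),
    Clade(taxonomies=[Taxonomy(common_names=["zebrafish"])]),
])
collapse_taxonomies(root)
names = [child.name for child in root.clades]
```

Keeping this as an explicit step the user runs before conversion, rather than
something the writer does silently, is what makes the result unsurprising:
clades that already have a name keep it, and clades with no taxonomy stay
unlabeled.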

If the support-vs-confidence issue can be resolved, then we can treat
PhyloXML as a rough superset of Newick, in terms of annotation, and then it
shouldn't be surprising to lose some annotation data in converting PhyloXML
to Newick.
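
One possible way to resolve it, sketched under assumptions: pick a single
confidence value to serve as the Nexus "support" number, and wrap a bare
support number in a one-element confidence list going the other way. The
Confidence class below is a hypothetical stand-in, and the function names,
the "bootstrap" preference, and the "unknown" fallback type are illustrative
choices, not anything settled in the branch:

```python
from dataclasses import dataclass


@dataclass
class Confidence:
    # Hypothetical stand-in for a PhyloXML Confidence object.
    value: float
    type: str = "unknown"


def support_from_confidences(confidences, preferred_type="bootstrap"):
    """Choose one numerical support value from a list of confidences.

    Prefers the first confidence of the requested type. If there is exactly
    one confidence of any type, uses that. Otherwise returns None rather
    than guessing among several candidates.
    """
    for conf in confidences:
        if conf.type == preferred_type:
            return conf.value
    if len(confidences) == 1:
        return confidences[0].value
    return None


def confidences_from_support(support, assumed_type="unknown"):
    """Wrap a bare support value in a list holding a single Confidence."""
    if support is None:
        return []
    return [Confidence(value=support, type=assumed_type)]


confs = [Confidence(0.87, "posterior"), Confidence(95.0, "bootstrap")]
picked = support_from_confidences(confs)
wrapped = confidences_from_support(picked)
```

Note that the round trip is still lossy: the "posterior" value is dropped
going to Nexus, and the type comes back as "unknown" unless the caller
supplies one.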

Cheers,
Eric


