[Biopython-dev] [Bug 3046] PhyloXML, please define get/set methods

Wed Apr 7 05:19:55 UTC 2010

Hi Joel,

(In reply to comment #0)
On Tue, Apr 6, 2010 at 5:46 PM, <bugzilla-daemon at portal.open-bio.org> wrote:
>
> http://bugzilla.open-bio.org/show_bug.cgi?id=3046

> It would be nice if there were get/set properties for phyloXML objects that
> were easier and more concise to use.  Right now, to set, say, a phyloXML
> property, one has to read the code to learn the names and arguments of the
> Property class and also to learn that properties are added by appending to a
> list.

Yes, it's easier to tweak the class definitions if there's not much
syntactic sugar to get in the way. This is still pretty new code ;)
but of course I'm open to suggestions.

> Besides the matter of convenience, there is also a question about how the
> properties and taxonomies objects behave.   I will take the matter up with the
> phyloXML mailing list, but I believe that these objects should be
> dictionary-like rather than list-like.  That is, duplicate ref values should
> not be allowed because the question of how to handle duplicates would have to
> get pushed down to the user level and will be inconsistent.

The Events class (clade.events attribute) mimics a dictionary. Have
you used that yet?

About clade.properties:
If the ordering of properties doesn't matter, 'ref' is guaranteed to be
unique at a node, and 'ref' seems to be the right key for indexing the
other associated data, then I can make clade.properties act like a
dictionary. Can we confirm all three of these?

And for the implementation, can you provide a sketch of what you'd
like the final structure to look like, and maybe a contrived
doctest-like code example showing what you'd like to be able to do?

In many cases, the phyloXML spec doesn't currently guarantee enough
for convenient shortcuts to work without the risk of breaking in the
future. For example, check out this new demo with *two* bootstrap
values for every clade:
http://www.phylosoft.org/archaeopteryx/examples/data/multiple_supports.xml

I was tempted to make confidences act like a dictionary indexed by
support type, but clearly now that wouldn't have worked. A list of
Confidence objects lets us stay faithful to the raw XML
representation.
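
To make the problem concrete, here is a minimal sketch. The Confidence
namedtuple is only a stand-in for the real class (an assumption, to
keep the example self-contained), and the support values are made up:

```python
from collections import namedtuple

# Hypothetical stand-in for PhyloXML.Confidence: just a value and a type.
Confidence = namedtuple("Confidence", ["value", "type"])

# Two bootstrap supports on one clade, as in the demo file above.
# The numbers are invented for illustration.
confidences = [Confidence(89.0, "bootstrap"), Confidence(64.0, "bootstrap")]

# Indexing by support type silently keeps only the last value per type.
by_type = {c.type: c.value for c in confidences}

# Filtering the list keeps every value, in document order.
bootstraps = [c.value for c in confidences if c.type == "bootstrap"]
```

Here by_type ends up with a single entry, while bootstraps still holds
both values.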

> def set_property(self,  *propArgs,  **propkwArgs):
>     for property in self.properties:
>         if property.ref == propArgs[1]:
>             property = PhyloXML.Property(propArgs)
>             return
>     self.properties.append(PhyloXML.Property(*propArgs,  **propkwArgs))
>
> def get_property(self,  key):
>     for property in self.properties:
>         if property.ref == key:
>             return property.value
>     raise KeyError

It's possible that Bio.Phylo will pick up the convention of
"add_foo/get_foo" methods where a property would be overly magical,
and something noteworthy is going on internally. Alignment objects
have "add_sequence", and Phylogeny objects have "get_alignment". Would
you use a Phylogeny method called add_alignment, taking something like
a Phylip character matrix?

We can figure out a sugared interface for clade.properties once we
know which of the requirements stated above will actually be
guaranteed.
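
In the meantime, helpers like the quoted ones can live in user code.
Below is a rough sketch, not a proposed Biopython API. It assumes 'ref'
is unique per node, and the Property class is a hypothetical stand-in
with only 'ref' and 'value'. Note that rebinding the loop variable, as
the quoted set_property does, leaves the list unchanged; the
replacement has to be assigned by index:

```python
class Property:
    """Hypothetical stand-in for PhyloXML.Property (only ref and value)."""

    def __init__(self, ref, value):
        self.ref = ref
        self.value = value


def get_property(properties, ref):
    """Return the value of the first property whose ref matches."""
    for prop in properties:
        if prop.ref == ref:
            return prop.value
    raise KeyError(ref)


def set_property(properties, new_prop):
    """Replace the property with the same ref in place, or append it."""
    for i, prop in enumerate(properties):
        if prop.ref == new_prop.ref:
            properties[i] = new_prop  # assign by index so the list changes
            return
    properties.append(new_prop)


# Usage, with a made-up ref: setting the same ref twice keeps one entry.
props = []
set_property(props, Property("NOAA:depth", "1200"))
set_property(props, Property("NOAA:depth", "1500"))
```

After the two calls, props holds a single Property whose value is
"1500".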

> def set_ID(self,  *idArgs,  **idkwArgs):
>     self.node_id = PhyloXML.Id(*idArgs,  **idkwArgs)

If you do "from Bio.Phylo import PhyloXML as PX" it doesn't really
save much typing, and the **kwargs magic is even less suitable for
introspection.

It's not possible to take advantage of all the PhyloXML annotations
available without learning about the annotation classes themselves.
How about this: I'll write some decent documentation on the Biopython
wiki's PhyloXML page and the official Biopython tutorial/cookbook.

> def add_taxonomy(self,  *taxArgs,  **taxkwArgs):
>     self.taxonomies.append(PhyloXML.Taxonomy(*taxArgs,  **taxkwArgs))
>
> def get_taxonomy(self, rank):
>     for taxonomy in self.taxonomies:
>         if taxonomy.rank == rank:
>             return taxonomy.scientific_name
>     raise KeyError

Unfortunately, none of the Taxonomy attributes are required in the
phyloXML spec, so there's nothing we can rely on for easier indexing.
But, if the phyloXML files you create yourself are well-behaved then
you're free to make your own wrappers over the current low-level
functionality. Clade.taxonomies will always be plural and iterable.
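
A wrapper of that kind might look like the sketch below. The Taxonomy
class here is a hypothetical stand-in with two optional attributes, and
names_by_rank is an invented helper name, not part of Bio.Phylo.
Returning a list instead of raising KeyError sidesteps the question of
missing or repeated ranks:

```python
class Taxonomy:
    """Hypothetical stand-in for PhyloXML.Taxonomy; all fields optional."""

    def __init__(self, rank=None, scientific_name=None):
        self.rank = rank
        self.scientific_name = scientific_name


def names_by_rank(taxonomies, rank):
    """Return the scientific names of all taxonomies with the given rank."""
    return [
        t.scientific_name
        for t in taxonomies
        if t.rank == rank and t.scientific_name is not None
    ]


taxonomies = [
    Taxonomy(rank="species", scientific_name="Homo sapiens"),
    Taxonomy(scientific_name="Pan troglodytes"),  # no rank given
    Taxonomy(rank="genus"),  # no name given
]

species_names = names_by_rank(taxonomies, "species")
genus_names = names_by_rank(taxonomies, "genus")
```

Entries that lack a rank or a name are skipped instead of causing an
error.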

> def set_color(node, red, green,  blue):
>     node.color =  PhyloXML.BranchColor(red, green, blue)

Redundancy makes code harder to maintain -- I'd like to keep it clean
at least for the very first release. The BranchColor class actually
has much cooler functionality than this; try "node.color =
PX.BranchColor.from_name('red')" for example. We can try adding sugar
on top of this, but whatever we add, we'll need to maintain in
Biopython for quite some time.

Thanks again for all the testing and feedback!

Best,
Eric