[Biopython-dev] PhyloXML helper functions
eric.talevich at gmail.com
Mon Jul 6 23:34:30 UTC 2009
I've been mulling a couple of methods for PhyloXML objects that I thought
could deserve some discussion.
1. Singular properties for some plural attributes
This goes back to the "confidences" issue: When I'm drilling down through a
phyloXML-derived tree, I keep expecting certain attributes to be singular
values when they're actually plural. Auto-completion catches it, of course,
but the resulting code would seem more obvious if I used the singular name
when I know the attribute consists of a list of one element.
The attributes I had in mind for this are taxonomies (Clade class) and
confidences (Clade and Phylogeny classes). Should any other attributes get
this treatment? Here's an example getter method -- Rubyists may ignore the
    if len(self.confidences) > 1:
        raise RuntimeError, "More than one confidence item is available! Use
    elif len(self.confidences) == 0:
        raise RuntimeError, "No confidence item is available! You fail"
Then this works as expected, similar to the way certain IO read() functions
work elsewhere in Biopython.
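
For illustration, here is a minimal sketch of such a getter written as a read-only property. The Clade class below is a hypothetical stand-in, not the actual PhyloXML class, and the error messages are placeholders; only the "exactly one item or raise" behaviour is taken from the description above.

```python
class Clade(object):
    """Hypothetical stand-in for the PhyloXML Clade class."""

    def __init__(self, confidences=None):
        # The plural attribute always holds a list, possibly empty.
        self.confidences = list(confidences or [])

    @property
    def confidence(self):
        """Return the single confidence item, or raise if there is not exactly one."""
        if len(self.confidences) > 1:
            raise RuntimeError("More than one confidence item is available")
        elif len(self.confidences) == 0:
            raise RuntimeError("No confidence item is available")
        return self.confidences[0]


clade = Clade(confidences=[0.95])
value = clade.confidence  # the only item in clade.confidences
```

Raising on zero or several items keeps the singular name honest: code that uses it is asserting that exactly one value exists.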
2. A find() method on Clade and maybe Phylogeny objects
The function definition and docstring would look like this:
    def find(cls, **kwargs):
        """Find all sub-nodes matching the given attributes.

        The first argument specifies the class of the sub-node. (Use
        to match any standard phyloXML type.) The arbitrary keyword arguments
        specify the attribute name of the sub-node and the value to match. The
        result is an iterable through all matching objects.

        >>> tree = PhyloXML.read('phyloxml_examples.xml').phylogenies
        >>> matches = tree.clade.find(Taxonomy, code='OCTVU')
        Taxonomy(code='OCTVU', scientific_name='Octopus vulgaris')
- The keyword argument could be a regular expression. Would that be useful?
To handle numbers, I'd have to convert every sub-node attribute value to a
string, and that would be weird -- or else find() would have to skip
- Non-keyword arguments (*args) could specify just the not-None existence of
an attribute. Allowing regexes would make this unnecessary (e.g. name='.*')
- If no regular arguments are needed, cls could default to PhyloElement or
even "object" to match everything.
- To enable arbitrary hairiness, this function could accept a function as
the value of the keyword argument; a sub-node would match whenever that
function returns anything truthy for the attribute value. But at that
point, the user could probably just roll their own find_node() function.
However, it could still be useful to filter for numerical values.
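
To make the options above concrete, here is a rough sketch of find() with exact-value matching plus the callable variant. Everything in it is an assumption for illustration: the PhyloElement, Clade and Taxonomy classes are bare stand-ins, the helper names _children and _matches are made up, the tree data is invented, and cls defaults to PhyloElement as suggested in the third point. Regular expressions are left out.

```python
class PhyloElement(object):
    """Hypothetical base class for phyloXML node types."""

    def __init__(self, **attrs):
        self.__dict__.update(attrs)

    def _children(self):
        # Yield PhyloElement objects stored directly as attributes or in lists.
        for value in self.__dict__.values():
            items = value if isinstance(value, (list, tuple)) else [value]
            for item in items:
                if isinstance(item, PhyloElement):
                    yield item

    def find(self, cls=None, **kwargs):
        """Yield every sub-node of type cls whose attributes match kwargs."""
        if cls is None:
            cls = PhyloElement  # match any node type
        for child in self._children():
            if isinstance(child, cls) and all(
                    _matches(getattr(child, name, None), target)
                    for name, target in kwargs.items()):
                yield child
            # Recurse depth-first into the child's own sub-nodes.
            for match in child.find(cls, **kwargs):
                yield match


def _matches(value, target):
    # A callable target acts as a predicate; anything else is compared by equality.
    if callable(target):
        return bool(target(value))
    return value == target


class Taxonomy(PhyloElement):
    pass


class Clade(PhyloElement):
    pass


# Invented example data, not taken from phyloxml_examples.xml.
root = Clade(branch_length=0.0, clades=[
    Clade(branch_length=0.3, taxonomies=[
        Taxonomy(code='OCTVU', scientific_name='Octopus vulgaris')]),
    Clade(branch_length=1.2, taxonomies=[Taxonomy(code='APLCA')]),
])

by_code = list(root.find(Taxonomy, code='OCTVU'))
long_branches = list(root.find(
    Clade, branch_length=lambda x: x is not None and x > 1.0))
everything = list(root.find())
```

With a callable as the keyword value, numerical filters need no string conversion, and a not-None existence check becomes name=lambda x: x is not None.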
What do you think?