[Biopython-dev] BioGeography update/BioPython tree module discussion

Eric Talevich eric.talevich at gmail.com
Tue Jul 21 16:03:44 UTC 2009


Hi Nick,

On Mon, Jul 20, 2009 at 3:13 PM, Nick Matzke <matzke at berkeley.edu> wrote:

> 4. Philosophy question: If I build some functions that do something new
> with an e.g. ElementTree (XML tree) object, should I:
>
> (a) make these functions go in a subclass of the class for the original
> object (thus inheriting the methods of the original class, and basically
> adding new methods).  E.g. basically extending the methods of ElementTree,
> with a subclass GbifElementTree; or:
>
> (b) make a class containing the object as an attribute, with e.g.
> GbifXml.xmltree containing an ElementTree attribute which then gets passed
> to the various functions.
>
> I currently have (b) but the more I think about it, the more (a) makes more
> sense from a simplicity/usability/maintainability sense.
>
>
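For concreteness, here is a minimal sketch of the two options above, assuming Python 3's xml.etree.ElementTree. The class names GbifElementTree and GbifXml come from the question. The record_count method and the sample XML are hypothetical.

```python
import xml.etree.ElementTree as ET

# Option (a): subclass ElementTree, so every inherited method is available
# alongside the new GBIF-specific ones.
class GbifElementTree(ET.ElementTree):
    def record_count(self, tag):
        """Count elements with the given tag anywhere in the tree (hypothetical helper)."""
        return sum(1 for _ in self.iter(tag))

# Option (b): hold an ElementTree as an attribute and expose only chosen methods.
class GbifXml:
    def __init__(self, xmltree):
        self.xmltree = xmltree  # the wrapped ElementTree

    def record_count(self, tag):
        """Same hypothetical helper, delegating to the wrapped tree."""
        return sum(1 for _ in self.xmltree.iter(tag))

doc = "<records><record/><record/><other/></records>"
root = ET.fromstring(doc)
a = GbifElementTree(root)
b = GbifXml(ET.ElementTree(root))
print(a.record_count("record"), b.record_count("record"))
```

The trade-off: (a) exposes the whole ElementTree API to callers, while (b) keeps the ElementTree dependency behind a narrow interface. That makes it easier to swap in a different ElementTree implementation later.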
I have some ElementTree-related helper functions, too. Since we're still
maintaining compatibility with Python 2.4 and xml.etree didn't enter the
standard library until Py2.5, the ElementTree interface could potentially
come from several different sources, with slightly different capabilities.
It's a weird module in general... basically, I'm treating the library like a
wild badger: each function either relies on the ETree object structure or it
doesn't, and the ETree-specific functions live in their own section near the
top of the file. The methods that do phyloXML-specific work call a helper
function to extract what they need from a node, then carry on with ordinary,
well-behaved Python objects.
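A minimal sketch of that separation follows. It uses a simplified, hypothetical clade-style XML layout rather than the full phyloXML schema. The names _clade_info and total_branch_length are illustrative, not Biopython functions.

```python
import xml.etree.ElementTree as ET

# --- ETree-specific helper: the only code that touches Element objects ---
def _clade_info(node):
    """Convert a (simplified, hypothetical) <clade> element into plain dicts."""
    length = node.get("branch_length")
    return {
        "name": node.findtext("name"),
        "branch_length": float(length) if length is not None else None,
        "children": [_clade_info(child) for child in node.findall("clade")],
    }

# --- Format-independent logic: works on ordinary dicts and lists ---
def total_branch_length(clade):
    """Sum branch lengths over a clade and all its descendants."""
    own = clade["branch_length"] or 0.0
    return own + sum(total_branch_length(c) for c in clade["children"])

xml = """<clade><name>root</name>
  <clade branch_length="0.5"><name>A</name></clade>
  <clade branch_length="1.25"><name>B</name></clade>
</clade>"""
tree = _clade_info(ET.fromstring(xml))
print(total_branch_length(tree))
```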

When Bio.Tree integration comes due, we could check how much our various
ETree utilities overlap and maybe combine them into a separate module. For
instance, I have a tree pretty-printer and a function for dumping a list of
XML node tags, too.

Summary: Integrating with Bio.Tree will involve some refactoring, and it
would be easier if the ElementTree stuff were quarantined off a little bit.



>  def extract_latlongs(self, element):
>
>    Create a temporary pseudofile, extract lat longs to it,
>    return results as string.
>
>    Inspired by: http://www.skymind.com/~ocrow/python_string/
>    (Method 5: Write to a pseudo file)
>

Neat article! I was intrigued by this result, so I tried to replicate it, and
my results were different: newer Pythons have some string optimizations that
weren't in place when the article was written. In CPython, adding strings
together in a loop (s += t) no longer leads to quadratic time complexity in
the common case.

Blogged it:
http://etalog.blogspot.com/2009/07/faster-string-concatenation-in-python.html
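To try the comparison yourself, here is a sketch of three ways to build a lat/long string. It assumes Python 3, where io.StringIO takes the place of Python 2's cStringIO. The points list and the tab-separated output format are hypothetical. Timing the functions with the timeit module on a much larger input shows the performance differences. PEP 8 notes that the in-place += optimization is CPython-specific and fragile, so join remains the portable choice.

```python
import io

# Hypothetical (lat, long) pairs standing in for values pulled from GBIF records.
points = [(37.87, -122.27), (51.5, -0.12), (-33.87, 151.21)]

def with_pseudofile(pairs):
    # "Method 5" from the article: write the pieces to an in-memory file.
    buf = io.StringIO()
    for lat, lon in pairs:
        buf.write("%s\t%s\n" % (lat, lon))
    return buf.getvalue()

def with_join(pairs):
    # Build the pieces lazily and join them once at the end.
    return "".join("%s\t%s\n" % (lat, lon) for lat, lon in pairs)

def with_concat(pairs):
    # Repeated +=; CPython can often grow the string in place, but implementations
    # without reference counting make no such promise.
    out = ""
    for lat, lon in pairs:
        out += "%s\t%s\n" % (lat, lon)
    return out

assert with_pseudofile(points) == with_join(points) == with_concat(points)
print(with_join(points), end="")
```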



> def xmlstring_to_xmltree(xmlstring):
>
>  Take the text string returned by GBIF and parse to an XML tree using
>  ElementTree. Requires the intermediate step of saving to a temporary file
>  (required to make ElementTree.parse work, apparently)
>
>
Did cStringIO work as a temp file handle? I wonder if this is a bug in
Python.
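For what it's worth, in Python 3 neither a temporary file nor cStringIO should be needed. ElementTree.fromstring parses a string directly, and ElementTree.parse accepts any file-like object. Here is a minimal sketch with a made-up GBIF-style response string; the element and attribute names are hypothetical. If the response arrives as bytes rather than text, io.BytesIO works the same way.

```python
import io
import xml.etree.ElementTree as ET

xmlstring = "<response><occurrence lat='37.87' long='-122.27'/></response>"

# Option 1: parse the string directly; returns the root Element.
root = ET.fromstring(xmlstring)

# Option 2: wrap the string in an in-memory file and use ET.parse,
# which returns an ElementTree. Nothing is written to disk.
tree = ET.parse(io.StringIO(xmlstring))

print(root.tag, tree.getroot().find("occurrence").get("lat"))
```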

Overall, it's great to see Biopython is going to have such solid
phylogenetics/geography support. Should be fun to work with in the future.

Cheers,
Eric


