[Biopython-dev] [Wg-phyloinformatics] BioGeography update

Nick Matzke matzke at berkeley.edu
Tue Jul 7 18:12:10 UTC 2009


Hi all,

I am just back in town and would love to coordinate on this.  I
agree that having multiple Newick parsers etc. is undesirable; I just
found I was forced into it this spring when Biopython didn't have what
I needed, even for pretty standard Newick files.  I have also made use
of Mailund's Newick parser in the past.

I am booked this afternoon but will go through the thread more this 
evening and comment further. Cheers!
Nick

Eric Talevich wrote:
> On Tue, Jul 7, 2009 at 9:02 AM, Brad Chapman <chapmanb at 50mail.com> wrote:
> 
>     Hi Stephen;
> 
>     We can require lagrange to be installed and use imports to
>     grab the needed code. The other option is that y'all can explicitly
>     relicense a subset of the code under the Biopython license.
> 
> 
> Trivia: it looks like lagrange in turn depends on scipy, but quickly 
> glancing through the code, I only see numpy functions being used. Since 
> some other Biopython modules already depend on numpy, could the 
> installation of lagrange for Bio.Geography be made simpler by just 
> changing the import to numpy?
> 
>      > I can see however
>      > where the Bio.Nexus functionality might not be sufficient for tree
>      > manipulation. I am not a contributor to the BioPython dev group so I
>      > cannot speak to those specifics, but as a user I can see separating
>      > out the tree functions from the Nexus package (and tree I/O in
>      > general) as logically a phylogenetic tree structure has little to do
>      > with the nexus file format. It can be somewhat awkward to deal
>      > with in the current form. A more general implementation might be
>      > a Bio.Tree package with I/O readers in Nexus and Newick and XML,
>      > etc.
> 
>     Definitely. Eric has been discussing this with regards to the
>     PhyloXML project and we had been looking at other Tree
>     representations: in PyCogent and Thomas Mailund's Newick module.
>     Considering the lagrange tree model makes a lot of sense as well.
>     What I'd like to see is a stab at a generalized Tree object that
>     supports the operations you need and that the Bio.Nexus parser can
>     produce, exactly as you describe. Eric and Nick, what do you think
>     about coordinating on this?
> 
> 
> Sounds great to me. My impression is that most tree representations are 
> based on a recursive Node element with a few associated attributes and a 
> number of useful methods; phyloXML has a Clade object roughly 
> corresponding to that, but also a bunch of other element types for 
> extensive annotation of the tree. So two options spring to mind:
> 
> 1. Let the Bio.PhyloXML.Tree objects be a superset of everything needed 
> by any phylogenetic tree representation, ever. (It's already pretty 
> close.) Refactor Nexus and Newick to use these objects; merge the 
> features of lagrange so the rest of the Biopython environment can 
> benefit. Only export to external object structures that are something 
> other than a straight phylogenetic tree -- e.g. networkx or graphviz for 
> plotting, numpy/scipy for crunching.
> 
> 2. Factor a simple tree structure out of lagrange and Bio.Nexus, and let 
> that be the Biopython default representation. Add a function in 
> Bio.PhyloXML to export its enhanced tree structure to this simpler 
> Bio.Tree representation.
> 
> I wrote Bio.PhyloXML.Tree to use the naming conventions of phyloXML, but 
> otherwise be independent of that specific file format. It doesn't depend 
> on any XML library directly, and both child nodes and XML node 
> attributes appear as plain ol' object attributes in the tree. But the 
> Nexus module looked like the parser was kind of tied to the tree 
> representation, so I haven't reused any of that code yet. So #1 is my 
> preference, but it puts the burden of inter-module compatibility on 
> whoever is maintaining Bio.Nexus, whereas #2 leaves my code on a quiet 
> little island for the rest of the summer.
> 
> All the best,
> Eric
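
For readers following the thread, the "recursive Node element with a few
associated attributes and a number of useful methods" that Eric describes
can be sketched roughly as below. This is only an illustration: the class
name Node, its attributes, and the methods terminals() and to_newick() are
hypothetical and are not taken from Bio.Nexus, Bio.PhyloXML, or lagrange.

```python
class Node:
    """Hypothetical recursive tree node; not an actual Biopython class."""

    def __init__(self, name=None, branch_length=None, children=None):
        self.name = name
        self.branch_length = branch_length
        # Each child is itself a Node, which makes the structure recursive.
        self.children = list(children) if children else []

    def is_terminal(self):
        """Return True if this node is a leaf (has no children)."""
        return not self.children

    def terminals(self):
        """Yield the leaf nodes of this subtree in left-to-right order."""
        if self.is_terminal():
            yield self
        else:
            for child in self.children:
                yield from child.terminals()

    def to_newick(self):
        """Serialize this subtree as a Newick fragment (no trailing ';')."""
        label = self.name or ""
        if self.children:
            inner = ",".join(child.to_newick() for child in self.children)
            label = "(" + inner + ")" + label
        if self.branch_length is not None:
            label += ":" + format(self.branch_length, "g")
        return label


# Build the tree (A,(B,C)) with branch lengths and write it out.
tree = Node(children=[
    Node("A", 0.1),
    Node(branch_length=0.3, children=[Node("B", 0.2), Node("C", 0.2)]),
])
newick = tree.to_newick() + ";"
leaf_names = [leaf.name for leaf in tree.terminals()]
print(newick)      # (A:0.1,(B:0.2,C:0.2):0.3);
print(leaf_names)  # ['A', 'B', 'C']
```

Because the node knows nothing about any file format, a Nexus, Newick, or
phyloXML reader could each build the same objects, which is the separation
of tree structure from tree I/O discussed above.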

-- 
====================================================
Nicholas J. Matzke
Ph.D. Candidate, Graduate Student Researcher
Huelsenbeck Lab
Center for Theoretical Evolutionary Genomics
4151 VLSB (Valley Life Sciences Building)
Department of Integrative Biology
University of California, Berkeley

Lab websites:
http://ib.berkeley.edu/people/lab_detail.php?lab=54
http://fisher.berkeley.edu/cteg/hlab.html
Dept. personal page: 
http://ib.berkeley.edu/people/students/person_detail.php?person=370
Lab personal page: http://fisher.berkeley.edu/cteg/members/matzke.html
Lab phone: 510-643-6299
Dept. fax: 510-643-6264
Cell phone: 510-301-0179
Email: matzke at berkeley.edu

Mailing address:
Department of Integrative Biology
3060 VLSB #3140
Berkeley, CA 94720-3140

-----------------------------------------------------
"[W]hen people thought the earth was flat, they were wrong. When people 
thought the earth was spherical, they were wrong. But if you think that 
thinking the earth is spherical is just as wrong as thinking the earth 
is flat, then your view is wronger than both of them put together."

Isaac Asimov (1989). "The Relativity of Wrong." The Skeptical Inquirer, 
14(1), 35-44. Fall 1989.
http://chem.tufts.edu/AnswersInScience/RelativityofWrong.htm
====================================================


