[Biopython-dev] [Bug 3045] TreeMixin, please define enumerator and other convenience methods

bugzilla-daemon at portal.open-bio.org
Sat Apr 10 04:10:39 UTC 2010


http://bugzilla.open-bio.org/show_bug.cgi?id=3045





------- Comment #5 from eric.talevich at gmail.com  2010-04-10 00:10 EST -------
(In reply to comment #4, myself)
> (In reply to comment #0, Joel)
> > (1) internal nodes, terminal nodes, and all nodes are not currently
> > on an equal footing with respect to methods
> 
> We could also have 'get_nonterminal' and 'get_all_clades' -- I'm not so sure
> that the last one is useful enough to justify cluttering the API further; what
> do you think? (I actually balked at adding get_terminals() originally, since it's
> so simple.)

I added get_nonterminals() to TreeMixin:
http://github.com/biopython/biopython/commit/de024f7d700a8ce83a64bc9f8cfd6273cefe95bc

Do we need a get_all_clades method? Is that a good name?
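
To make the three node groupings concrete, here is a minimal sketch using a toy
Clade class. The class and its simplified find_clades() are stand-ins written
for illustration only, not the actual Bio.Phylo code, and get_all_clades is
just the name under discussion, not an existing method.

```python
class Clade(object):
    """Toy stand-in for a tree node; not the real Bio.Phylo class."""

    def __init__(self, name=None, clades=None):
        self.name = name
        self.clades = clades or []

    def is_terminal(self):
        # A terminal (leaf) node has no children.
        return not self.clades

    def find_clades(self, terminal=None):
        """Yield this node and its descendants in preorder.

        terminal=True keeps only leaves, terminal=False keeps only
        internal nodes, and terminal=None keeps everything.
        """
        if terminal is None or self.is_terminal() == terminal:
            yield self
        for child in self.clades:
            for node in child.find_clades(terminal):
                yield node

    def get_terminals(self):
        return list(self.find_clades(terminal=True))

    def get_nonterminals(self):
        return list(self.find_clades(terminal=False))


# Example tree: ((A, B)ab, C)root
tree = Clade("root", [Clade("ab", [Clade("A"), Clade("B")]), Clade("C")])
terminal_names = [c.name for c in tree.get_terminals()]
internal_names = [c.name for c in tree.get_nonterminals()]
all_names = [c.name for c in tree.find_clades()]
```

With a generic traversal like this, a hypothetical get_all_clades() would
amount to list(tree.find_clades()), which is part of why it may not be worth a
separate method.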


> > Here I give some convenience methods that I wish were defined in 
> > TreeMixin.  I have tested them as standalone methods.  I hope you'll
> > see fit to include them at some point.
> > 
> > def count_internals(self):
> >     """Counts the number of non-terminal (internal) nodes within this tree."""
> >     return [i for i,e in enumerate_internals(self)][-1] + 1
> 
> I can add a convenience function that would help:
> 
> def iterlen(items):
>     count = -1  # so an empty iterable gives 0 instead of raising an error
>     for i, x in enumerate(items):
>         count = i
>     return count + 1
> 
> Then count_internals(tree) is the same as:
> iterlen(tree.find_clades(terminal=False))
> 
> Or, if we add get_nonterminals() it's easy:
> len(tree.get_nonterminals())

Both of these can be done now, but len(tree.get_nonterminals()) is easiest.

iterlen() is hidden in _sugar.py for now:
http://github.com/biopython/biopython/commit/c8ce7f7b0314b54084b62759b1f82488374cae28
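
As a rough sketch of the difference between the two counting approaches: len()
needs a fully built list, while an iterlen-style helper just consumes the
iterator. The version below is an illustration and is not necessarily identical
to what is in _sugar.py; evens() is a hypothetical generator standing in for
something like tree.find_clades(terminal=False).

```python
def iterlen(items):
    """Count the items in any iterable, consuming it as we go."""
    count = 0
    for _ in items:
        count += 1
    return count


def evens(limit):
    # Hypothetical generator standing in for a lazy tree traversal.
    for n in range(limit):
        if n % 2 == 0:
            yield n


n_lazy = iterlen(evens(10))      # no intermediate list is built
n_eager = len(list(evens(10)))   # same count, but builds the list first
n_empty = iterlen(evens(0))      # an empty iterable counts as 0
```

Both approaches give the same answer; the generator version only matters when
the tree is large enough that building the list is worth avoiding.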


> > Less critical but still useful are the following two methods (and one private
> > utility) that I find useful for operations on trees:
> > 
> > def is_semipreterminal(self):
> >     """True if any direct descendent is terminal."""
> >     if self.root.is_terminal():
> >         return False
> >     for clade in self.clades:
> >         if clade.is_terminal():
> >             return True
> >     return False
> 
> Is semipreterminal a standard name for nodes like this?
> 
> In Python 2.5 and later, you could also do:
> any(clade.is_terminal() for clade in self)
> 
> 
> > def terminal_neighbor_dists(self):
> >     """Return a list of distances between adjacent terminals"""
> >     return [self.distance(*i) for i in
> >             _generate_pairs(self.find_clades(terminal=True))]
> > 
> > def _generate_pairs(self):
> >     import itertools
> >     pairs = itertools.tee(self)
> >     pairs[1].next()
> >     return itertools.izip(pairs[0], pairs[1])

I'll add these to the wiki as cookbook entries.
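
For the cookbook, the pairing helper could be written roughly as below. This is
a sketch under a few assumptions: it uses the built-in next() and zip() rather
than the .next() method and izip, so it should also run on newer Python
versions, and neighbor_dists() uses plain numbers with an absolute difference
as a hypothetical stand-in for tree.distance().

```python
import itertools


def generate_pairs(items):
    """Return an iterator of (previous, current) for each adjacent pair."""
    first, second = itertools.tee(items)
    next(second, None)  # advance the second copy; safe on empty input
    return zip(first, second)


def neighbor_dists(positions):
    # Hypothetical stand-in for tree.distance(): each "terminal" is just a
    # number here, and the distance is the absolute difference.
    return [abs(b - a) for a, b in generate_pairs(positions)]


pairs = list(generate_pairs("ABCD"))
dists = neighbor_dists([1, 4, 6])
no_pairs = list(generate_pairs([]))
```

Passing a default to next() avoids a StopIteration when the input has no
items, so an empty tree traversal simply yields no pairs.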

One more thing -- should we rename the find_all and find_clades methods? I'm
leaving this bug open as a reminder to decide that (and the get_all_clades
question above) before the 1.54 release.




