[Biopython-dev] New Newick parser in Bio.Phylo

Eric Talevich eric.talevich at gmail.com
Mon Feb 11 04:20:20 UTC 2013


On Sun, Feb 10, 2013 at 11:04 PM, Ben Morris <ben at bendmorris.com> wrote:

> On Sun, Feb 10, 2013 at 10:30 PM, Eric Talevich <eric.talevich at gmail.com>
> wrote:
> > On Sun, Feb 10, 2013 at 9:39 PM, Ben Morris <ben at bendmorris.com> wrote:
> >>
> >> On Sun, Feb 10, 2013 at 9:11 PM, Eric Talevich <eric.talevich at gmail.com>
> >> wrote:
> >> > Hi Ben,
> >> >
> >> > I've noticed a couple new characteristics of the Newick parser that I
> >> > had questions about.
> >> >
> >> > 1. There is no longer a way to tell the parser to treat internal node
> >> > labels as confidence values. Lots of files in the wild do record the
> >> > support values here, including those generated by RAxML, PhyML,
> >> > FastTree and MrBayes, so I'd like to restore this option, and perhaps
> >> > make it the default. I think the condition is:
> >> >
> >> > if not (self.values_are_confidence or self.comments_are_confidence or
> >> >         current_clade.is_terminal()):
> >> >     # parse confidence from node label
> >> >
> >> > Is there an easy way to add this option to the parser? I'm trying to
> >> > get this to work in the "else" clause in parse_tree, where unquoted
> >> > node labels are handled.
> >> >
> >> >
> >> > 2. Confidence values are required to be between 0.0 and 1.0. Also,
> >> > support values recorded as integers are treated as percentages and
> >> > divided by 100 automatically. The phyloXML spec doesn't have this range
> >> > requirement. RAxML scales bootstraps to 100, but PhyML records the raw
> >> > number of supporting bootstrap runs (e.g. supports out of 1000 if there
> >> > were 1000 bootstrap replicates). So, I'd prefer to leave the confidence
> >> > values as they are, requiring only that they be numeric. Thoughts?
> >> >
> >> >
> >> > Thanks,
> >> > Eric
> >>
> >> 1. One issue is that current_clade.is_terminal() will always be true
> >> at that point because current_clade's children haven't been parsed
> >> yet. Putting the check in the "process_clade" function (which is
> >> called when the closing paren is hit, and therefore all children
> >> should have been parsed) should fix this.
> >>
> >> So, if values_are_confidence and comments_are_confidence are both
> >> false and a node label is numeric, it should be treated as confidence,
> >> and clade.name should be set to None - is that correct?
> >>
> >> 2. This should be as simple as removing current lines 123-127.
> >>
> >> ~Ben
> >
> >
> >
> > Thanks. Here's #2:
> >
> > https://github.com/biopython/biopython/commit/0aee549e72fe5dcf9bcea239d29780706500922a
> >
> > I agree with your assessment of #1, but haven't been able to get it
> > working yet. I'm leaving Bug #3407 open for now:
> > https://redmine.open-bio.org/issues/3407
> >
>
> I think this should do it:
>
>
> https://github.com/bendmorris/biopython/commit/b430f27ff908f07d8ab59bec48429947f0028d63
>
> I also updated the test case to make sure this is working correctly
> and changed the default value of comments_are_confidence from True to
> False.
>
> If that looks correct, feel free to pull.
>
> ~Ben
>

Works for me, thanks! I cherry-picked it here:
https://github.com/biopython/biopython/commit/f382f550f49f73301663ad949a6c1e40f5d71c0c
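
For anyone following the thread, the behavior agreed on in #1 and #2 can
be sketched roughly as below. This is only an illustration under stated
assumptions, not the code from the commits above: the Clade class and the
label_to_confidence helper are hypothetical stand-ins, not Bio.Phylo
names. The helper is meant to be called once a clade's children are
attached (i.e. when the closing paren is reached), and it keeps the
numeric value as written, with no rescaling.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Clade:
    """Hypothetical stand-in for a tree node (not the Bio.Phylo class)."""
    name: Optional[str] = None
    confidence: Optional[float] = None
    children: List["Clade"] = field(default_factory=list)

    def is_terminal(self) -> bool:
        return not self.children


def label_to_confidence(clade: Clade,
                        values_are_confidence: bool = False,
                        comments_are_confidence: bool = False) -> Clade:
    """Treat a numeric internal-node label as a confidence value.

    Assumes it is called after the clade's children have been attached,
    so is_terminal() gives a meaningful answer.
    """
    # Skip when confidence comes from elsewhere, or when this is a leaf.
    if values_are_confidence or comments_are_confidence or clade.is_terminal():
        return clade
    if clade.name is None:
        return clade
    try:
        value = float(clade.name)
    except ValueError:
        # Non-numeric label: leave it as the clade name.
        return clade
    # Keep the value as written: no 0.0-1.0 range check, no division by 100.
    clade.confidence = value
    clade.name = None
    return clade
```

With this sketch, an internal node labelled "870" (a PhyML-style raw
count) ends up with confidence 870.0 and no name, while a leaf labelled
"42" keeps its name.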


