[Biopython-dev] New Newick parser in Bio.Phylo

Eric Talevich eric.talevich at gmail.com
Mon Feb 11 03:30:47 UTC 2013


On Sun, Feb 10, 2013 at 9:39 PM, Ben Morris <ben at bendmorris.com> wrote:

> On Sun, Feb 10, 2013 at 9:11 PM, Eric Talevich <eric.talevich at gmail.com>
> wrote:
> > Hi Ben,
> >
> > I've noticed a couple new characteristics of the Newick parser that I had
> > questions about.
> >
> > 1. There is no longer a way to tell the parser to treat internal node
> > labels as confidence values. Lots of files in the wild do record the
> > support values here, including those generated by RAxML, PhyML, FastTree
> > and MrBayes, so I'd like to restore this option, and perhaps make it the
> > default. I think the condition is:
> >
> > if not (self.values_are_confidence or self.comments_are_confidence or
> > current_clade.is_terminal()): # parse confidence from node label
> >
> > Is there an easy way to add this option to the parser? I'm trying to get
> > this to work in the "else" clause in parse_tree, where unquoted node
> > labels are handled.
> >
> >
> > 2. Confidence values are required to be between 0.0 and 1.0. Also,
> > support values recorded as integers are treated as percentages and
> > divided by 100 automatically. The phyloXML spec doesn't have this range
> > requirement. RAxML scales bootstraps to 100, but PhyML records the raw
> > number of supporting bootstrap runs (e.g. supports out of 1000 if there
> > were 1000 bootstrap replicates). So, I'd prefer to leave the confidence
> > values as they are, requiring only that they be numeric. Thoughts?
> >
> >
> > Thanks,
> > Eric
>
> 1. One issue is that current_clade.is_terminal() will always be true
> at that point because current_clade's children haven't been parsed
> yet. Putting the check in the "process_clade" function (which is
> called when the closing paren is hit, and therefore all children
> should have been parsed) should fix this.
>
> So, if values_are_confidence and comments_are_confidence are both
> false and a node label is numeric, it should be treated as confidence,
> and clade.name should be set to None - is that correct?
>
> 2. This should be as simple as removing current lines 123-127.
>
> ~Ben
>


Thanks. Here's #2:
https://github.com/biopython/biopython/commit/0aee549e72fe5dcf9bcea239d29780706500922a
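
As a rough illustration of the behavior described in #2 (not the code in
that commit), a support value can be accepted whenever it is numeric, with no
range check and no division by 100. The helper name parse_support is
hypothetical and is not part of Bio.Phylo:

```python
def parse_support(label):
    """Return the label as a float if it is numeric, otherwise None.

    Hypothetical helper for illustration only. The value is kept exactly
    as written: no 0.0-1.0 range check and no rescaling of integers.
    """
    try:
        return float(label)
    except (TypeError, ValueError):
        # Not a number (or no label at all): this is not a support value.
        return None


# RAxML-style percentage, a proportion, and a PhyML-style raw count
# are all stored unchanged.
for raw in ("95", "0.95", "873", "Homo_sapiens"):
    print(raw, "->", parse_support(raw))
```

With this approach the caller has to know which scale the generating program
used, since 95, 0.95 and 873 are all stored as given.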

I agree with your assessment of #1, but haven't been able to get it working
yet. I'm leaving Bug #3407 open for now:
https://redmine.open-bio.org/issues/3407
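
For the record, here is a minimal sketch of the logic we agreed on for #1,
under the assumption that the check runs only once a clade's children have
all been attached (i.e. at the closing paren). The Clade class and the
function label_to_confidence below are hypothetical stand-ins written for
this example; they are not the Bio.Phylo classes or the parser's actual code:

```python
class Clade:
    """Hypothetical stand-in for a tree node; not Bio.Phylo's Clade."""

    def __init__(self, name=None, clades=None):
        self.name = name
        self.confidence = None
        self.clades = clades or []

    def is_terminal(self):
        # A clade with no children is a leaf.
        return not self.clades


def label_to_confidence(clade, values_are_confidence=False,
                        comments_are_confidence=False):
    """Move a numeric internal node label into clade.confidence.

    Must be called after the clade's children are attached; before that,
    is_terminal() is true for every clade and the check never fires.
    """
    if values_are_confidence or comments_are_confidence or clade.is_terminal():
        return clade
    try:
        clade.confidence = float(clade.name)
    except (TypeError, ValueError):
        # Missing or non-numeric label: leave it as the clade's name.
        return clade
    clade.name = None
    return clade


# Example: the tree (A,B)95; with the children already parsed.
inner = label_to_confidence(Clade("95", [Clade("A"), Clade("B")]))
print(inner.name, inner.confidence)
```

A leaf whose name happens to be numeric (say, a taxon labelled "42") is left
alone, because is_terminal() is true for it.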


