[Biopython] help with confidence values on PhyloXML tree objects?

Eric Talevich eric.talevich at gmail.com
Tue Dec 13 22:28:11 UTC 2011


On Tue, Dec 13, 2011 at 1:55 PM, Jon Sanders <jsanders at oeb.harvard.edu> wrote:

> Update: yup, seems to be a problem with numeric tip names.
>
> 1) getting rid of internal edge name doesn't help
> 2) appending 'a' to tip names fixes it
> 3) this tree: (((1,2),(3,4)),5); loads the numeric tip names as branch
> lengths
> 4) this tree: (((1:0.01,2:0.01):0.01,(3:0.01,4:0.01):0.01):0.01,5:0.03);
> loads the numeric tip names as confidence values, and the branch lengths
> correctly
>
> I might try poking around the parser too, although my python foo has
> little bar.
>
> -j
>

All right, sounds like you've got it working then -- thanks for sharing
your investigations. Looking back at my own work, I see that when I used
numbered taxon labels earlier, I prefixed them with the letter "t"; I never
actually used plain integers, so I didn't hit this issue. For now, I
suppose your best bet is to temporarily add a letter prefix or suffix to
the names when transferring between PyCogent and Biopython.
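
A minimal sketch of that workaround, assuming the Newick string contains no
quoted labels or bracketed comments and that the tip labels are plain
integers. The function name prefix_numeric_tips and the default prefix "t"
are illustrative only; neither is part of Biopython or PyCogent.

```python
import re

# Hypothetical helper, not a Biopython or PyCogent function.
# A tip label directly follows "(" or "," and is directly followed by
# ":", "," or ")". Values that follow ")" (internal labels or support
# values) and values that follow ":" (branch lengths) are left alone.
_NUMERIC_TIP = re.compile(r"(?<=[(,])(\d+)(?=[:,)])")


def prefix_numeric_tips(newick, prefix="t"):
    """Return the Newick string with all-integer tip labels prefixed."""
    return _NUMERIC_TIP.sub(lambda match: prefix + match.group(1), newick)


print(prefix_numeric_tips("(((1,2),(3,4)),5);"))
print(prefix_numeric_tips(
    "(((1:0.01,2:0.01):0.01,(3:0.01,4:0.01):0.01):0.01,5:0.03);"
))
```

The prefix can be stripped from the names again after the tree has been
loaded, if the original integer labels are needed.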

According to the spec for the Nexus format (
http://www.ncbi.nlm.nih.gov/pubmed/11975335), which includes Newick,
all-numeric taxon names are illegal. So, I suppose the Biopython parser's
behavior is technically correct -- at least for parsing the tree section of
Nexus files.

In any case, this behavior is surprising and at least deserves a mention in
a docstring or an error message. I'll try to take a look at PyCogent's
parser to see how they handle ambiguous cases like the ones you listed.

-E


On Fri, Dec 9, 2011 at 6:26 PM, Eric Talevich <eric.talevich at gmail.com> wrote:
>
>> Hi Jon,
>>
>> On Fri, Dec 9, 2011 at 4:53 PM, Jon Sanders <jsanders at oeb.harvard.edu> wrote:
>>
>>> So I have two problems.
>>>
>>>
>>> Problem 1: when importing my newick-formatted trees, which were generated
>>> in PyCogent, the terminal labels and branch labels are read in as
>>> confidence values because they're numerical. So
>>>
>>>    ((((41:0.01494,44:0.00014)0.604:0...
>>>
>>> is read in with blank name='' values and 41, 44, 0.604, etc. as
>>> 'confidence' values.
>>>
>>
>> Hmm, I'll take a look at the Newick parser. I think I've used numeric
>> taxon labels before without a problem, but PyCogent wasn't involved.
>>
>> It might work if you can coax PyCogent into writing the Newick files with
>> an extra colon:
>> ((((:41:0.01494,:44:0.00014):0.604:0...
>>
>>
>>
>>> Problem 2: I would like to store multiple confidence values per node,
>>> but I
>>> can't figure out how to do it.
>>>
>>> I can get the plain old 'confidence' attribute set by:
>>>
>>>   clade.confidence = .05
>>>
>>> but can't figure out how to add and set new confidence types. Any
>>> suggestions?
>>>
>>
>> The confidence types are instances of the Bio.Phylo.PhyloXML.Confidence
>> class.
>>
>> In PhyloXML trees, the attribute "clade.confidence" is actually a Python
>> property pointing to the first element of "clade.confidences", a list of
>> Confidence objects. It's syntax sugar to keep compatibility with Newick,
>> which just has a numeric value there.
>>
>> You can use it like this:
>>
>> from Bio.Phylo import PhyloXML
>>
>> # Create new Confidence instances
>> a_bootstrap_value = PhyloXML.Confidence(83, type="bootstrap")
>> # The second argument is optional
>> a_posterior_probability = PhyloXML.Confidence(0.99)
>>
>> # Select a clade from your tree to modify
>> a_clade = mytree.clade[...]
>>
>> # Modify the list of Confidences directly
>> a_clade.confidences.append(a_bootstrap_value)
>> a_clade.confidences.append(a_posterior_probability)
>>
>>
>> If you've assigned multiple confidence values to a clade using the
>> PhyloXML class, then the "clade.confidence" shortcut won't work anymore
>> because it's not clear which confidence you mean. So you'll have to use
>> e.g. clade.confidences[0] or clade.confidences[1], and save the tree in
>> PhyloXML format to preserve the extra data.
>>
>> Hope that helps.
>>
>> Best regards,
>> Eric
>>
>
>
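
For the multiple-confidence question quoted above, here is a rough sketch of
looking values up by type rather than by list position. It assumes Biopython
is installed and that each Confidence on the clade has a distinct type. The
helper confidences_by_type and the type label "posterior" are made up for
illustration and are not part of Bio.Phylo.

```python
from Bio.Phylo import PhyloXML


def confidences_by_type(clade):
    """Hypothetical helper: map each confidence type to its value."""
    return {conf.type: conf.value for conf in clade.confidences}


# Build a stand-alone clade so the example does not need a tree file.
clade = PhyloXML.Clade(name="example")
clade.confidences.append(PhyloXML.Confidence(83, type="bootstrap"))
clade.confidences.append(PhyloXML.Confidence(0.99, type="posterior"))

by_type = confidences_by_type(clade)
print(by_type["bootstrap"], by_type["posterior"])

# With more than one value attached, the single-value shortcut is ambiguous.
try:
    clade.confidence
except AttributeError as err:
    print("clade.confidence is unavailable:", err)
```

If two Confidence objects share a type, the dictionary keeps only the last
one, so indexing clade.confidences directly is the safer choice in that case.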


