[Biopython-dev] Bug in majority_tree method

Zheng Ruan zruan1991 at gmail.com
Thu Oct 8 21:09:48 UTC 2015


Hi,

I found an interesting bug in the majority-tree method (majority_consensus)
in Bio.Phylo.Consensus. I faked 100 copies of an identical tree and tried to
build a majority tree from them. Ideally, each clade should get 100% support,
but that is not what I see.
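To make the expectation concrete, here is a minimal pure-Python sketch of how I would count clade support by hand. It is not Biopython's implementation: the nested-tuple tree representation and the helper names leaf_sets and clade_support are made up for this illustration, and the tree is only a small subtree of the one below.

```python
from collections import Counter


def leaf_sets(tree):
    """Return (leaves, clades) for a tree given as nested tuples of leaf names.

    `clades` lists the leaf set of every internal node, including the root.
    """
    if isinstance(tree, str):
        # A leaf: it contributes its own name but no internal clade.
        return frozenset([tree]), []
    leaves = frozenset()
    clades = []
    for child in tree:
        child_leaves, child_clades = leaf_sets(child)
        leaves |= child_leaves
        clades.extend(child_clades)
    clades.append(leaves)
    return leaves, clades


def clade_support(trees):
    """Percentage of input trees in which each clade (as a leaf set) occurs."""
    counts = Counter()
    for tree in trees:
        _, clades = leaf_sets(tree)
        counts.update(clades)
    return {clade: 100.0 * n / len(trees) for clade, n in counts.items()}


# Hypothetical toy input: 100 copies of the Jak3/Jak2/Tyk2 subtree.
tree = (("Jak3", "Jak2"), "Tyk2")
support = clade_support([tree] * 100)
for clade, pct in support.items():
    print(sorted(clade), pct)
```

With identical input trees, every clade is counted once per tree, so each one ends up with 100% support. That is the result I expected from majority_consensus as well.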

Here is the tree in newick format:
(((PKA:0.35924,(Jak1:0.17907,((Jak3:0.13427,Jak2:0.12242)Inner9:0.04444,Tyk2:0.18579)Inner25:0.01158)Inner27:0.03625)Inner34:0.02803,((((ALK:0.18162,Ros1:0.21324)Inner29:0.03348,(Fes1:0.02371,Fes:0.05953)Inner6:0.11662)Inner35:0.02558,((CSK:0.13093,((((Src_human:0.05170,cSrc:0.07533)Inner10:0.02095,Fyn:0.07764)Inner12:0.05531,(Lyn:0.07969,(Lck:0.08430,Hck:0.06321)Inner13:0.01252)Inner14:0.03035)Inner22:0.04465,(Abl2:0.03155,(Abl_kd:0.00322,Abl:0.02300)Inner3:0.03272)Inner5:0.12051)Inner32:0.02247)Inner36:0.00510,((EphA2_4TRL_kd:0.09656,EphA8:0.08396)Inner23:0.00536,(((EphA3_kd:0.02015,EphA3:0.03792)Inner11:0.04315,(EphA5:0.05872,(EphA7:0.13218,EphA4:0.03996)Inner17:0.01306)Inner18:0.00437)Inner20:0.01845,(EphB1:0.07597,EphB4:0.08463)Inner19:0.01918)Inner21:0.02246)Inner24:0.06661)Inner37:0.01022)Inner39:0.00499,((Met:0.19148,(IGF1R:0.07154,InsR:0.05387)Inner2:0.13270)Inner33:0.02574,((Ret_kd:0.02239,Ret:0.01824)Inner1:0.16109,(((FGFR1:0.06807,(FGFR3:0.01935,FGFR2:0.02793)Inner4:0.02533)Inner8:0.03839,FGFR4:0.12332)Inner15:0.04988,(VEGFR2:0.17881,(Kit:0.11441,CSF1R:0.11666)Inner16:0.02523)Inner26:0.02255)Inner30:0.01750)Inner31:0.02993)Inner38:0.01137)Inner40:0.01646)Inner41:0.01551,(Ack1:0.21078,PTK2:0.15535)Inner42:0.00960,((ZAP70:0.15548,Syk:0.14399)Inner28:0.04516,(Her4:0.08558,EGFR_kd:0.08301)Inner7:0.10528)Inner43:0.00188)Inner44:0.00000;

Here is what I tried:

>>> from Bio import Phylo
>>> from Bio.Phylo.Consensus import majority_consensus
>>> # bootstrap.nwk contains 100 copies of the above tree
>>> trees = list(Phylo.parse('bootstrap.nwk', 'newick'))
>>> majority_tree = majority_consensus(trees)
>>> print(majority_tree)

Tree(rooted=True)
    Clade()
        Clade(branch_length=0.00831, confidence=27.0)
            Clade(branch_length=0.0138077142857, confidence=35.0)
                Clade(branch_length=0.11159969697, confidence=33.0)
                    Clade(branch_length=0.02371, name='Fes1')
                    Clade(branch_length=0.0292794871795, confidence=39.0)
                        Clade(branch_length=0.05953, name='Fes')
                        Clade(branch_length=0.13093, name='CSK')
                Clade(branch_length=0.0105683333333, confidence=12.0)

... (output omitted) ...
                    Clade(branch_length=0.033740952381, confidence=42.0)
                        Clade(branch_length=0.09656, name='EphA2_4TRL_kd')
                        Clade(branch_length=0.08396, name='EphA8')
                Clade(branch_length=0.0400708, confidence=25.0)
                    Clade(branch_length=0.05872, name='EphA5')
                    Clade(branch_length=0.0630573333333, confidence=45.0)
                        Clade(branch_length=0.13218, name='EphA7')
                        Clade(branch_length=0.03996, name='EphA4')


The confidence values for most of the clades are below 50%, even though all
100 input trees are identical. Any idea what is going wrong?

Thanks!
Ruan