[Bioperl-l] Script taxonomy2tree version 1.4 crashed on 110 species

Sendu Bala bix at sendu.me.uk
Mon Dec 18 19:15:16 UTC 2006


Chris Fields wrote:
> On Dec 15, 2006, at 6:45 PM, Gabriel Valiente wrote:
> 
>> However, on a larger set of 190 species, which are all present in
>> the NCBI taxonomy, the resulting tree has only 178 taxa. I suspect
>> something must be wrong with the merge_lineage method in the major
>> rewrite of the taxonomy2tree script. Can someone please check this?
>> I'm attaching the 190 species call to the script. Thanks,
>> 
>> Gabriel
> 
> I can confirm that. It is definitely dropping them in
> merge_lineage(); if you add a call to get_leaf_nodes to check how many
> are present after each merge_lineage() call, you can see it dropping
> nodes along the trace.

I can confirm the 'dropped' nodes, but I maintain that this is not a bug.

For example, the first 'drop' happens for the 101st species, which is
'Leptospira interrogans serovar Copenhageni'. This is a variation
(descendant) of species 24, 'Leptospira interrogans'. So when the
variation is added, it becomes a leaf and 'Leptospira interrogans' is no
longer a leaf, so the overall number of leaves does not increase.

The next drop is for species 103, 'Prochlorococcus marinus subsp.
pastoris str. CCMP1986', a subspecies of species 63, 'Prochlorococcus
marinus'. Same deal. I didn't check any others, but I suspect the same
issue arises in all cases.
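To illustrate why the leaf count stalls, here is a minimal language-neutral sketch in Python (not BioPerl code; the real script works on Bio::Tree objects in Perl). The helpers `add_lineage` and `leaf_names` are hypothetical stand-ins for merging a root-to-taxon lineage and listing leaves, and the lineages are shortened and illustrative rather than the exact NCBI ones. It also shows one way to check that every requested taxon is still in the tree as a node, whether or not it is a leaf.

```python
from typing import Dict, List, Set

# Tree stored as name -> set of child names (hypothetical representation).
Tree = Dict[str, Set[str]]


def add_lineage(tree: Tree, lineage: List[str]) -> None:
    """Merge a root-to-taxon lineage into the tree, reusing shared ancestors."""
    tree.setdefault(lineage[0], set())
    for parent, child in zip(lineage, lineage[1:]):
        tree.setdefault(parent, set()).add(child)
        tree.setdefault(child, set())


def leaf_names(tree: Tree) -> List[str]:
    """A leaf is any node with no children."""
    return sorted(name for name, kids in tree.items() if not kids)


# Shortened, illustrative lineages (not the exact NCBI lineages).
lineages = [
    ["Bacteria", "Spirochaetes", "Leptospira", "Leptospira interrogans"],
    ["Bacteria", "Cyanobacteria", "Prochlorococcus", "Prochlorococcus marinus"],
    ["Bacteria", "Spirochaetes", "Leptospira", "Leptospira interrogans",
     "Leptospira interrogans serovar Copenhageni"],
    ["Bacteria", "Cyanobacteria", "Prochlorococcus", "Prochlorococcus marinus",
     "Prochlorococcus marinus subsp. pastoris str. CCMP1986"],
]

tree: Tree = {}
leaf_counts: List[int] = []
for lineage in lineages:
    add_lineage(tree, lineage)
    leaf_counts.append(len(leaf_names(tree)))
    print(f"after adding {lineage[-1]}: {leaf_counts[-1]} leaves")

# Every requested taxon is still a node, even if it is now an internal one.
requested = [lineage[-1] for lineage in lineages]
missing = [taxon for taxon in requested if taxon not in tree]
print("requested:", len(requested), "leaves:", leaf_counts[-1], "missing:", missing)
```

Four taxa go in but only two leaves come out, because the species-level taxa become internal nodes once their serovar and subspecies are added; none of the requested taxa is actually missing from the tree.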

Gabriel, please confirm this isn't a bug, or suggest how you would like
your taxa to be represented when they are not all leaves of the tree.


PS. I changed the merge_lineage() algorithm to be 18x faster (from an
absurd 3 minutes for making the 190 species tree to a more reasonable
10 seconds), without changing the tree produced.


