[Bioperl-l] Script taxonomy2tree version 1.4 crashed on 110 species
Sendu Bala
bix at sendu.me.uk
Mon Dec 18 19:15:16 UTC 2006
Chris Fields wrote:
> On Dec 15, 2006, at 6:45 PM, Gabriel Valiente wrote:
>
>> However, on a larger set of 190 species, which are all present in
>> the NCBI taxonomy, the resulting tree has only 178 taxa. I suspect
>> something must be wrong with the merge_lineage method in the major
>> rewrite of the taxonomy2tree script. Can someone please check this?
>> I'm attaching the 190 species call to the script. Thanks,
>>
>> Gabriel
>
> I can confirm that. It is definitely dropping them in
> merge_lineage(); if you add a call to get_leaf_nodes to check how
> many are present after each merge_lineage() call, you can see it
> dropping nodes along the trace.

I confirm the 'dropped' nodes, but I also claim that this is not a bug.
For example, the first 'drop' happens at the 101st species,
'Leptospira interrogans serovar Copenhageni'. This is a variant
(descendant) of species 24, 'Leptospira interrogans'. When the variant
is added it becomes a leaf and 'Leptospira interrogans' stops being
one, so the overall number of leaves does not increase.

The next drop is for species 103, 'Prochlorococcus marinus subsp.
pastoris str. CCMP1986', a subspecies of species 63, 'Prochlorococcus
marinus'. Same deal. I didn't check any others, but I suspect the same
thing happens in all cases.
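
To make the leaf-count argument concrete, here is a minimal sketch in
Python. It is a toy model, not the BioPerl code: the tree is a plain
dict of child-to-parent links, the merge_lineage() and leaf_nodes()
functions are hypothetical stand-ins written for this illustration, and
the lineages are abbreviated (the real NCBI lineages have more ranks).

```python
# Toy model only: child -> parent links, NOT the Bio::Tree API.

def merge_lineage(parent_of, lineage):
    """Add a root-to-taxon lineage; nodes already in the tree are reused."""
    for parent, child in zip(lineage, lineage[1:]):
        parent_of.setdefault(child, parent)

def leaf_nodes(parent_of):
    """A leaf is a node that is never anybody's parent."""
    parents = set(parent_of.values())
    return sorted(node for node in parent_of if node not in parents)

tree = {}

# Species 24 in Gabriel's list (abbreviated, assumed lineage).
merge_lineage(tree, ["Bacteria", "Leptospira", "Leptospira interrogans"])
leaves_before = leaf_nodes(tree)

# Species 101 is a descendant of species 24.
merge_lineage(tree, ["Bacteria", "Leptospira", "Leptospira interrogans",
                     "Leptospira interrogans serovar Copenhageni"])
leaves_after = leaf_nodes(tree)

print(len(leaves_before), leaves_before)
print(len(leaves_after), leaves_after)
```

Both print statements report a single leaf: the serovar replaces its
parent species as the leaf, so the count stays the same even though a
taxon was added.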

Gabriel, please confirm this isn't a bug, or suggest how you would like
to see your taxa when they are not all leaves of the tree.
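
One way to check which requested taxa ended up as internal nodes, again
only as a sketch in Python under the same assumptions: lineages are
plain root-to-taxon lists, the internal_requested() helper is
hypothetical, and the lineages shown are abbreviated rather than the
full NCBI ones.

```python
# Toy helper, not part of BioPerl: find requested taxa that are
# ancestors of other requested taxa (and so cannot be leaves).

def internal_requested(lineages):
    """Return the requested taxa (last item of each lineage) that also
    appear as an ancestor in some lineage."""
    requested = [lineage[-1] for lineage in lineages]
    ancestors = set()
    for lineage in lineages:
        ancestors.update(lineage[:-1])
    return [taxon for taxon in requested if taxon in ancestors]

lineages = [
    # Species 63 (abbreviated, assumed lineage).
    ["Cyanobacteria", "Prochlorococcus", "Prochlorococcus marinus"],
    # Species 103, a subspecies of species 63.
    ["Cyanobacteria", "Prochlorococcus", "Prochlorococcus marinus",
     "Prochlorococcus marinus subsp. pastoris str. CCMP1986"],
]

hidden = internal_requested(lineages)
expected_leaves = len(lineages) - len(hidden)
print(hidden, expected_leaves)
```

If every 'drop' has this cause, the number of requested taxa minus the
number of such internal taxa should equal the leaf count of the tree.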

PS. I changed the merge_lineage() algorithm to be 18x faster (from an
absurd 3 minutes to build the 190-species tree down to a more
reasonable 10 seconds), without changing the tree produced.