[Bioperl-l] Script taxonomy2tree version 1.4 crashed on 110 species
Chris Fields
cjfields at uiuc.edu
Mon Dec 18 20:55:55 UTC 2006
On Dec 18, 2006, at 1:15 PM, Sendu Bala wrote:
> Chris Fields wrote:
>> On Dec 15, 2006, at 6:45 PM, Gabriel Valiente wrote:
>>
>>> However, on a larger set of 190 species, which are all present in
>>> the NCBI taxonomy, the resulting tree has only 178 taxa. I suspect,
>>> something must be wrong with the merge_lineage method in the major
>>> rewrite of the taxonomy2tree script. Can someone please check this?
>>> I'm attaching the 190 species call to the script. Thanks,
>>>
>>> Gabriel
>>
>> I can confirm that. It is definitely dropping them in
>> merge_lineage(); if you add a call to get_leaf_nodes to check how
>> many are present after each merge_lineage() call, you can see it
>> dropping nodes along the trace.
>
> I confirm the 'dropped' nodes, but also claim that this is no bug.
>
> For example, the first 'drop' happens for the 101st species which is
> 'Leptospira interrogans serovar Copenhageni'. This is a variation
> (descendant) of species 24: 'Leptospira interrogans'. So when the
> variation is added it becomes a leaf and 'Leptospira interrogans' is
> no longer a leaf, so the overall number of leaves does not increase.
>
> The next drop is for species 103 'Prochlorococcus marinus subsp.
> pastoris str. CCMP1986', a subspecies of 63 'Prochlorococcus marinus'.
> Same deal. I didn't check any others, but suspect the same issue
> arises in all cases.
Makes sense now. I personally would still consider this a bug, since the
results are unexpected (so the docs need to be modified to clarify the
behavior). Some say tomato...
I suppose this is one of the issues one might run into when using
NCBI taxonomy to build trees.
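
To make the leaf-count effect concrete, here is a minimal toy sketch in Python. It is not the BioPerl merge_lineage() implementation; the function names and the abbreviated lineages are assumptions made up for illustration. It only shows why merging a descendant of an already-merged taxon leaves the number of leaves unchanged.

```python
# Toy model only: a tree stored as a parent -> set-of-children map.
# This is NOT the BioPerl implementation; names here are hypothetical.

def merge_lineage(children, lineage):
    """Merge one root-to-taxon path into the tree."""
    for node in lineage:
        children.setdefault(node, set())
    for parent, child in zip(lineage, lineage[1:]):
        children[parent].add(child)

def leaf_count(children):
    """Count nodes that have no children."""
    return sum(1 for kids in children.values() if not kids)

# Abbreviated, illustrative lineages (real NCBI lineages have more ranks).
lineages = [
    ["Bacteria", "Leptospira", "Leptospira interrogans"],
    ["Bacteria", "Prochlorococcus", "Prochlorococcus marinus"],
    ["Bacteria", "Leptospira", "Leptospira interrogans",
     "Leptospira interrogans serovar Copenhageni"],
]

children = {}
counts = []
for lineage in lineages:
    merge_lineage(children, lineage)
    counts.append(leaf_count(children))

print(counts)  # [1, 2, 2]
```

The third merge adds a new leaf but also turns 'Leptospira interrogans' into an internal node, so the count stays at 2 even though three taxa were requested.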
> Gabriel, please confirm this isn't a bug, or suggest how you propose
> to see your taxa when they are not all leaves of the tree.
Having the nodes appear internally seems semantically correct to me.
Is there any other way?
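
One way to check that nothing was really lost is to look up the requested taxa among all nodes rather than only among the leaves. The following is a rough Python sketch under the same toy assumptions (a hand-rolled tree, hypothetical helper names, abbreviated lineages), not BioPerl code:

```python
# Toy model only; build_tree and the lineages below are hypothetical.

def build_tree(lineages):
    """Build a parent -> set-of-children map from root-to-taxon paths."""
    children = {}
    for lineage in lineages:
        for node in lineage:
            children.setdefault(node, set())
        for parent, child in zip(lineage, lineage[1:]):
            children[parent].add(child)
    return children

# Abbreviated, illustrative lineages (real NCBI lineages have more ranks).
lineages = [
    ["Bacteria", "Leptospira", "Leptospira interrogans"],
    ["Bacteria", "Prochlorococcus", "Prochlorococcus marinus"],
    ["Bacteria", "Prochlorococcus", "Prochlorococcus marinus",
     "Prochlorococcus marinus subsp. pastoris str. CCMP1986"],
]

# The requested taxon is the last entry of each lineage.
requested = [lineage[-1] for lineage in lineages]
tree = build_tree(lineages)

# Requested taxa absent from the tree altogether (a real loss).
missing = [t for t in requested if t not in tree]
# Requested taxa present, but as internal nodes rather than leaves.
internal = sorted(t for t in requested if t in tree and tree[t])

print(missing)   # []
print(internal)  # ['Prochlorococcus marinus']
```

If the list of missing taxa is empty and the internal ones account for the difference, the tree contains every requested taxon and only the leaf count is lower than expected.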
> PS. I changed the merge_lineage() algorithm to be 18x faster (from the
> absurd 3mins for making the 190 species tree to a more reasonable
> 10s), without changing the tree produced.
Definitely an improvement!
chris