[Bioperl-l] Script taxonomy2tree version 1.4 crashed on 110 species

Sendu Bala bix at sendu.me.uk
Mon Dec 18 23:27:15 UTC 2006


Chris Fields wrote:
> 
> On Dec 18, 2006, at 4:50 PM, Sendu Bala wrote:
> 
>> Chris Fields wrote:
>>> On Dec 18, 2006, at 1:15 PM, Sendu Bala wrote:
>>>> For example, the first 'drop' happens for the 101st species which is
>>>> 'Leptospira interrogans serovar Copenhageni'. This is a variation
>>>> (descendant) of species 24: 'Leptospira interrogans'. So when the
>>>> variation is added it becomes a leaf and 'Leptospira interrogans' is no
>>>> longer a leaf, so the overall number of leaves does not increase.
>>>
>>> Makes sense now.  I personally would consider this a bug since the 
>>> results are unexpected (so the docs need to be modified in order to 
>>> clarify).  Some say tomato...
>>> I suppose this is one of the issues one might run into when using 
>>> NCBI taxonomy to build trees.
>>
>> No, the tree produced is perfectly fine. The taxonomy2tree.pl script 
>> deliberately then does:
>>
>> # simple paths are contracted by removing degree one nodes
>> $tree->contract_linear_paths;
>>
>> Because that is what Gabriel's script originally did.
> 
> I think you misunderstood me.  The tree is fine; the data used to make 
> the tree (NCBI taxonomy) is the issue.

In what way is it the issue? As far as I can see the database is also 
fine, insofar as it is not causing any problems in this instance.

Gabriel asked for a tree featuring a species and its subspecies. The 
NCBI taxonomy database provided Bioperl with the correct data to build 
such a tree. Then Gabriel asked to remove the degree-one nodes of his 
tree. His problem was that doing so (correctly) removed the species 
node, because in this tree the species has only a single child. If he 
wants to see both his species and his subspecies he must either not 
remove degree-one nodes, or alter the method of doing so to keep the 
desired nodes. There is nothing NCBI could change to improve matters 
here.
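
To make the two options concrete, here is a rough sketch of the idea. It 
is a toy model in plain Python, not Bioperl code and not what 
taxonomy2tree.pl actually does internally: the tree is just a dict of 
node name to child names, the function contract_linear_paths() below is 
my own stand-in that only borrows the name of the Bioperl method, and 
the 'keep' argument and the placeholder sibling are assumptions made up 
for illustration.

```python
SPECIES = "Leptospira interrogans"
SEROVAR = "Leptospira interrogans serovar Copenhageni"
SIBLING = "sibling species (placeholder)"  # hypothetical, only to give the root two children

# Toy tree: node name -> list of child names. Leaves have no entry.
tree = {
    "Leptospira": [SPECIES, SIBLING],
    SPECIES: [SEROVAR],
}


def contract_linear_paths(children, root, keep=frozenset()):
    """Return a new child map with single-child nodes spliced out.

    Nodes named in 'keep' survive even if they have exactly one child.
    The root itself is never removed in this toy version.
    """
    result = {}

    def surviving(node):
        # Walk down a chain of removable single-child nodes to the first survivor.
        kids = children.get(node, [])
        while len(kids) == 1 and node not in keep:
            node = kids[0]
            kids = children.get(node, [])
        return node

    def visit(node):
        kids = [surviving(k) for k in children.get(node, [])]
        if kids:
            result[node] = kids
        for k in kids:
            visit(k)

    visit(root)
    return result


def leaves(children, root):
    """List the leaf names below 'root', left to right."""
    kids = children.get(root, [])
    if not kids:
        return [root]
    return [leaf for k in kids for leaf in leaves(children, k)]


# Option 1: contract everything; the species node disappears.
contracted = contract_linear_paths(tree, "Leptospira")

# Option 2: contract, but explicitly protect the species node.
kept = contract_linear_paths(tree, "Leptospira", keep={SPECIES})
```

In this model 'contracted' has the serovar attached directly to the 
root, while 'kept' still has the species node with the serovar beneath 
it. The leaves are the same in both cases (the serovar and the 
placeholder sibling), which is why adding the serovar did not increase 
the leaf count.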



