[Bioperl-l] Script taxonomy2tree version 1.4 crashed on 110 species

Chris Fields cjfields at uiuc.edu
Mon Dec 18 23:14:23 UTC 2006


On Dec 18, 2006, at 4:50 PM, Sendu Bala wrote:

> Chris Fields wrote:
>> On Dec 18, 2006, at 1:15 PM, Sendu Bala wrote:
>>> For example, the first 'drop' happens for the 101st species which is
>>> 'Leptospira interrogans serovar Copenhageni'. This is a variation
>>> (descendant) of species 24: 'Leptospira interrogans'. So when the
>>> variation is added it becomes a leaf and 'Leptospira interrogans' is
>>> no longer a leaf, so the overall number of leaves does not increase.
>>
>> Makes sense now.  I personally would consider this a bug since the  
>> results are unexpected (so the docs need to be modified in order  
>> to clarify).  Some say tomato...
>> I suppose this is one of the issues one might run into when using  
>> NCBI taxonomy to build trees.
>
> No, the tree produced is perfectly fine. The taxonomy2tree.pl  
> script deliberately then does:
>
> # simple paths are contracted by removing degree one nodes
> $tree->contract_linear_paths;
>
> Because that is what Gabriel's script originally did.

I think you misunderstood me.  The tree is fine; the data used to
make the tree (NCBI taxonomy) is the issue.  One of the clear caveats
that NCBI attaches to its taxonomy data is that it should not be the
'primary source for taxonomic or phylogenetic information':

http://tinyurl.com/y3k624

I think it works as a good guide as long as one takes the above into
consideration, along with the fact that not all taxids attached to
sequence data will represent leaf nodes.
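
To make the leaf-count behaviour concrete, here is a rough sketch in
plain Python (not BioPerl, and not the code from taxonomy2tree.pl).
The child-to-parent dictionary, the helper names and the extra species
'Leptospira biflexa' are assumptions chosen only for illustration; the
contraction step mimics the idea of removing degree one nodes, not
Bio::Tree's actual implementation.

```python
def leaves(parent_of):
    """Return the nodes that are not the parent of any other node."""
    parents = set(parent_of.values())
    return {node for node in parent_of if node not in parents}


def contract_linear_paths(parent_of):
    """Splice out every non-root node that has exactly one child."""
    parent_of = dict(parent_of)  # work on a copy
    while True:
        children = {}
        for child, parent in parent_of.items():
            children.setdefault(parent, []).append(child)
        # A node is "linear" if it has one child and a parent of its own.
        linear = [n for n, kids in children.items()
                  if len(kids) == 1 and n in parent_of]
        if not linear:
            return parent_of
        node = linear[0]
        only_child = children[node][0]
        # Re-attach the single child to the removed node's parent.
        parent_of[only_child] = parent_of.pop(node)


# Toy taxonomy: child -> parent ("root" has no entry of its own).
taxonomy = {
    "Leptospira": "root",
    "Leptospira interrogans": "Leptospira",
    "Leptospira biflexa": "Leptospira",
}
leaves_before = leaves(taxonomy)

# Adding the serovar turns 'Leptospira interrogans' into an internal node.
taxonomy["Leptospira interrogans serovar Copenhageni"] = "Leptospira interrogans"
leaves_after = leaves(taxonomy)

contracted = contract_linear_paths(taxonomy)

print(len(leaves_before), len(leaves_after))
print("Leptospira interrogans" in contracted)
```

In this toy case the number of leaves is the same before and after the
serovar is added, and after contraction 'Leptospira interrogans' is
gone from the tree because it was left with a single child.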

chris



