[Bioperl-l] Script taxonomy2tree version 1.4 crashed on 110 species
Chris Fields
cjfields at uiuc.edu
Mon Dec 18 23:14:23 UTC 2006
On Dec 18, 2006, at 4:50 PM, Sendu Bala wrote:
> Chris Fields wrote:
>> On Dec 18, 2006, at 1:15 PM, Sendu Bala wrote:
>>> For example, the first 'drop' happens for the 101st species which is
>>> 'Leptospira interrogans serovar Copenhageni'. This is a variation
>>> (descendant) of species 24: 'Leptospira interrogans'. So when the
>>> variation is added it becomes a leaf and 'Leptospira interrogans'
>>> is no longer a leaf, so the overall number of leaves does not increase.
>>
>> Makes sense now. I personally would consider this a bug, since the
>> results are unexpected (so the docs need to be modified to make this
>> clear). Some say tomato...
>> I suppose this is one of the issues one might run into when using
>> NCBI taxonomy to build trees.
>
> No, the tree produced is perfectly fine. The taxonomy2tree.pl
> script then deliberately does:
>
> # simple paths are contracted by removing degree one nodes
> $tree->contract_linear_paths;
>
> Because that is what Gabriel's script originally did.
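
For readers unsure what "contracting simple paths" does to the tree, here is a minimal pure-Perl sketch. It is an illustration under stated assumptions, not Bioperl's code: the tree is a plain hash mapping each node name to a list of child names (hypothetical, not how Bio::Tree::Tree stores nodes), the node names are made up, and `contract_simple_paths` is a hypothetical helper rather than the real `contract_linear_paths` method. It splices out every non-root node that has exactly one child, attaching that child to the node's parent.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical toy tree: node name => array ref of child names.
my %children = (
    'root'                => ['Leptospira'],
    'Leptospira'          => ['L. interrogans', 'L. borgpetersenii'],
    'L. interrogans'      => ['serovar Copenhageni'],
    'serovar Copenhageni' => [],
    'L. borgpetersenii'   => [],
);

# Hypothetical helper: below $node, remove every node with exactly one
# child and reattach that child to the removed node's parent.
# The starting node itself is never removed.
sub contract_simple_paths {
    my ($kids, $node) = @_;
    my @new;
    for my $child (@{ $kids->{$node} }) {
        my $c = $child;
        # Walk down while the current node has a single child.
        while (@{ $kids->{$c} } == 1) {
            my $next = $kids->{$c}[0];
            delete $kids->{$c};
            $c = $next;
        }
        push @new, $c;
        contract_simple_paths($kids, $c);
    }
    $kids->{$node} = \@new;
}

# Print the tree with two spaces of indentation per level.
sub show {
    my ($kids, $node, $depth) = @_;
    print '  ' x $depth, $node, "\n";
    show($kids, $_, $depth + 1) for @{ $kids->{$node} };
}

contract_simple_paths(\%children, 'root');
show(\%children, 'root', 0);
```

After contraction, 'L. interrogans' (which had a single child) is gone and 'serovar Copenhageni' hangs directly off 'Leptospira', next to 'L. borgpetersenii'.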

I think you misunderstood me. The tree is fine; the data used to
make the tree (NCBI taxonomy) is the issue. One of the clear caveats
NCBI attaches to its taxonomy data is that it should not be the
'primary source for taxonomic or phylogenetic information':
http://tinyurl.com/y3k624
I think it works as a good guide as long as one takes the above into
consideration, along with the fact that not all taxids attached to
sequence data will represent leaf nodes.
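
To make the leaf-count behaviour from earlier in the thread concrete, here is a minimal pure-Perl sketch. Assumptions: each input taxon is given as an abbreviated, made-up root-to-taxon lineage (not real NCBI lineages), and `leaves_from_lineages` is a hypothetical helper, not part of Bioperl or taxonomy2tree.pl. It shows that adding a descendant of an input taxon leaves the leaf count unchanged, because the parent taxon stops being a leaf.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: merge lineages into one tree and return the sorted
# leaf names (nodes that never appear as a parent).
sub leaves_from_lineages {
    my @lineages = @_;
    my (%children, %seen);
    for my $lineage (@lineages) {
        for my $i (0 .. $#$lineage) {
            $seen{ $lineage->[$i] } = 1;
            $children{ $lineage->[$i - 1] }{ $lineage->[$i] } = 1 if $i > 0;
        }
    }
    my @leaves = grep { !exists $children{$_} } keys %seen;
    return sort @leaves;
}

# Abbreviated, illustrative lineages.
my @input = (
    ['Bacteria', 'Leptospira', 'Leptospira interrogans'],
    ['Bacteria', 'Leptospira', 'Leptospira borgpetersenii'],
);
my @before = leaves_from_lineages(@input);

# Add a variation (descendant) of an existing input taxon.
push @input, ['Bacteria', 'Leptospira', 'Leptospira interrogans',
              'Leptospira interrogans serovar Copenhageni'];
my @after = leaves_from_lineages(@input);

printf "%d taxa in, %d leaves\n", 2, scalar @before;   # 2 taxa in, 2 leaves
printf "%d taxa in, %d leaves\n", 3, scalar @after;    # 3 taxa in, 2 leaves
```

'Leptospira interrogans' is a leaf in the first tree but an internal node in the second, which is why the number of input taxids and the number of leaves can differ.
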
chris