[Bioperl-l] Script taxonomy2tree version 1.4 crashed on 110 species

Gabriel Valiente valiente at lsi.upc.edu
Tue Dec 19 04:18:20 UTC 2006


Thanks a lot for the prompt answer and the follow-up discussion. I think
this turned out not to be a bug in the merge_lineage() code but a
direct consequence of building a phylogenetic tree instead of a
taxonomic tree, i.e. one with internal node labels.

In order to reconstruct the NCBI taxonomy for the set of species
present in a given phylogenetic tree, the only reasonable workaround
seems to be a two-step one: first merge the lineages and contract
linear paths with the current implementation, then restrict the given
phylogenetic tree to the set of species present in the resulting NCBI
taxonomy. None of this affects the code for merge_lineage().
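
To make the two steps concrete, here is a minimal, self-contained sketch in plain Python. It is not the BioPerl implementation: the function names, the nested-tuple tree representation and the `keep` parameter are assumptions made only for illustration, and the taxon labels are toy placeholders rather than real NCBI lineages. It also assumes the phylogenetic tree carries labels only at its leaves. The sketch shows why a species that has a single subspecies below it disappears when linear paths are contracted, and how the phylogenetic tree can then be restricted to the taxa that survive.

```python
def merge_lineages(lineages):
    """Merge root-to-taxon lineages into one tree (node -> ordered child list)."""
    children = {}
    for lineage in lineages:
        for node in lineage:
            children.setdefault(node, [])
        for ancestor, child in zip(lineage, lineage[1:]):
            if child not in children[ancestor]:
                children[ancestor].append(child)
    # Assumes every lineage starts at the same root label.
    return lineages[0][0], children


def contract_linear_paths(node, children, keep=frozenset()):
    """Return (label, subtrees), splicing out each node that has exactly
    one child unless its label is listed in `keep` (a hypothetical option)."""
    subtrees = [contract_linear_paths(c, children, keep) for c in children[node]]
    if len(subtrees) == 1 and node not in keep:
        return subtrees[0]
    return (node, subtrees)


def labels(tree):
    """Collect every label that occurs in a (label, subtrees) tree."""
    label, subtrees = tree
    found = {label}
    for subtree in subtrees:
        found |= labels(subtree)
    return found


def restrict(tree, wanted):
    """Keep only leaves whose label is in `wanted`, then splice out
    internal nodes left with a single child. Returns None if nothing is left."""
    label, subtrees = tree
    if not subtrees:
        return tree if label in wanted else None
    kept = [r for r in (restrict(s, wanted) for s in subtrees) if r is not None]
    if not kept:
        return None
    if len(kept) == 1:
        return kept[0]
    return (label, kept)


# Toy lineages: a species, one of its subspecies, and an unrelated species.
lineages = [
    ["root", "genus A", "species A1"],
    ["root", "genus A", "species A1", "subspecies A1a"],
    ["root", "genus B", "species B1"],
]
root, children = merge_lineages(lineages)

# Step 1: "species A1" has exactly one child, so contraction removes it.
contracted = contract_linear_paths(root, children)

# Variant: protect the requested species from contraction.
protected = contract_linear_paths(root, children, keep={"species A1"})

# Step 2: restrict a phylogenetic tree (unlabelled internal nodes) to the
# taxa present in the contracted taxonomy; "species C1" is dropped.
phylo = (None, [(None, [("subspecies A1a", []), ("species C1", [])]),
                ("species B1", [])])
restricted = restrict(phylo, labels(contracted))
```

In this toy case `contracted` no longer contains "species A1", which is the behaviour discussed in this thread, while `protected` keeps it as the parent of "subspecies A1a".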

Gabriel

>>> I think you misunderstood me.  The tree is fine; the data used to
>>> make the tree (NCBI taxonomy) is the issue.
>>
>> In what way is it the issue? The database is also fine as far as I
>> can see, in so far as it is not causing any problems in this instance.
>
> I should maybe have clarified a bit more: what I said has nothing  
> to do with the structure of the database itself.  I was just  
> pointing out that NCBI Taxonomy isn't the best source of data for  
> building a phylogenetic tree, something NCBI also points out (the  
> link in my last post).  Not a big deal, really.
>
>> Gabriel asked for a tree featuring a species and its subspecies. The
>> NCBI taxonomy database provided Bioperl the correct data to build
>> such a tree. Then Gabriel asked to remove the degree one nodes of his
>> tree. His problem was that doing that happened to (correctly) remove
>> the species node. If he wants to see both his species and his
>> subspecies he must either not remove degree one nodes, or alter the
>> method of doing so to keep desired nodes. There is no possible way
>> for NCBI to improve matters here.
>
> Actually, there isn't any way they could w/o digging through the  
> literature in many cases.  The problem is incomplete taxonomic  
> information for nodes derived from older sequence data, where a  
> genus and species was designated but nothing else (strain, etc) is  
> known.
>
> Again, I merely was pointing out what I had mentioned above.  I  
> wasn't criticizing you, Gabriel, or the methodology here.  Honest!
>
> chris



