[Bioperl-l] Script taxonomy2tree version 1.4 crashed on 110 species
Gabriel Valiente
valiente at lsi.upc.edu
Tue Dec 19 04:18:20 UTC 2006
Thanks a lot for the prompt answer and follow-up discussion. I think
this turned out not to be a bug in the merge_lineage() code but a
direct consequence of building a phylogenetic tree instead of a
taxonomic tree (i.e. one with internal node labels).
To reconstruct the NCBI taxonomy for the set of species present in a
given phylogenetic tree, the only reasonable work-around seems to take
two steps: first, merge lineages and contract linear paths with the
current implementation; second, restrict the given phylogenetic tree
to the set of species present in the resulting NCBI taxonomy. Neither
step requires any change to merge_lineage().
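The second step, restricting a phylogenetic tree to a given species set, might look roughly like the Python sketch below. It is not BioPerl code. The nested-tuple tree representation and the function name `restrict` are hypothetical, and the species set is assumed to come from the taxonomy built in the first step. Pruning a branch can leave an internal node with only one child, so such nodes are suppressed to keep the result a proper phylogenetic tree.

```python
from typing import Optional, Union

# Hypothetical representation: a leaf is a species name (str); an internal
# node is a tuple of child subtrees. Internal nodes carry no labels.
Tree = Union[str, tuple]


def restrict(tree: Tree, keep: set) -> Optional[Tree]:
    """Return the subtree induced by the leaves in `keep`, or None if none remain.

    After pruning, any internal node left with a single child is suppressed
    (replaced by that child), so no degree-one internal nodes remain.
    """
    if isinstance(tree, str):
        return tree if tree in keep else None
    pruned = [sub for sub in (restrict(child, keep) for child in tree) if sub is not None]
    if not pruned:
        return None
    if len(pruned) == 1:
        return pruned[0]
    return tuple(pruned)


example_tree = (("Homo sapiens", "Pan troglodytes"), ("Mus musculus", "Rattus norvegicus"))
# Species assumed to be present in the taxonomy obtained in the first step.
species_in_taxonomy = {"Homo sapiens", "Mus musculus", "Rattus norvegicus"}
print(restrict(example_tree, species_in_taxonomy))
# ('Homo sapiens', ('Mus musculus', 'Rattus norvegicus'))
```

Here "Pan troglodytes" is dropped, and its former sibling "Homo sapiens" moves up one level because its parent is left with a single child.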
Gabriel
>>> I think you misunderstood me. The tree is fine; the data used to
>>> make the tree (NCBI taxonomy) is the issue.
>>
>> In what way is it the issue? The database is also fine as far as I
>> can see, insofar as it is not causing any problems in this instance.
>
> I should maybe have clarified a bit more: what I said has nothing
> to do with the structure of the database itself. I was just
> pointing out that NCBI Taxonomy isn't the best source of data for
> building a phylogenetic tree, something NCBI also points out (the
> link in my last post). Not a big deal, really.
>
>> Gabriel asked for a tree featuring a species and its subspecies. The
>> NCBI taxonomy database provided Bioperl with the correct data to
>> build such a tree. Then Gabriel asked to remove the degree-one nodes
>> of his tree. His problem was that doing so happened to (correctly)
>> remove the species node. If he wants to see both his species and his
>> subspecies, he must either not remove degree-one nodes, or alter the
>> method of doing so to keep the desired nodes. There is no possible
>> way for NCBI to improve matters here.
>
> Actually, there isn't any way they could w/o digging through the
> literature in many cases. The problem is incomplete taxonomic
> information for nodes derived from older sequence data, where a
> genus and species was designated but nothing else (strain, etc.) is
> known.
>
> Again, I was merely pointing out what I had mentioned above. I
> wasn't criticizing you, Gabriel, or the methodology here. Honest!
>
> chris
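The effect discussed above, where a species node disappears once degree-one nodes are removed, can be reproduced with a small Python sketch. It is not BioPerl's merge_lineage() or its degree-one removal. The functions `merge_lineages` and `contract`, the `keep` parameter and the abbreviated lineages are all hypothetical and serve only as illustration. The `keep` set shows one possible way to "alter the method" so that desired nodes survive contraction.

```python
from collections import defaultdict


def merge_lineages(lineages):
    """Merge root-to-leaf lineages (lists of taxon names) into a parent -> children map."""
    children = defaultdict(list)
    for lineage in lineages:
        for parent, child in zip(lineage, lineage[1:]):
            if child not in children[parent]:
                children[parent].append(child)
    return children


def contract(node, children, keep=frozenset()):
    """Build a nested-tuple tree rooted at `node`, removing degree-one nodes.

    An internal node with exactly one child is replaced by that child (and so
    loses its label) unless it is listed in `keep`. Leaves are returned as
    plain names; other internal nodes as (name, (subtree, ...)).
    """
    kids = children.get(node, [])
    if not kids:
        return node
    if len(kids) == 1 and node not in keep:
        return contract(kids[0], children, keep)
    return (node, tuple(contract(kid, children, keep) for kid in kids))


# Abbreviated, illustrative lineages: a species and one of its subspecies,
# plus an unrelated species so the root has two children.
lineages = [
    ["Mammalia", "Homo", "Homo sapiens"],
    ["Mammalia", "Homo", "Homo sapiens", "Homo sapiens neanderthalensis"],
    ["Mammalia", "Mus", "Mus musculus"],
]
tree_children = merge_lineages(lineages)

print(contract("Mammalia", tree_children))
# ('Mammalia', ('Homo sapiens neanderthalensis', 'Mus musculus'))
print(contract("Mammalia", tree_children, keep={"Homo sapiens"}))
# ('Mammalia', (('Homo sapiens', ('Homo sapiens neanderthalensis',)), 'Mus musculus'))
```

In the first call, "Homo sapiens" has a single child (its subspecies), so it is contracted away even though it was one of the requested taxa. Listing it in `keep` preserves it as a labelled internal node.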