[Bioperl-l] Bio::DB::Taxonomy root not present

Chris Fields cjfields at illinois.edu
Thu Jul 7 19:30:36 UTC 2011


I agree; I view this as a bug, based on the principle of least surprise (I would expect the data to follow NCBI's without any underlying changes). Do you want to work on that? It might be interesting to see if anything else breaks...

chris

On Jul 7, 2011, at 2:22 PM, Brian Osborne wrote:

> All,
> 
> It's true that "root" is "fake" or non-existent. It's also true that having 5 trees instead of 1 is scientifically incorrect and awkward programmatically. Perhaps the most salient point is that "root" exists as a page in the NCBI Taxonomy and as an entry in the *.dmp files, as Bernd says, and it has a tax id of 1. So if the goal is to be faithful to NCBI Taxonomy, then it should be restored.
> 
> Brian O.
> 
> On Jul 7, 2011, at 12:40 PM, Chris Fields wrote:
> 
>> Okay, to re-answer more definitively: this appears to have been added by Sendu in response to these bug reports:
>> 
>> https://redmine.open-bio.org/issues/2061
>> https://redmine.open-bio.org/issues/2047
>> 
>> The main one is bug 2061, where this is present:
>> 
>> Bio::DB::Taxonomy::flatfile
>> ---------------------------
>> 
>> • API-CHANGES
>> get_Children_Taxids is deprecated - method no longer part of the DB::Taxonomy interface, and superseded by each_Descendent (which is actually implemented by all databases).
>> • Implementation changes
>> No longer includes the fake root node 'root'; there are multiple roots now (10239, 12884, 12908, 29384 and 131567). This means when getting the lineage you no longer have to remove the root node. This is now consistent with the results possible with entrez. 
>> NB: You have to delete your current indexes before you will notice the change.
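To see why dropping the fake root turns one tree into several, here is a minimal pure-Perl sketch. It is not BioPerl's indexing code; the helpers `parent_map` and `tops` are hypothetical, and the input is a small illustrative excerpt in the `nodes.dmp` layout (tax id, parent tax id; other columns omitted), not the full file. It lists the nodes that have no indexed parent, once with tax id 1 present and once with it removed.

```perl
use strict;
use warnings;

# Illustrative lines in the NCBI nodes.dmp layout (fields separated by "|");
# only tax_id and parent tax_id are shown, the remaining columns are omitted.
my @nodes_dmp = (
    "1\t|\t1\t|",         # root is its own parent
    "131567\t|\t1\t|",    # cellular organisms
    "2\t|\t131567\t|",    # Bacteria
    "10239\t|\t1\t|",     # Viruses
    "12908\t|\t1\t|",     # unclassified sequences
);

# Hypothetical helper: build a tax_id => parent tax_id map.
sub parent_map {
    my %parent;
    for my $line (@_) {
        my ($taxid, $parent_id) = split /\s*\|\s*/, $line;
        $parent{$taxid} = $parent_id;
    }
    return \%parent;
}

# Hypothetical helper: a "top" is a node that is its own parent (tax id 1)
# or whose parent is missing from the index.
sub tops {
    my ($parent) = @_;
    return sort { $a <=> $b }
        grep { $parent->{$_} eq $_ || !exists $parent->{ $parent->{$_} } }
        keys %$parent;
}

my $with_root    = parent_map(@nodes_dmp);
my %without_root = %$with_root;
delete $without_root{1};    # mimic an index that skips the root node

print "with root:    @{[ tops($with_root) ]}\n";
print "without root: @{[ tops(\%without_root) ]}\n";
```

With tax id 1 indexed there is a single top (1); without it, every direct child of root becomes a separate top.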
>> 
>> chris
>> 
>> On Jul 7, 2011, at 10:16 AM, Brian Osborne wrote:
>> 
>>> Bernd,
>>> 
>>> Yes, good question. Currently, if you want to traverse up the tree from any given node, you have to be aware that the tree may end at "cellular organisms", "other sequences", "unclassified sequences", "Viruses", or "Viroids", but not at "root". This can make for awkward programming.
>>> 
>>> Brian O.
>>> 
>>> On Jul 7, 2011, at 11:03 AM, Bernd Web wrote:
>>> 
>>>> Hi,
>>>> 
>>>> I noticed Bio::DB::Taxonomy does not contain the root of the tree,
>>>> while the NCBI nodes file does.
>>>> For example, the lineage "root; cellular organisms; Bacteria" stops at
>>>> "cellular organisms", which means "cellular organisms" has no parent
>>>> node (see code below). Also,
>>>> $taxdb->get_Taxonomy_Node(1) does not return the Bio::Taxon object
>>>> for root (using BioPerl 1.6.9).
>>>> 
>>>> Was there a reason not to include the node "root" in the index files
>>>> for Bio::DB::Taxonomy?
>>>> 
>>>> 
>>>> Kind regards,
>>>> Bernd
>>>> 
>>>> use strict;
>>>> use File::Spec;
>>>> use Bio::DB::Taxonomy;
>>>> 
>>>> my $prefix = '/scratch/taxonomy/';
>>>> my $taxdb = Bio::DB::Taxonomy->new
>>>> (-source => 'flatfile',
>>>>  -directory => File::Spec->catfile($prefix,'idx'),
>>>>  -nodesfile => File::Spec->catfile($prefix,'nodes.dmp'),
>>>>  -namesfile => File::Spec->catfile($prefix,'names.dmp')
>>>>  );
>>>> 
>>>> 
>>>> my $taxid = '2';
>>>> my $node = $taxdb->get_Taxonomy_Node($taxid);
>>>> $node = $taxdb->ancestor($node);
>>>> print $node->node_name, "\n"; #prints: cellular organisms
>>>> $node = $taxdb->ancestor($node);
>>>> print $node->node_name, "\n"; # error: Can't call method "node_name" on
>>>> # an undefined value at taxdb.pl line...
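The error comes from the second `ancestor` call returning nothing once the walk passes "cellular organisms". As a minimal pure-Perl sketch of a lineage walk that stops cleanly whether or not the root is indexed: it uses plain hashes as hypothetical stand-ins for `nodes.dmp` and `names.dmp` rather than Bio::DB::Taxonomy, and `lineage_names` is a hypothetical helper, not a BioPerl method.

```perl
use strict;
use warnings;

# Hypothetical in-memory stand-ins for nodes.dmp (child => parent) and
# names.dmp (tax id => scientific name); only three nodes are included.
my %parent = (1 => 1, 131567 => 1, 2 => 131567);
my %name   = (1 => 'root', 131567 => 'cellular organisms', 2 => 'Bacteria');

# Walk from $taxid towards the top, stopping when the next node is not
# indexed (root missing) or the current node is its own parent (tax id 1).
sub lineage_names {
    my ($parent, $name, $taxid) = @_;
    my @names;
    while (defined($taxid) && exists $name->{$taxid}) {
        push @names, $name->{$taxid};
        my $up = $parent->{$taxid};
        last if !defined($up) || $up eq $taxid;
        $taxid = $up;
    }
    return @names;
}

print join('; ', reverse lineage_names(\%parent, \%name, 2)), "\n";

# Mimic an index built without the root node.
my %parent_no_root = %parent;
my %name_no_root   = %name;
delete $parent_no_root{1};
delete $name_no_root{1};
print join('; ', reverse lineage_names(\%parent_no_root, \%name_no_root, 2)), "\n";
```

With Bio::DB::Taxonomy itself, the equivalent guard is checking that `$taxdb->ancestor($node)` returned a defined object before calling `node_name` on it.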
>>>> _______________________________________________
>>>> Bioperl-l mailing list
>>>> Bioperl-l at lists.open-bio.org
>>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>>> 
>>> 
>> 
>> 
> 




