[Bioperl-l] Bio::*Taxonomy* changes

Sendu Bala bix at sendu.me.uk
Mon Jul 17 16:31:37 UTC 2006


I see strange node names via Bio::DB::Taxonomy::flatfile:

use strict;
use warnings;
use Bio::DB::Taxonomy;

# placeholder: directory holding nodes.dmp and names.dmp
my $taxonomy_dir = '/path/to/taxdump/';

my $db = Bio::DB::Taxonomy->new(-source    => 'flatfile',
                                -directory => $taxonomy_dir,
                                -nodesfile => $taxonomy_dir.'nodes.dmp',
                                -namesfile => $taxonomy_dir.'names.dmp');

my $tax_id = 89593;
my $node = $db->get_Taxonomy_Node($tax_id);

print "node $tax_id has name '", @{$node->name('common')}, "' and rank '",
      $node->rank, "'\n";

Results in:
node 89593 has name 'Craniata <chordata>' and rank 'subphylum'

Other examples:
node 2 has name 'Bacteria <bacteria>' and rank 'superkingdom'
node 1386 has name 'Bacillus <bacterium>' and rank 'genus'
node 7776 has name 'Gnathostomata <vertebrate>' and rank 'superclass'
etc.

For me the bits in <> are inappropriate and shouldn't be there. The NCBI 
website agrees, and you won't see these things if you use -source => 
'entrez'. Should they be removed by the flatfile parser as a matter of 
course, with no warnings or option? Or do people want them? Typically 
they are just the name of the parent node, so I don't see why anyone 
would /need/ them, and I argue it's invalid for parent node information 
to be duplicated here.
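For illustration only, here is a minimal sketch of the kind of clean-up being proposed. It assumes the unwanted part always appears as a single trailing "<...>" group after the real name; strip_angle_suffix is a hypothetical helper, not an existing Bio::DB::Taxonomy::flatfile method.

```perl
use strict;
use warnings;

# Hypothetical helper: remove one trailing " <...>" group from a taxon
# name, e.g. 'Craniata <chordata>' becomes 'Craniata'. Names without
# such a group are returned unchanged.
sub strip_angle_suffix {
    my ($name) = @_;
    $name =~ s/\s*<[^<>]*>\s*$//;
    return $name;
}

for my $raw ('Craniata <chordata>', 'Bacteria <bacteria>', 'Homo sapiens') {
    print strip_angle_suffix($raw), "\n";
}
```

Anchoring the regex at the end of the string means a "<" appearing earlier in a name would not be touched, which seems the safer default if the assumption about the format turns out not to hold everywhere.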

If there are no objections I'll strip the <> bits. I also plan to make
$node->name('scientific', 'sapiens') set and get the node name, and to
have both flatfile and entrez store all common names with
$obj->name('common', 'human', 'man'). Since these changes will make the
implementation match the docs I don't foresee any problems, except that
flatfile users will now find the node name in a different place
(@{$node->name('scientific')} instead of @{$node->name('common')}).
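The proposed set/get behaviour can be pictured as a per-class store of name lists. The sketch below is a hypothetical stand-in (My::TaxonNode is not Bio::Taxonomy::Node); it only assumes, as the @{...} dereferences above imply, that name() returns an array reference.

```perl
package My::TaxonNode;   # hypothetical class, for illustration only
use strict;
use warnings;

sub new {
    my ($class) = @_;
    return bless { _names => {} }, $class;
}

# name($name_class, @names): if @names is given, replace the names stored
# under $name_class; always return an array ref of the names for that class.
sub name {
    my ($self, $name_class, @names) = @_;
    $self->{_names}{$name_class} = [@names] if @names;
    return $self->{_names}{$name_class} || [];
}

package main;

my $node = My::TaxonNode->new;
$node->name('scientific', 'sapiens');    # set the node name
$node->name('common', 'human', 'man');   # store all common names
print "scientific: @{$node->name('scientific')}\n";
print "common: @{$node->name('common')}\n";
```

Returning an empty array reference for an unset class keeps @{$node->name(...)} safe to dereference under strict even when no names of that class exist.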

I'll also fix the problem with node names for ranks species and lower, 
as discussed in thread 'Bio::DB::Taxonomy:: mishandles species, 
subspecies/variant names', in the way I suggested there.

If anyone can see a problem with any of these changes, let me know asap.


