[Bioperl-l] Bio::Taxonomy confusion

Sendu Bala sb at mrc-dunn.cam.ac.uk
Wed May 10 09:30:59 UTC 2006


Hi,
I'm a little confused as to how names are supposed to work in 
Bio::Taxonomy::Node.

In the bioperl versions that I've looked at a Node doesn't seem to store 
the most important information about itself - its scientific name - in 
an obvious place. bioperl 1.5.1 puts it at the start of the 
classification list. I'd have thought sticking it in -name would make 
more sense, but this is used only for the GenBank common name.

The Bio::Taxonomy docs still suggest:

my $node_species_sapiens = Bio::Taxonomy::Node->new(
   -object_id => 9606, # or -ncbi_taxid. Required tag
   -names => {
       'scientific' => ['sapiens'],
       'common_name' => ['human']
   },
   -rank => 'species'  # Required tag
);

and whilst Bio::Taxonomy::Node does not accept -names, it does have a 
'name' method which claims to work like:

$obj->name('scientific', 'sapiens');

This kind of thing would be really nice, but as far as I can see 
Bio::Taxonomy::Node->new takes the -name value and makes a common name 
out of it, whilst the name() method passes any 'scientific' name to the 
scientific_name() method, which can only get a value, not set one (and 
warns about this).

It seems like the need to have this classification array work the same 
way as Bio::Species is causing some unnecessary restrictions. Can't the 
more sensible idea of having a dedicated storage spot for the 
ScientificName and other parameters be used, with the classification 
array either being generated just-in-time from the hash-stored data, or 
indeed being generated from the Lineage field?
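
To illustrate the idea, here is a minimal sketch. It is not how 
Bio::Taxonomy::Node works; My::TaxonNode, the -lineage argument and the 
internal hash layout are all hypothetical, and the lineage shown is 
abbreviated. Names live in their own hash slot, and the Bio::Species-style 
classification list is only built when asked for:

```perl
use strict;
use warnings;

# Hypothetical stand-in class, NOT the real Bio::Taxonomy::Node.
package My::TaxonNode;

sub new {
    my ($class, %args) = @_;
    return bless {
        id      => $args{-object_id},
        rank    => $args{-rank},
        names   => $args{-names}   || {},   # name class => [names]
        lineage => $args{-lineage} || [],   # ancestor names, root first (assumed)
    }, $class;
}

# Get or set the scientific name in its own dedicated slot.
sub scientific_name {
    my ($self, $name) = @_;
    $self->{names}{scientific} = [$name] if defined $name;
    return $self->{names}{scientific}[0];
}

# Built just-in-time: most specific name first, root last.
sub classification {
    my $self = shift;
    return ($self->scientific_name, reverse @{ $self->{lineage} });
}

package main;

my $node = My::TaxonNode->new(
    -object_id => 9606,
    -rank      => 'species',
    -names     => { scientific => ['sapiens'], common_name => ['human'] },
    # Abbreviated lineage, for illustration only.
    -lineage   => [qw(Eukaryota Metazoa Chordata Mammalia Primates Hominidae Homo)],
);

print join(', ', $node->classification), "\n";
```

With this layout the setter and the classification list can never 
disagree, since the list is derived from the stored name each time.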


Also, why does a node store the complete hierarchy on itself in the 
classification array? If we're going that far, why don't the 
Bio::DB::Taxonomy modules like Bio::DB::Taxonomy::entrez just have a 
get_taxonomy() method instead of a get_Taxonomy_Node() method? 
get_taxonomy() could, from a single efetch.fcgi lookup, create a 
complete Bio::Taxonomy with all the nodes. Whilst most nodes would only 
have a minimum of information, if you could simply ask a node what its 
rank and scientific name were you could easily build a classification 
array, or ask what Kingdom your species was in etc.
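
As a rough sketch of that second point, assume each node only knows its 
own name, rank and parent id. The %taxonomy hash, the classification() 
and name_at_rank() helpers, and the abbreviated parent chain are all 
hypothetical and stand in for whatever a get_taxonomy() call might 
return; none of this is existing BioPerl API:

```perl
use strict;
use warnings;

# Hypothetical minimal nodes keyed by id. Intermediate ranks are skipped,
# and ids other than 9606 are included only for illustration.
my %taxonomy = (
    9606  => { name => 'sapiens',   rank => 'species', parent => 9605  },
    9605  => { name => 'Homo',      rank => 'genus',   parent => 9604  },
    9604  => { name => 'Hominidae', rank => 'family',  parent => 9443  },
    9443  => { name => 'Primates',  rank => 'order',   parent => 40674 },
    40674 => { name => 'Mammalia',  rank => 'class',   parent => 33208 },
    33208 => { name => 'Metazoa',   rank => 'kingdom', parent => undef },
);

# Walk up the parent links, collecting names from the node to the root.
sub classification {
    my ($tax, $id) = @_;
    my @names;
    while (defined $id && exists $tax->{$id}) {
        push @names, $tax->{$id}{name};
        $id = $tax->{$id}{parent};
    }
    return @names;
}

# Return the name of the first ancestor (or the node itself) with the
# given rank, or nothing if no such node is found.
sub name_at_rank {
    my ($tax, $id, $rank) = @_;
    while (defined $id && exists $tax->{$id}) {
        return $tax->{$id}{name} if $tax->{$id}{rank} eq $rank;
        $id = $tax->{$id}{parent};
    }
    return;
}

print join(', ', classification(\%taxonomy, 9606)), "\n";
print 'kingdom: ', name_at_rank(\%taxonomy, 9606, 'kingdom'), "\n";
```

Nothing about the hierarchy is duplicated on the species node here; the 
classification is recovered by following parent links.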

Are there good reasons for Taxonomy working the way it does in 1.5.1, or 
would I not be wasting my time re-writing things to make more sense (to me)?


Cheers,
Sendu.
