[Bioperl-l] Bio::*Taxonomy* changes

Sendu Bala bix at sendu.me.uk
Tue Jul 18 07:27:49 UTC 2006


Hilmar Lapp wrote:
> I don't think we should differ from NCBI in places where the  
> connection between a method name and the NCBI data file is obvious or  
> otherwise we will confuse people and send them into traps.
> 
> $node->scientific_name() should simply report what NCBI reports. For  
> simple species this will be identical to what $node->binomial()  
> returns, but for others it may not, e.g., strains, varieties, etc or  
> the weird world of viri and bacteria.

OK, this certainly seems to be the consensus, so I'll abide by it.


> This will also absolve us from retaining the business logic for how  
> to construct the scientific name from genus, species, and possibly  
> strain or whatever.

What about the existing genus(), species(), sub_species() and variant() 
methods? There would be no need for any logic to join things together, 
but I would still like to be able to get just 'sapiens' from somewhere. 
Can I use species() for that purpose (though again, species is strictly 
'Homo sapiens')? Likewise sub_species() and variant() could hold the 
remaining non-redundant names. Or should all of these be deprecated 
because they don't really have a place in a generic Node class?

What about node_name()? Should it be yet another synonym of 
scientific_name()? (Right now it grabs the common name(s).) Ugh.

What should I do with the classification array? Should it hold the raw 
ScientificName, like:
  join(', ', $node->classification) eq
    'Homo sapiens, Homo, Homo/Pan/Gorilla group [...]'
Or should it strip the genus from the species name, like:
  join(', ', $node->classification) eq
    'sapiens, Homo, Homo/Pan/Gorilla group [...]'

The latter is how it currently works (when it works correctly); I would 
rather fix it than lose the logic completely, but if we're staying true 
to proper classification (vs. what a programmer might expect), I guess I 
must use the raw ScientificName?
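
To make the two options concrete, here is a rough sketch of both. It uses 
plain hashes as stand-ins for nodes, not the real Bio::Taxonomy classes; 
the helper names raw_classification() and trimmed_classification() are 
made up for illustration, and the rank given to the group node is an 
assumption.

```perl
use strict;
use warnings;

# Hypothetical lineage, species first, using the names quoted above.
# Plain hashes stand in for taxonomy nodes; this is not the BioPerl API.
my @lineage = (
    { rank => 'species', scientific_name => 'Homo sapiens' },
    { rank => 'genus',   scientific_name => 'Homo' },
    # Rank assumed for illustration only.
    { rank => 'no rank', scientific_name => 'Homo/Pan/Gorilla group' },
);

# Option 1: report the raw ScientificName of every node, unchanged.
sub raw_classification {
    my @nodes = @_;
    return map { $_->{scientific_name} } @nodes;
}

# Option 2: strip the genus prefix from the species-rank name,
# leaving every other node's name untouched.
sub trimmed_classification {
    my @nodes = @_;
    my ($genus) = map  { $_->{scientific_name} }
                  grep { $_->{rank} eq 'genus' } @nodes;
    return map {
        my $name = $_->{scientific_name};
        $name =~ s/^\Q$genus\E\s+//
            if defined $genus && $_->{rank} eq 'species';
        $name;
    } @nodes;
}

my $raw     = join(', ', raw_classification(@lineage));
my $trimmed = join(', ', trimmed_classification(@lineage));

print "$raw\n";      # Homo sapiens, Homo, Homo/Pan/Gorilla group
print "$trimmed\n";  # sapiens, Homo, Homo/Pan/Gorilla group
```

The second form only needs the genus name to trim the species node, so 
the logic could live outside the Node class if it has no place there.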


> binomial() isn't part of the NCBI taxonomy definition, so you have  
> freedom there to report what suits you.

I don't think binomial() would serve any useful purpose now, however. I 
can either deprecate it or make it a synonym of scientific_name() or 
both. Or binomial() can be a version of scientific_name() that complains 
if you use it on a rank higher or lower than species. As for species() 
et al., they may have no place in a generic Node class. Thoughts?
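
For the "complaining" variant of binomial(), something along these lines 
is what I have in mind. This is only a sketch: My::Node is a hypothetical 
stand-in class, not the real Node, and dying (rather than just warning) 
on the wrong rank is an assumption.

```perl
use strict;
use warnings;

# Hypothetical stand-in for a generic taxonomy node; not the real class.
package My::Node;

sub new {
    my ($class, %args) = @_;
    return bless { %args }, $class;
}

sub rank            { $_[0]->{rank} }
sub scientific_name { $_[0]->{scientific_name} }

# binomial() as a strict version of scientific_name(): it returns the
# same string, but refuses to answer for any rank other than species.
sub binomial {
    my ($self) = @_;
    my $rank = defined $self->rank ? $self->rank : 'unknown';
    die "binomial() only makes sense for rank 'species', not '$rank'\n"
        unless $rank eq 'species';
    return $self->scientific_name;
}

package main;

my $species = My::Node->new(rank => 'species', scientific_name => 'Homo sapiens');
my $genus   = My::Node->new(rank => 'genus',   scientific_name => 'Homo');

print $species->binomial, "\n";    # Homo sapiens

# Calling binomial() on a genus node dies; eval catches the complaint.
my $ok = eval { $genus->binomial; 1 };
print "genus rejected: $@" unless $ok;
```

Swapping the die for a warning would make it a softer complaint that 
still returns scientific_name().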


