[Bioperl-l] Bio::DB::Taxonomy:: mishandles species, subspecies/variant names

Sendu Bala sb at mrc-dunn.cam.ac.uk
Mon May 15 18:00:15 UTC 2006


Chris Fields wrote:
> 
> Ah, now I see.  That's a bit screwy, but it's not on our end so we have to
> deal with it.  I also noticed that subspecies contains the entire
> string:
> 
>     <Taxon>
>       <TaxId>135461</TaxId>
>       <ScientificName>Bacillus subtilis subsp. subtilis</ScientificName>
>       <Rank>subspecies</Rank>
>     </Taxon>

Yes, this is one of the problems I mentioned in the first post to this
thread.


> As for the 'scientific_name' method when accessed through Bio::DB::Taxonomy,
> I don't get the actual scientific name for the node (from the GenBank
> ORGANISM line) almost every time; I get the name with the strain chopped off
> instead and a number of times the names get mangled.

[snip, should be:]
> 224308  Bacillus subtilis subsp. subtilis str. 168
> 281309  Bacillus thuringiensis serovar konkukian str. 97-27

[snip, but Bio::DB::Taxonomy gives:]
> 224308  subtilis Bacillus subtilis subsp. subtilis
> 281309  Bacillus cereus group thuringiensis

[snip]
> So, in a nutshell, there's a problem here.  I don't know if your fix works
> for that, but I definitely don't think the 'scientific name' should be
> assembled ad hoc but should be taken from the tagname for that node.

Yes, my implementation will get you the correct answer, but not quite in
the way you suggest. My solution was to munge the actual ScientificName
but ensure that the binomial method gives you back the full name you
wanted, which is the intent of the current Bio::DB::Taxonomy code.

my $species0 = TFBS::Species->new(-ncbi_taxid => 224308);
my $leaf_node = $species0->taxonomy->get_leaves();
print "sci_name of Node = '", $leaf_node->scientific_name, "'\n";
print "Species0 subspecies = '", $species0->subspecies, "'\n";
print "Species0 variants = '", scalar($species0->variant), "'\n";
print "Species0 binomial = '", $species0->binomial('FULL'), "'\n";

gives:
sci_name of Node = 'str. 168'
Species0 subspecies = 'subsp. subtilis'
Species0 variants = 'str. 168'
Species0 binomial = 'Bacillus subtilis subsp. subtilis str. 168'

and the same again for id 281309:

sci_name of Node = 'str. 97-27'
Species0 subspecies = ''
Species0 variants = 'serovar konkukian str. 97-27'
Species0 binomial = 'Bacillus thuringiensis serovar konkukian str. 97-27'

I've done it this way because even though strictly speaking the
ScientificName for 224308 (a 'no rank') is 'Bacillus subtilis subsp.
subtilis str. 168', when I ask for the variant I don't want that whole
string. I just want the bit that will be different when comparing other
strains of this subspecies of this species of Bacillus. I want 'str.
168'. Note that my objects never store the original ScientificName; it
is due to 'luck' (or as I like to think, a good implementation) that the
binomial method is able to reconstruct a string that is identical to
what the original ScientificName was.
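
A minimal sketch of that idea, for illustration only (it is not the TFBS
code and not an existing bioperl method): each node keeps only the part
of its name that its parent does not already supply, and the full name
is rebuilt by joining the parts from the genus down. The helper name
strip_parent_prefixes is hypothetical, and the genus and species strings
in the lineage are filled in as assumptions.

```perl
use strict;
use warnings;

# Hypothetical helper, not part of bioperl: takes full ScientificNames
# ordered from genus down to the leaf and returns, for each one, only the
# part that its parent's name does not already contain.
sub strip_parent_prefixes {
    my @full_names = @_;
    my @parts;
    my $parent = '';
    for my $name (@full_names) {
        my $part = $name;
        # Drop the parent's full name (and the whitespace after it) when
        # this name starts with it; otherwise the name is kept whole.
        $part =~ s/^\Q$parent\E\s+// if length $parent;
        push @parts, $part;
        $parent = $name;
    }
    return @parts;
}

# Lineage for taxid 224308 as full names (genus and species rows are
# assumed here for the sake of the example).
my @lineage = (
    'Bacillus',
    'Bacillus subtilis',
    'Bacillus subtilis subsp. subtilis',
    'Bacillus subtilis subsp. subtilis str. 168',
);

my @parts   = strip_parent_prefixes(@lineage);
my $rebuilt = join ' ', @parts;

print "variant = '$parts[-1]'\n";
print "rebuilt = '$rebuilt'\n";
```

In this sketch a name that does not start with its parent's name (for
example a species sitting under 'Bacillus cereus group') is left whole,
so such intermediate nodes would have to be skipped when joining.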

If you'd like to see my code let me know. You can't just drop the code
snippet I posted in this thread into existing bioperl modules; quite a
bit else has to change as well. I'll have to make an updated
taxonomy_the_tfbs_way.tar.gz file available if you want an example
implementation; the current version of that file is now out of date - it
doesn't do any of what I describe above.



