[Bioperl-l] Bio::DB::Taxonomy:: mishandles species, subspecies/variant names

Sendu Bala sb at mrc-dunn.cam.ac.uk
Fri May 12 10:24:39 UTC 2006


In bioperl up to at least 1.5.1, when one of the database modules comes
across a species rank, it does:

if ($rank eq 'species') {
   # get rid of genus from species name
   (undef,$taxon_name) = split(/\s+/,$taxon_name,2);
}
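To see the effect concretely, here is a minimal standalone sketch that
applies the same split to a couple of full names. The sub name
`naive_species_name` is hypothetical and is not part of BioPerl.

```perl
use strict;
use warnings;

# Hypothetical helper reproducing the species-rank handling quoted above:
# drop the first whitespace-separated word and keep everything after it.
sub naive_species_name {
    my ($taxon_name) = @_;
    my (undef, $rest) = split /\s+/, $taxon_name, 2;
    return $rest;
}

for my $name ('Homo sapiens', 'Avian leukosis virus') {
    printf "%-22s -> %s\n", $name, naive_species_name($name);
}
```

For 'Homo sapiens' this gives 'sapiens' as intended. For 'Avian leukosis
virus' it throws away 'Avian', even though that word is not the genus.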

However, even though the true scientific name in the database is usually
'Genus species', note the 'usually': sometimes the species name is a
multi-word item that does not include the genus, so we can't just split
on whitespace and take the second word.
The same applies to levels below species. For example, 'Avian
erythroblastosis virus' is a variant of the species 'Avian leukosis
virus', and 'Avian erythroblastosis virus (strain ES4)' is in turn a
variant of that variant...

My solution is to remove the previous rank's full name from the current
rank's name wherever it appears. That may not be perfect either, but it
must be a lot better than turning the species 'Avian leukosis virus'
into the species 'virus' (especially given that the genus here is
'Alpharetrovirus')!

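# We need to be going in root (kingdom) -> leaf (species or lower) order.
#
# We need to store the untouched scientific name of the previous rank
# ($self->{_last_raw}).
#
# Probably only worth starting this once we get to genus.
my $last_raw = $self->{_last_raw};
$self->{_last_raw} = $sci_name;    # save the raw name before modifying it
if (defined $last_raw && length $last_raw) {
   # \Q...\E matches the parent name literally, so characters such as the
   # parentheses in 'Avian erythroblastosis virus (strain ES4)' are not
   # treated as regex syntax
   $sci_name =~ s/\Q$last_raw\E//;
   $sci_name =~ s/^\s+//;
}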
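For trying the idea outside the database modules, here is a
self-contained sketch. It assumes the names arrive as a plain list in
root -> leaf order rather than through the $self->{_last_raw} state used
above. `strip_parent_names` is a hypothetical name, not a BioPerl method.

```perl
use strict;
use warnings;

# Sketch: strip the previous rank's raw name from each name in a lineage
# given in root -> leaf order. Hypothetical helper, not BioPerl API.
sub strip_parent_names {
    my @names = @_;            # raw scientific names, root first
    my @short;
    my $last_raw;
    for my $raw (@names) {
        my $name = $raw;
        if (defined $last_raw && length $last_raw) {
            # \Q...\E quotes regex metacharacters such as '(' and ')'
            $name =~ s/\Q$last_raw\E//;
            $name =~ s/^\s+//;
        }
        push @short, $name;
        $last_raw = $raw;      # remember the untouched name for the next rank
    }
    return @short;
}

my @lineage = (
    'Alpharetrovirus',
    'Avian leukosis virus',
    'Avian erythroblastosis virus',
    'Avian erythroblastosis virus (strain ES4)',
);
print "$_\n" for strip_parent_names(@lineage);
```

On the virus lineage, this leaves 'Avian leukosis virus' and 'Avian
erythroblastosis virus' intact because neither contains its parent's
name. It reduces the strain to '(strain ES4)'. For 'Homo' -> 'Homo
sapiens' it gives 'sapiens'.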

Are there even more strange species (and lower) names that would still 
not work well with the above solution?

Cheers,
Sendu.


