[Bioperl-l] Bio::DB::Taxonomy:: mishandles species, subspecies/variant names
Sendu Bala
sb at mrc-dunn.cam.ac.uk
Fri May 12 10:24:39 UTC 2006
In bioperl up to at least 1.5.1, when one of the database modules comes
across a species rank it does:
if ($rank eq 'species') {
    # get rid of genus from species name
    (undef,$taxon_name) = split(/\s+/,$taxon_name,2);
}
However, even though the true scientific name is usually 'Genus species'
in the database, note the 'usually': sometimes the species name is a
multi-word item that does not include the genus, so we can't just do a
simple split and take the second word.
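To make the problem concrete, here is a minimal sketch that runs the quoted split on two names. The helper name strip_first_word is hypothetical and is not part of any Bio::DB::Taxonomy module; only the split line is taken from the snippet above.

```perl
use strict;
use warnings;

# Hypothetical helper (not a BioPerl function): applies the quoted
# split, discarding the first whitespace-separated word of the name.
sub strip_first_word {
    my ($taxon_name) = @_;
    (undef, $taxon_name) = split(/\s+/, $taxon_name, 2);
    return $taxon_name;
}

# One conventional 'Genus species' name and one viral species name
# whose first word is not the genus.
for my $name ('Homo sapiens', 'Avian leukosis virus') {
    printf "%-22s -> %s\n", $name, strip_first_word($name);
}
```

Run on its own, this leaves 'sapiens' for 'Homo sapiens' but 'leukosis virus' for 'Avian leukosis virus'. The word thrown away in the second case, 'Avian', was never the genus.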
The same applies to levels below species, e.g. 'Avian erythroblastosis
virus' is a variant of the species 'Avian leukosis virus', but 'Avian
erythroblastosis virus (strain ES4)' is a variant of that variant...
My solution is to just remove whatever is the same between the current
rank and the previous rank. Maybe even that isn't perfect, but it must
be a lot better than turning the species 'Avian leukosis virus' into
the species 'virus' (especially given that the genus here is
'Alpharetrovirus')!
# we need to be going in root (kingdom) -> leaf (species or lower) order
#
# we need to be storing untouched versions of the scientific name of
# the previous rank ($self->{_last_raw})
#
# probably only bother to start doing this when we get to genus
my $last_raw = $self->{_last_raw};
$self->{_last_raw} = $sci_name;
if ($last_raw) {
    # \Q...\E so that names containing '(' etc. are matched literally
    $sci_name =~ s/\Q$last_raw\E//;
    $sci_name =~ s/^\s+//;
}
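As a rough illustration of how this behaves over a whole lineage, the sketch below wraps the same logic in a standalone function. The function name trim_lineage and the plain hash standing in for $self are assumptions made for the example; only the _last_raw key and the two substitutions come from the snippet above. It quotes the previous name with \Q...\E so that characters such as parentheses are matched literally.

```perl
use strict;
use warnings;

# Hypothetical wrapper (not BioPerl API): takes scientific names in
# root -> leaf order and removes from each one the raw name of the
# rank before it.
sub trim_lineage {
    my @sci_names = @_;
    my $self = {};    # stand-in for the object; only _last_raw is used
    my @trimmed;
    for my $sci_name (@sci_names) {
        my $last_raw = $self->{_last_raw};
        $self->{_last_raw} = $sci_name;    # keep the untouched name
        if (defined $last_raw && length $last_raw) {
            $sci_name =~ s/\Q$last_raw\E//;    # literal match only
            $sci_name =~ s/^\s+//;
        }
        push @trimmed, $sci_name;
    }
    return @trimmed;
}

print join(' | ', trim_lineage(
    'Homo', 'Homo sapiens', 'Homo sapiens neanderthalensis')), "\n";
print join(' | ', trim_lineage(
    'Alpharetrovirus', 'Avian leukosis virus',
    'Avian erythroblastosis virus',
    'Avian erythroblastosis virus (strain ES4)')), "\n";
```

The first lineage comes out as 'Homo | sapiens | neanderthalensis'. In the second, 'Avian leukosis virus' and 'Avian erythroblastosis virus' are left intact because neither contains the previous rank's name, and only the last entry is reduced, to '(strain ES4)'.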
Are there even more strange species (and lower) names that would still
not work well with the above solution?
Cheers,
Sendu.