[Bioperl-l] validate_species_name
Wes Barris
wes.barris at csiro.au
Tue Sep 14 01:20:00 EDT 2004
I am converting back and forth between genbank and fasta formats.
The NCBI accession "AY374167" is one of many genbank entries whose
ORGANISM is flagged as invalid when creating genbank output.
The bioperl error thrown is this:
------------- EXCEPTION -------------
MSG: Invalid species name 'rosenbergii-Australia'
STACK Bio::Species::validate_species_name /usr/lib/perl5/site_perl/5.8.0/Bio/Species.pm:321
STACK Bio::Species::classification /usr/lib/perl5/site_perl/5.8.0/Bio/Species.pm:151
STACK toplevel /home/wes/proj/genbank/fastatogenbank.pl:29
--------------------------------------
The first few lines of the genbank file that I am trying to
match are:
LOCUS       AY374167                 867 bp    DNA     linear   INV 31-OCT-2003
DEFINITION  Macrobrachium rosenbergii-Australia 18S ribosomal RNA gene, partial
            sequence.
ACCESSION   AY374167
VERSION     AY374167.1  GI:37675510
KEYWORDS    .
SOURCE      Macrobrachium rosenbergii-Australia
  ORGANISM  Macrobrachium rosenbergii-Australia
            Eukaryota; Metazoa; Arthropoda; Crustacea; Malacostraca;
            Eumalacostraca; Eucarida; Decapoda; Pleocyemata; Caridea;
            Palaemonoidea; Palaemonidae; Macrobrachium.
The relevant piece of bioperl code (Species.pm) is:
sub validate_species_name {
    my( $self, $string ) = @_;

    return 1 if $string eq "sp.";
    return 1 if $string =~ /^[a-z][\w\s]+$/i;
    $self->throw("Invalid species name '$string'");
}
I believe that a '-' could be added to the string test like this:
return 1 if $string =~ /^[a-z][\w\s-]+$/i;
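
To compare the two tests side by side, here is a minimal standalone sketch. It does not load Bio::Species. The helper name_ok is hypothetical and returns 1 or 0 instead of throwing; the two patterns are copied from the lines above.

```perl
use strict;
use warnings;

# Patterns copied from this message: the current test and the proposed one.
my $current  = qr/^[a-z][\w\s]+$/i;
my $proposed = qr/^[a-z][\w\s-]+$/i;

# Hypothetical helper (not part of Bio::Species): mirrors the logic of
# validate_species_name but returns 1 or 0 instead of throwing.
sub name_ok {
    my ($string, $pattern) = @_;
    return 1 if $string eq 'sp.';
    return $string =~ $pattern ? 1 : 0;
}

# Print the verdict of each pattern for a few sample species names.
for my $name ('rosenbergii', 'rosenbergii-Australia', 'sp.') {
    printf "%-22s current=%d proposed=%d\n",
        $name, name_ok($name, $current), name_ok($name, $proposed);
}
```

Of these three names, only 'rosenbergii-Australia' should be treated differently: the current pattern rejects it and the proposed one accepts it. The first character must still be a letter in both cases, so a name that starts with '-' would continue to fail.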
I am using bioperl-live (as of today) on Red Hat 8.
--
Wes Barris
E-Mail: Wes.Barris at csiro.au