[Bioperl-l] validate_species_name

Wes Barris wes.barris at csiro.au
Tue Sep 14 01:20:00 EDT 2004


I am converting back and forth between genbank and fasta formats.
The NCBI accession "AY374167" is one of many genbank entries whose
ORGANISM line is flagged as invalid when creating genbank output.
The bioperl exception thrown is:

------------- EXCEPTION  -------------
MSG: Invalid species name 'rosenbergii-Australia'
STACK Bio::Species::validate_species_name /usr/lib/perl5/site_perl/5.8.0/Bio/Species.pm:321
STACK Bio::Species::classification /usr/lib/perl5/site_perl/5.8.0/Bio/Species.pm:151
STACK toplevel /home/wes/proj/genbank/fastatogenbank.pl:29

--------------------------------------

The first few lines of the genbank file that I am trying to
match are:

LOCUS       AY374167                 867 bp    DNA     linear   INV 31-OCT-2003
DEFINITION  Macrobrachium rosenbergii-Australia 18S ribosomal RNA gene, partial
            sequence.
ACCESSION   AY374167
VERSION     AY374167.1  GI:37675510
KEYWORDS    .
SOURCE      Macrobrachium rosenbergii-Australia
  ORGANISM  Macrobrachium rosenbergii-Australia
            Eukaryota; Metazoa; Arthropoda; Crustacea; Malacostraca;
            Eumalacostraca; Eucarida; Decapoda; Pleocyemata; Caridea;
            Palaemonoidea; Palaemonidae; Macrobrachium.

The relevant piece of bioperl code (Species.pm) is:

sub validate_species_name {
     my( $self, $string ) = @_;

     return 1 if $string eq "sp.";
     return 1 if $string =~ /^[a-z][\w\s]+$/i;
     $self->throw("Invalid species name '$string'");
}

I believe that a '-' could be added to the character class in that
regex test, like this:

     return 1 if $string =~ /^[a-z][\w\s-]+$/i;
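
To check the effect of that change outside of bioperl, here is a minimal
standalone sketch. The sub names is_valid_current and is_valid_proposed are
made up for this test; they only mirror the two regex tests quoted above,
return 0 instead of throwing, and do not touch Bio::Species at all.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: mirrors the test currently in validate_species_name,
# but returns 0 instead of throwing an exception.
sub is_valid_current {
    my ($string) = @_;
    return 1 if $string eq "sp.";
    return $string =~ /^[a-z][\w\s]+$/i ? 1 : 0;
}

# Hypothetical helper: same test with '-' added to the character class.
# A '-' placed last inside [...] is a literal hyphen, not a range.
sub is_valid_proposed {
    my ($string) = @_;
    return 1 if $string eq "sp.";
    return $string =~ /^[a-z][\w\s-]+$/i ? 1 : 0;
}

# Compare both tests on a plain name, the failing name, and "sp."
for my $name ('rosenbergii', 'rosenbergii-Australia', 'sp.') {
    printf "%-24s current=%d proposed=%d\n",
        $name, is_valid_current($name), is_valid_proposed($name);
}
```

Only 'rosenbergii-Australia' should differ between the two: the current
pattern rejects it because \w and \s do not match '-', while the proposed
pattern accepts it.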

This is with bioperl-live (as of today) on Red Hat 8.
-- 
Wes Barris
E-Mail: Wes.Barris at csiro.au


