[Bioperl-l] Bio::SeqIO::swiss species parsing bug?
David Gonzalez
gonzaled at tcd.ie
Fri Aug 17 17:03:35 UTC 2007
Hi,
I had a problem with a swissprot file in which the genus and species
were being left undefined, and I believe it could be a bug in the
swiss.pm module.
When I tried to parse the file with Bio::SeqIO, I got the following
error messages:
Use of uninitialized value in pattern match (m//) at
/sw/lib/perl5/5.8.6/Bio/SeqIO/swiss.pm line 965, <GEN0> line 12.
Use of uninitialized value in string eq at
/sw/lib/perl5/5.8.6/Bio/SeqIO/swiss.pm line 967, <GEN0> line 12.
However, the fields I wanted from the file (gene_id, etc.) were fine,
so the file was otherwise being parsed.
I checked the output with Data::Dumper and I found the following in the
species entry; the species is left undefined, and the common name is absent.
'species' => bless( {
    '_ncbi_taxid' => 'Not',
    '_classification' => [
        undef,
        undef,
        'Aedes',
        'Culicini',
        'Culicinae',
        'Culicidae',
        'Culicoidea',
        'Nematocera',
        'Diptera',
        'Endopterygota',
        'Neoptera',
        'Pterygota',
        'Insecta',
        'Hexapoda',
        'Arthropoda',
        'Metazoa',
        'Eukaryota'
    ]
}, 'Bio::Species' ),
The species line in the file is formatted according to the Swiss-Prot
specifications and includes a common name:
OS Aedes aegypti (yellow fever mosquito)
OC Eukaryota; Metazoa; Arthropoda; Hexapoda; Insecta; Pterygota; Neoptera;
OC Endopterygota; Diptera; Nematocera; Culicoidea; Culicidae; Culicinae;
OC Culicini; Aedes.
OX NCBI_TaxID=Not defined;
I think the problem is on line 905 of swiss.pm:
902 if(/^OS\s+(\S.+)/ && (! defined($binomial))) {
903     $osline .= " " if $osline;
904     $osline .= $1;
905     if($osline =~ s/(,|, and|\.)$//) {
906         ($binomial, $descr) = $osline =~ /(\S[^\(]+)(.*)/;
907         ($ns_name) = $binomial;
908         $ns_name =~ s/\s+$//; #####
The problem seems to be that the OS line does not end in a punctuation
mark, so the substitution on line 905 returns false and the block that
sets $binomial is skipped. I think the Swiss-Prot format does not require
the line to end in '.', although it normally does. After simply removing
the requirement that the substitution succeed, the output of Data::Dumper
looked normal:
....
    '_common_name' => 'yellow fever mosquito',
    '_ncbi_taxid' => 'Not',
    '_classification' => [
        'aegypti',
        'Aedes',
        'Culicini',
....
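For anyone who wants to check this without editing swiss.pm, here is a minimal standalone sketch of the behaviour. It uses the two regular expressions from lines 905-908 and the OS line from my file. The helper split_os_line is a hypothetical name of my own and is not part of BioPerl; it strips trailing punctuation when present but does not require it, which is one possible way to relax the condition and not necessarily how the module should be fixed.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of swiss.pm): split the text of an OS line
# into the scientific name and the parenthesised description.
sub split_os_line {
    my ($osline) = @_;

    # Strip trailing punctuation if it is there, but do not require it.
    $osline =~ s/(,|, and|\.)$//;

    # Same pattern as line 906: everything up to the first '(' is the name.
    my ($binomial, $descr) = $osline =~ /(\S[^\(]+)(.*)/;
    return unless defined $binomial;

    # Same clean-up as lines 907-908: drop trailing whitespace from the name.
    (my $ns_name = $binomial) =~ s/\s+$//;
    return ($ns_name, $descr);
}

my $osline = 'Aedes aegypti (yellow fever mosquito)';

# The condition on line 905 is false for this line: nothing is substituted.
my $copy  = $osline;
my $guard = ($copy =~ s/(,|, and|\.)$//) ? 'true' : 'false';
print "line 905 condition: $guard\n";

# With the condition relaxed, the name and description are both recovered.
my ($ns_name, $descr) = split_os_line($osline);
print "name: '$ns_name', description: '$descr'\n";
```

An OS line that does end in '.', such as 'Homo sapiens (Human).', goes through the same helper and has the '.' removed first, so entries with the usual punctuation should be unaffected.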
I am using the Fink-installed BioPerl:
bioperl-pm586 1.4-5 Perl module for biology
I don't know if this has been reported/solved in the newer versions of
bioperl.
David
--
David Gonzalez Knowles
Smurfit Institute of Genetics
Trinity College
Dublin