[Bioperl-l] How to Handle Parse Errors
dmcwilli
dmcwilli at utk.edu
Fri Jul 4 10:28:44 EDT 2003
There was a question like this in May, I think, but I have been unable
to find help for this in the FAQ or recent postings.
I am trying to parse GenBank records and find those that have a Region
feature with the qualifier /region_name="Transit peptide". I did a broad
Entrez search and downloaded the results, so I'm accessing the file
locally. The parser fails, and the script exits prematurely, when it
encounters a record containing the feature "Het". The message is:
-------------------- WARNING ---------------------
MSG: exception while parsing location line
[join(bond(201),bond(203),bond(204),bond(204),bond(204),bond(204))] in
reading EMBL/GenBank/SwissProt, ignoring feature Het (seqid=8RUC_G):
------------- EXCEPTION -------------
MSG: operator "bond" unrecognized by parser
STACK Bio::Factory::FTLocationFactory::from_string /usr/lib/perl5/site_perl/5.8.0/Bio/Factory/FTLocationFactory.pm:160
STACK Bio::Factory::FTLocationFactory::from_string /usr/lib/perl5/site_perl/5.8.0/Bio/Factory/FTLocationFactory.pm:157
STACK (eval) /usr/lib/perl5/site_perl/5.8.0/Bio/SeqIO/FTHelper.pm:124
STACK Bio::SeqIO::FTHelper::_generic_seqfeature /usr/lib/perl5/site_perl/5.8.0/Bio/SeqIO/FTHelper.pm:123
STACK Bio::SeqIO::genbank::next_seq /usr/lib/perl5/site_perl/5.8.0/Bio/SeqIO/genbank.pm:396
STACK toplevel ./biopl5.pl:20
--------------------------------------
---------------------------------------------------
Can't call method "primary_tag" on an undefined value at
/usr/lib/perl5/site_perl/5.8.0/Bio/SeqIO/genbank.pm line 400, <GEN0>
line 23630.
# end of message
My code is:
#!/usr/bin/perl
#
# tpfilter.pl
# Get transit peptides from files in genbank format. Uses BioPerl
# David R. McWilliams dmcwilli at utk.edu
# 04-Jul-03

use strict;
use warnings;
use Bio::SeqIO;
use Bio::Seq;

# The GenBank-format file to scan is the first command-line argument.
my $file = shift @ARGV;
my $in   = Bio::SeqIO->new(-format => 'genbank', -file => $file);

my $datetime = scalar(localtime());
print "# Output of $0 on $file.\n";
print "# $datetime\n";

my $fnd = 0;    # number of matching Region features found

# The script dies inside next_seq when a record has a "Het" feature.
while ( my $seq = $in->next_seq ) {
    foreach my $feature ( $seq->get_SeqFeatures ) {
        # Only Region features that carry a region_name qualifier are of interest.
        next unless $feature->primary_tag eq 'Region';
        next unless $feature->has_tag('region_name');

        my ($tag) = $feature->get_tag_values('region_name');
        if ( $tag =~ /transit|signal/i ) {
            $fnd++;
            # FASTA-style header: id | peptide range | species | description
            print ">", $seq->display_id(), "|",
                "tp=", $feature->start, "..", $feature->end, "|",
                $seq->species->binomial(), "|",
                $seq->description(), "\n";
            print $seq->subseq( $feature->start, $feature->end ), "\n";
        }
    }
}

print "# Found $fnd seqs w/ tp.\n";
# end code
If I remove the offending records by hand, this works fine. So, is
there a way to continue to parse the offending records, even though
the parser does not recognize this particular feature, or is there a
way to catch the error and skip the record without aborting the rest
of the script?
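
For the "catch the error and skip" half of the question, one pattern that may
apply is to call next_seq inside an eval block, since the fatal "Can't call
method" error is an ordinary Perl die. The sketch below is only an
illustration of that pattern, not a confirmed BioPerl recipe. FakeStream and
try_next_seq are hypothetical names invented here so the example runs without
BioPerl; FakeStream merely imitates a stream whose next_seq dies on one
record. Whether a real Bio::SeqIO stream resumes cleanly at the next record
after a trapped error is an assumption that has not been verified here.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical stand-in for a sequence stream (NOT part of BioPerl):
# next_seq() returns the next record, dies on a "bad" one, and returns
# undef once the input is exhausted.
package FakeStream;

sub new {
    my ($class, @records) = @_;
    return bless { records => [@records] }, $class;
}

sub next_seq {
    my ($self) = @_;
    return unless @{ $self->{records} };
    my $rec = shift @{ $self->{records} };
    die "operator \"bond\" unrecognized by parser\n" if $rec eq 'BAD';
    return $rec;
}

package main;

# Hypothetical helper: call next_seq() inside eval so a die is trapped.
# Returns (record, undef) on success or end of input, (undef, error) on failure.
sub try_next_seq {
    my ($stream) = @_;
    my $seq;
    my $ok = eval { $seq = $stream->next_seq; 1 };
    return $ok ? ( $seq, undef ) : ( undef, $@ || 'unknown error' );
}

my $stream  = FakeStream->new( 'A', 'BAD', 'B' );
my @kept;
my $skipped = 0;

while (1) {
    my ( $seq, $err ) = try_next_seq($stream);
    if ( defined $err ) {
        # A record failed to parse: note it and move on to the next one.
        $skipped++;
        warn "skipping record: $err";
        next;
    }
    last unless defined $seq;    # undef without an error means end of input
    push @kept, $seq;
}

print "kept: @kept; skipped: $skipped\n";    # prints "kept: A B; skipped: 1"
```

With a real stream, the body of the while loop in tpfilter.pl would replace
the push. If the parser does not advance past the bad record, the same error
would repeat forever, so a cap on consecutive failures would be a sensible
addition.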
Regards,