[Bioperl-l] SeqIO::genbank crash special case
jdiggans@genelogic.com
Tue, 11 Dec 2001 17:30:27 -0500
I recently came across a horribly misformatted GenBank record in our local
copy of GenBank that caused SeqIO::genbank to choke. I've fixed the problem
in my local copy of the module, but I was wondering whether bioperl has a
policy for handling bizarre cases like this.
The problem appears around line 285 of SeqIO/genbank.pm:

    # to the last line read before returning
    my $ftunit = $self->_read_FTHelper_GenBank(\$buffer);
    # process ftunit
    $ftunit->_generic_seqfeature($seq);

$ftunit is never checked for definedness before a method is called on it.
If _read_FTHelper_GenBank fails and returns undef (my current issue), the
script dies messily with Perl's "Can't call method ... on an undefined
value" error. I've patched my copy to:

    # to the last line read before returning
    my $ftunit = $self->_read_FTHelper_GenBank(\$buffer);
    # process ftunit - if there is a problem, warn and skip this FT unit
    if( defined($ftunit) ) {
        $ftunit->_generic_seqfeature($seq);
    } else {
        $self->warn("Unexpected feature error - FTUnit undefined, skipping");
        # Unless the buffer already holds the start of the next feature
        # (a 5-space indent followed by a feature key) or a new top-level
        # section (no indent), read the next line so parsing can resume.
        unless( ($buffer =~ /^\s{5}\S+/) or ($buffer =~ /^\S+/) ) {
            $buffer = $self->_readline;
        }
    }

Is it worth adding some version of this to genbank.pm so that a parse can
recover from a single badly formatted entry in a feature table? Or, within
the bioperl mentality, 'should' this kind of error be considered terminal?

This particular record happened to have a stray carriage return in the
middle of a feature range. That completely confused _read_FTHelper_GenBank,
which returned undef, and a method was then called on that undef.

-j
-------------------------------------------------
James Diggans
Bioinformatics Programmer
Gene Logic, Inc.
Phone: 301.987.1756
FAX: 301.987.1701