[Bioperl-l] Empty FASTA files with Bio::SeqIO

Hilmar Lapp lapp@gnf.org
Wed, 20 Dec 2000 16:03:10 -0800


"J.C. Diggans" wrote:
> 
> I went ahead and patched my local version from 0.6.2 (patch below). It
> was a quick fix, can anyone think of a case in which this would fail?
> 
> - jc
> 
> 122,123c122,134
> <   my ($top,$sequence) = $entry =~ /^(.+?)\n([^>]+)/s
> <     or $self->throw("Can't parse entry");
> ---
> >   # Check for empty sequences and handle gracefully
> >   my ($top,$sequence);
> >   if( $entry =~ /^(.+?)\n([^>]+)/s ) {
> >       # There is valid sequence present
> >       ($top,$sequence) = $entry =~ /^(.+?)\n([^>]+)/s
> >            or $self->throw("Can't parse entry");
> >   } else {
> >       # There is no sequence present,
> >       ($top) = $entry =~ /^(.+?)\n/
> >            or $self->throw("Can't parse entry"); # save top
> >       $sequence = ""; # set sequence to empty string
> >   }
> >
> 

A correctly FASTA-formatted empty sequence ought to have an empty line
after the '>' line. I think we should check for that, just to be sure
we're not misinterpreting something.
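
A minimal sketch of what such a check could look like, not a drop-in
patch: parse_fasta_entry is a hypothetical helper (it is not part of
Bio::SeqIO), and it assumes it is handed exactly one entry with the
leading '>' already stripped. It accepts an empty sequence only when the
'>' line is followed by an empty line, and dies otherwise.

```perl
use strict;
use warnings;

# Hypothetical helper, not Bio::SeqIO code. Assumes $entry holds a single
# FASTA entry with the leading '>' already removed.
sub parse_fasta_entry {
    my ($entry) = @_;

    # Split the description line from whatever follows the first newline.
    my ($top, $body) = $entry =~ /\A([^\n]+)\n(.*)\z/s
        or die "Can't parse entry: no description line\n";

    # No sequence characters at all: accept this only if the '>' line is
    # followed by an empty line, i.e. the body starts with a newline.
    if ( $body !~ /\S/ ) {
        die "Empty sequence for '$top' lacks the empty line after the '>' line\n"
            unless $body =~ /\A\n/;
        return ($top, '');
    }

    # Normal case: strip line breaks and other whitespace from the sequence.
    (my $sequence = $body) =~ s/\s+//g;
    return ($top, $sequence);
}

# Usage: an empty entry written with the empty line is accepted.
my ($top, $sequence) = parse_fasta_entry("seq2 empty example\n\n");
print "description: $top, length: ", length($sequence), "\n";
```

Whether a bare '>' line with nothing after it should be rejected or
silently treated as empty is a design choice; the sketch takes the strict
route.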

Second, Bio::Seq currently won't let you define an empty sequence. This
needs to be fixed, too.

If your fix works for you, that's fine. 0.7 will still take a while anyway,
unless someone donates a full-coverage fuzzy-location package for
Christmas.

	Hilmar
-- 
-------------------------------------------------------------
Hilmar Lapp                            email: lapp@gnf.org
GNF, San Diego, Ca. 92121              phone: +1-858-812-1757
-------------------------------------------------------------