[Bioperl-l] microbug in Bio::SeqIO::fasta::next_primary_seq

Aaron J Mackey "Aaron J. Mackey" <amackey@virginia.edu>
Tue, 19 Jun 2001 09:00:45 -0400 (EDT)


On Mon, 18 Jun 2001, Karger, Amir wrote:

> From what I can tell, the very first entry will not be '>'. It will be the
> entire first entry plus a "\n>". Why? Because '>' doesn't match $/ = "\n>".
> So this condition will never be true.

Except when your fasta-formatted file starts out with a blank line before
the very first entry:

----

>FirstSequence
ABADHGHRYH
>SecondSequence
ABDNASDADGJHASDH

----

I admit that the fasta parsing code could be more streamlined. Here's the
code we always use in our scripts; as far as I know it handles all the
special cases people can dream up in FASTA-formatted databases.

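use strict;
use warnings;

open(my $fasta, '<', 'db.fa') or die "Cannot open db.fa: $!";
{ local $/ = "\n>";                  # read one FASTA record at a time
  while (my $record = <$fasta>) {
    chomp $record;                   # drop the trailing "\n>" separator
    $record =~ s/^\s+//;             # strip any leading whitespace
    # [ \t]* (not \s*) so a header without a description cannot
    # swallow the first line of sequence
    my ($id, $desc, $seq) = $record =~ m/^>?(\S+)[ \t]*([^\n]*)\n(.*)$/s;
    next unless defined $seq && length $seq;
    $seq =~ s/\W//g;                 # remove newlines etc. (also drops '-' and '*')
    # ... do something with $id, $desc and $seq here ...
  }
}
close($fasta);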
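To check that the loop copes with the blank-line case above, here is a minimal sketch that wraps the same loop in a sub and feeds it the example file from an in-memory string. The sub name `parse_fasta` and its return value (a list of [id, description, sequence] triples) are hypothetical, made up for this illustration; reading from a scalar via `open(my $fh, '<', \$string)` assumes Perl 5.8 or later.

```perl
use strict;
use warnings;

# Hypothetical helper: the loop above, returning [id, desc, seq] triples
# instead of processing each entry in place.
sub parse_fasta {
  my ($fh) = @_;
  my @entries;
  local $/ = "\n>";                  # one FASTA record per read
  while (my $record = <$fh>) {
    chomp $record;                   # drop the trailing "\n>" separator
    $record =~ s/^\s+//;             # leading blank lines / whitespace
    my ($id, $desc, $seq) = $record =~ m/^>?(\S+)[ \t]*([^\n]*)\n(.*)$/s;
    next unless defined $seq && length $seq;   # skips the empty first record
    $seq =~ s/\W//g;                 # newlines and other non-word characters
    push @entries, [$id, $desc, $seq];
  }
  return @entries;
}

# The example file from this message, leading blank line included,
# read from a string instead of db.fa.
my $example = "\n>FirstSequence\nABADHGHRYH\n>SecondSequence\nABDNASDADGJHASDH\n";
open(my $fh, '<', \$example) or die "Cannot open in-memory file: $!";
my @entries = parse_fasta($fh);
close($fh);

for my $e (@entries) {
  printf "%s\t%s\n", $e->[0], $e->[2];
}
```

With the leading blank line, the first record read is just the separator; after chomp and the whitespace strip it is empty, so the regex fails and `next` skips it. Both real entries come through: FirstSequence with ABADHGHRYH and SecondSequence with ABDNASDADGJHASDH.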

-Aaron