[Bioperl-l] microbug in Bio::SeqIO::fasta::next_primary_seq
   
    Aaron J. Mackey <amackey@virginia.edu>
       
    Tue, 19 Jun 2001 09:00:45 -0400 (EDT)
    
    
  
On Mon, 18 Jun 2001, Karger, Amir wrote:
> From what I can tell, the very first entry will not be '>'. It will be the
> entire first entry plus a '\n>'. Why? Because '>' doesn't match $/ = '\n>'.
> So this condition will never be true.
Except when your fasta-formatted file starts out with a blank line before
the very first entry:
----

>FirstSequence
ABADHGHRYH
>SecondSequence
ABDNASDADGJHASDH
----
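To make the record-splitting behaviour concrete, here is a minimal sketch of what reading with $/ set to "\n>" returns, with and without a leading blank line. The records() helper and the sample strings are hypothetical and not part of Bioperl; the sketch assumes a Perl that supports in-memory filehandles (5.8 or later).

```perl
use strict;
use warnings;

# Hypothetical helper (not part of Bioperl): return the records that
# <FH> would produce for $text when $/ is set to "\n>".
sub records {
    my ($text) = @_;
    open(my $fh, '<', \$text) or die $!;   # in-memory filehandle
    local $/ = "\n>";
    my @recs = <$fh>;
    close($fh);
    return @recs;
}

# Same two entries, without and with a leading blank line.
my @plain   = records(">First\nABAD\n>Second\nABDN\n");
my @blanked = records("\n>First\nABAD\n>Second\nABDN\n");

# Print each set of records with newlines made visible.
for my $set (\@plain, \@blanked) {
    print join(" | ", map { my $r = $_; $r =~ s/\n/\\n/g; $r } @$set), "\n";
}
```

Without the blank line there are two records, and the first is the whole first entry plus the trailing separator. With the blank line there are three, and the first contains nothing but the separator itself.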
I admit that the fasta parsing code could be more streamlined. Here's the
code we always use in our scripts; as far as I know, it handles all the
various special cases people can dream up in FASTA-formatted databases.
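open(FASTA, "<db.fa") or die $!;
{ local $/ = "\n>";   # read one ">"-delimited entry at a time
  while(<FASTA>) {
    chomp;            # drop the trailing "\n>" separator
    s/^\s*//s;        # strip any leading whitespace
    # [ \t]* rather than \s* after the ID, so that a header with no
    # description cannot swallow the newline and the first sequence line
    my ($id, $desc, $seq) = m/^>?(\S+)[ \t]*([^\n]*)\n(.*)$/s;
    next unless defined $seq && length $seq;
    $seq =~ s/\W//sg; # remove newlines and any other non-word characters
    # ... use $id, $desc and $seq here ...
  }
}
close(FASTA);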
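As a rough, self-contained illustration of the loop above, the sketch below runs the same approach over a made-up in-memory string instead of db.fa and collects the parsed entries. The sample data and the @entries array are assumptions for the demonstration. The pattern uses [ \t]* between the ID and the description, so that a header with no description cannot absorb the first line of sequence.

```perl
use strict;
use warnings;

# Hypothetical sample data: a leading blank line, a header with no
# description, and sequences wrapped across two lines.
my $fasta = "\n>First some description\nABADHG\nHRYH\n"
          . ">Second\nABDNAS\nDADGJH\n";

my @entries;
open(my $fh, '<', \$fasta) or die $!;   # in-memory filehandle (Perl 5.8+)
{
    local $/ = "\n>";
    while (my $rec = <$fh>) {
        chomp $rec;         # removes a trailing "\n>" if present
        $rec =~ s/^\s*//;   # strip any leading whitespace
        my ($id, $desc, $seq) =
            $rec =~ m/^>?(\S+)[ \t]*([^\n]*)\n(.*)$/s;
        next unless defined $seq && length $seq;   # skips the empty first record
        $seq =~ s/\W//g;    # drop newlines and other non-word characters
        push @entries, [$id, $desc, $seq];
    }
}
close($fh);

printf "%s [%s] %s\n", @$_ for @entries;
```

With this input the loop yields two entries: First, with description "some description" and sequence ABADHGHRYH, and Second, with an empty description and sequence ABDNASDADGJH. Note that \W also removes characters such as '*' and '-', which may or may not be what you want if your sequences use them for stops or gaps.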
-Aaron