[Bioperl-l] microbug in Bio::SeqIO::fasta::next_primary_seq
Aaron J. Mackey <amackey@virginia.edu>
Tue, 19 Jun 2001 09:00:45 -0400 (EDT)
On Mon, 18 Jun 2001, Karger, Amir wrote:
> From what I can tell, the very first entry will not be '>'. It will be the
> entire first entry plus a '\n>'. Why? Because '>' doesn't match $/ = '\n>'.
> So this condition will never be true.
Except when your fasta-formatted file starts out with a blank line before
the very first entry:
----

>FirstSequence
ABADHGHRYH
>SecondSequence
ABDNASDADGJHASDH
----
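To see what the reader actually gets back in the two cases, here is a minimal sketch that splits FASTA text on "\n>" with and without a leading blank line. The helper name raw_records and the in-memory filehandle are only for illustration; this is not the code in Bio::SeqIO::fasta, whose own line handling may differ.

```perl
use strict;
use warnings;

# raw_records is a hypothetical helper: it returns the chunks that a plain
# readline yields when the input record separator is "\n>".
sub raw_records {
    my ($text) = @_;
    open(my $fh, '<', \$text) or die $!;   # read from a string, not a file
    local $/ = "\n>";
    my @records;
    while (my $chunk = <$fh>) {
        chomp $chunk;                      # removes a trailing "\n>" only
        push @records, $chunk;
    }
    close($fh);
    return @records;
}

my $body = ">FirstSequence\nABADHGHRYH\n>SecondSequence\nABDNASDADGJHASDH\n";

my @plain   = raw_records($body);          # file starts directly with '>'
my @blanked = raw_records("\n" . $body);   # file starts with a blank line

printf "no blank line: %d chunks\n", scalar @plain;
printf "blank line:    %d chunks, first chunk is %s\n",
    scalar @blanked, ($blanked[0] eq '' ? 'empty' : 'not empty');
```

With the blank line, the first read stops at the very first "\n>", so the first chunk carries no sequence data and has to be skipped.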
I admit that the fasta parsing code could be more streamlined. Here's the
code we always use in our scripts; as far as I know, it handles all the
various special cases people can dream up in FASTA-formatted databases.
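open(FASTA, "<db.fa") or die $!;
{ local $/ = "\n>";            # read one FASTA record per iteration
  while(<FASTA>) {
    chomp;                     # drop the trailing "\n>" separator
    s/^\s*//s;                 # strip any leading whitespace
    # [ \t]* rather than \s* after the ID, so the gap cannot swallow the
    # newline and turn the first sequence line into the description
    my ($id, $desc, $seq) = m/^>?(\S+)[ \t]*([^\n]*)\n(.*)$/s;
    next unless $seq;          # skip empty or malformed records
    $seq =~ s/\W//sg;          # remove newlines and other non-word characters
    # $id, $desc and $seq are now ready for use
  }
}
close(FASTA);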
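For anyone who wants to try this without a db.fa on disk, here is a rough, self-contained sketch of the same loop wrapped in a subroutine and fed from a string. The name parse_fasta, the hash keys and the sample data are made up for illustration. The pattern uses [ \t]* after the ID so that a record without a description keeps its first sequence line.

```perl
use strict;
use warnings;

# parse_fasta is a hypothetical helper: it takes FASTA text and returns a
# list of hash references with the keys id, desc and seq.
sub parse_fasta {
    my ($text) = @_;
    open(my $fh, '<', \$text) or die $!;   # in-memory filehandle
    local $/ = "\n>";
    my @entries;
    while (my $rec = <$fh>) {
        chomp $rec;                        # drop the trailing "\n>"
        $rec =~ s/^\s*//s;                 # strip any leading whitespace
        my ($id, $desc, $seq) =
            $rec =~ m/^>?(\S+)[ \t]*([^\n]*)\n(.*)$/s;
        next unless $seq;                  # skips the empty leading chunk too
        $seq =~ s/\W//sg;                  # join lines, drop non-word chars
        push @entries, { id => $id, desc => $desc, seq => $seq };
    }
    close($fh);
    return @entries;
}

# Sample input: leading blank line, one record without a description,
# one record with a description and a sequence split over two lines.
my @entries = parse_fasta(
    "\n>FirstSequence\nABADHGHRYH\n"
  . ">SecondSequence some description\nABDNAS\nDADGJHASDH\n"
);
printf "%s [%s] %s\n", @{$_}{qw(id desc seq)} for @entries;
```

Note that s/\W//sg also removes characters such as '*' and '-', so this sketch is not suitable as-is for sequences that carry stop or gap symbols.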
-Aaron