[Bioperl-l] "Be forgiving in what you accept" and Bio::Tools::GuessSeqFormat
George Hartzell
hartzell at kestrel.alerce.com
Thu Jul 21 15:34:19 EDT 2005
There's a great "old" Internet maxim, "Be forgiving in what you accept
and strict in what you send".
The Bio::SeqIO modules seem to be able to cope with "fasta" formatted
files that have a space separating the ">" from the rest of the line
(e.g. "> ape") if a) you explicitly specify the format or b) you
have the sequence in a file whose name ends in "fa" (or, more generally,
matches the list of patterns that correspond to fasta file names).
But if you happen to have the sequence in a file with a funny name
(e.g. /var/tmp/apreq23ZHis [aka a form upload]), then format guessing
fails. The filename gives no hint, and the file content test is strict:
it wants to see the header line without the whitespace (">ape").
Is there any reason not to extend the regexp a bit and relax that
constraint (since everything else seems to cope with it)?
Something like this:
*** /usr/local/lib/perl5/site_perl/5.8.6/Bio/Tools/GuessSeqFormat.pm.orig Thu Jul 21 12:30:55 2005
--- /usr/local/lib/perl5/site_perl/5.8.6/Bio/Tools/GuessSeqFormat.pm Thu Jul 21 12:31:45 2005
***************
*** 591,595 ****
      my ($line, $lineno) = (shift, shift);
      return (($lineno != 1 && $line =~ /^[A-IK-NP-Z]+$/i) ||
!             $line =~ /^>\w/);
  }
--- 591,595 ----
      my ($line, $lineno) = (shift, shift);
      return (($lineno != 1 && $line =~ /^[A-IK-NP-Z]+$/i) ||
!             $line =~ /^>\s*\w/);
  }
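
To make the difference concrete, here is a small standalone sketch that runs
both versions of the test against a few header lines. It uses only core Perl
and does not load BioPerl. The sub names possibly_fasta_strict and
possibly_fasta_relaxed are made up for illustration (the patch above does not
show the real sub's name), and the "? 1 : 0" is added only so the result
prints cleanly.

```perl
use strict;
use warnings;

# Sketch of the current (strict) test: on any line, '>' must be followed
# immediately by a word character. Sub name is hypothetical.
sub possibly_fasta_strict {
    my ($line, $lineno) = @_;
    return (($lineno != 1 && $line =~ /^[A-IK-NP-Z]+$/i) ||
            $line =~ /^>\w/) ? 1 : 0;
}

# Sketch of the proposed (relaxed) test: optional whitespace is allowed
# between '>' and the first word character. Sub name is hypothetical.
sub possibly_fasta_relaxed {
    my ($line, $lineno) = @_;
    return (($lineno != 1 && $line =~ /^[A-IK-NP-Z]+$/i) ||
            $line =~ /^>\s*\w/) ? 1 : 0;
}

# Try a few candidate first lines (line number 1 in each case).
for my $header ('>ape', '> ape', '>   ape', 'ape') {
    printf "%-10s strict=%d relaxed=%d\n",
        "'$header'",
        possibly_fasta_strict($header, 1),
        possibly_fasta_relaxed($header, 1);
}
```

Only the "> ape" style headers change behaviour; a line with no ">" at all is
still rejected as a first line by both versions.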
g.