[Bioperl-l] "Be forgiving in what you accept" and Bio::Tools::GuessSeqFormat
Brian Osborne
brian_osborne at cognia.com
Thu Jul 21 16:04:29 EDT 2005
George,
This does sound like a reasonable change; I will make it unless someone has
an objection. Let's wait a moment...
Brian O.
On 7/21/05 3:34 PM, "George Hartzell" <hartzell at kestrel.alerce.com> wrote:
>
> There's a great "old" Internet maxim, "Be forgiving in what you accept
> and strict in what you send".
>
> The Bio::SeqIO modules seem to be able to cope with "fasta" formatted
> files that have a space separating the ">" from the rest of the line
> (e.g. "> ape") if (a) you explicitly specify the format or (b) you
> have the sequence in a file whose name ends in "fa" (or, more generally,
> matches one of the patterns that correspond to fasta file names).
>
> But if you happen to have the sequence in a file with a funny name
> (e.g. /var/tmp/apreq23ZHis [aka a form upload]), then it fails. It
> can't guess based on the filename, and the file content test is strict:
> it wants to see the header line without the whitespace (">ape").
>
> Is there any reason not to extend the regexp a bit and relax that
> constraint (since everything else seems to cope with it)?
>
> Something like this:
>
> *** /usr/local/lib/perl5/site_perl/5.8.6/Bio/Tools/GuessSeqFormat.pm.orig Thu Jul 21 12:30:55 2005
> --- /usr/local/lib/perl5/site_perl/5.8.6/Bio/Tools/GuessSeqFormat.pm Thu Jul 21 12:31:45 2005
> ***************
> *** 591,595 ****
>       my ($line, $lineno) = (shift, shift);
>       return (($lineno != 1 && $line =~ /^[A-IK-NP-Z]+$/i) ||
> !             $line =~ /^>\w/);
>   }
>
> --- 591,595 ----
>       my ($line, $lineno) = (shift, shift);
>       return (($lineno != 1 && $line =~ /^[A-IK-NP-Z]+$/i) ||
> !             $line =~ /^>\s*\w/);
>   }
>
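A minimal standalone sketch of what the proposed patch changes, for trying the two header tests without installing BioPerl. Only the two regular expressions are taken from the diff; the sub names `fasta_line_strict` and `fasta_line_relaxed` are hypothetical stand-ins for the method in Bio::Tools::GuessSeqFormat, whose name and surrounding code are not shown in the message.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical stand-in for the original test: a line counts as fasta
# if it is a sequence line (not the first line) or a header with no
# whitespace between ">" and the identifier.
sub fasta_line_strict {
    my ($line, $lineno) = @_;
    return (($lineno != 1 && $line =~ /^[A-IK-NP-Z]+$/i) ||
            $line =~ /^>\w/);
}

# Hypothetical stand-in for the patched test: identical, except that
# optional whitespace is allowed after ">".
sub fasta_line_relaxed {
    my ($line, $lineno) = @_;
    return (($lineno != 1 && $line =~ /^[A-IK-NP-Z]+$/i) ||
            $line =~ /^>\s*\w/);
}

# Compare both tests on a header with and without the space.
for my $header (">ape", "> ape") {
    printf "%-6s strict=%d relaxed=%d\n", $header,
        fasta_line_strict($header, 1)  ? 1 : 0,
        fasta_line_relaxed($header, 1) ? 1 : 0;
}
```

Both versions should accept ">ape", while only the relaxed one accepts "> ape". The sequence-line branch is untouched by the patch; note that its character class omits J and O, so a line containing those letters is not treated as sequence.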
> g.
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at portal.open-bio.org
> http://portal.open-bio.org/mailman/listinfo/bioperl-l