[Bioperl-l] "Be forgiving in what you accept" and Bio::Tools::GuessSeqFormat

Nathan Haigh n.haigh at sheffield.ac.uk
Fri Jul 22 05:05:51 EDT 2005


May I ask what software is producing this FASTA format file which has a
space immediately after the '>' in the description line?

Although I am not aware of a formal description of FASTA format, I have
never seen any files with a space immediately after '>'. I don't object
to relaxing this a little in bioperl, but you may find that these files
are not compatible with other software.

Nathan

-----Original Message-----
From: bioperl-l-bounces at portal.open-bio.org
[mailto:bioperl-l-bounces at portal.open-bio.org] On Behalf Of Brian Osborne
Sent: 21 July 2005 21:04
To: hartzell at alerce.com; bioperl-l
Subject: Re: [Bioperl-l] "Be forgiving in what you accept"
and Bio::Tools::GuessSeqFormat

George,

This does sound like a reasonable change, I will make it unless someone has
an objection. Let's wait a moment...

Brian O.


On 7/21/05 3:34 PM, "George Hartzell" <hartzell at kestrel.alerce.com> wrote:

> 
> There's a great "old" Internet maxim, "Be forgiving in what you accept
> and strict in what you send".
> 
> The Bio::SeqIO modules seem to be able to cope with "fasta" formatted
> files that have a space separating the ">" from the rest of the line
> (e.g. "> ape") if a) you explicitly specify the format or b) you
> have the sequence in a file that ends in "fa" (or generally matches
> the list of patterns that correspond to fasta file names).
> 
> But, if you happen to have the sequence in a file with a funny name
> (e.g. /var/tmp/apreq23ZHis [aka a form upload]) then it fails.  It
> can't guess based on the filename and the file content test is strict
> and wants to see the header line without the whitespace (">ape").
> 
> Is there any reason not to extend the regexp a bit and relax that
> constraint (since everything else seems to cope with it)?
> 
> Something like this:
> 
> *** /usr/local/lib/perl5/site_perl/5.8.6/Bio/Tools/GuessSeqFormat.pm.orig Thu Jul 21 12:30:55 2005
> --- /usr/local/lib/perl5/site_perl/5.8.6/Bio/Tools/GuessSeqFormat.pm Thu Jul 21 12:31:45 2005
> ***************
> *** 591,595 ****
>       my ($line, $lineno) = (shift, shift);
>       return (($lineno != 1 && $line =~ /^[A-IK-NP-Z]+$/i) ||
> !             $line =~ /^>\w/);
>   }
>   
> --- 591,595 ----
>       my ($line, $lineno) = (shift, shift);
>       return (($lineno != 1 && $line =~ /^[A-IK-NP-Z]+$/i) ||
> !             $line =~ /^>\s*\w/);
>   }
>   
> g.
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at portal.open-bio.org
> http://portal.open-bio.org/mailman/listinfo/bioperl-l
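
For anyone who wants to see the effect of the quoted patch in isolation, here is a minimal sketch. It is not the actual Bio::Tools::GuessSeqFormat code: the two regular expressions are copied from the diff above, but the helper name possible_fasta, the $STRICT and $RELAXED variables, and the sample header lines are hypothetical, chosen only for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# The two header patterns from the quoted diff (before and after the patch).
my $STRICT  = qr/^>\w/;
my $RELAXED = qr/^>\s*\w/;

# Hypothetical stand-in for the patched test: a line counts as possible
# FASTA if it is a sequence line (not on line 1) or a header line that
# matches the supplied header pattern.
sub possible_fasta {
    my ($line, $lineno, $header_re) = @_;
    my $is_sequence = ($lineno != 1 && $line =~ /^[A-IK-NP-Z]+$/i);
    my $is_header   = ($line =~ $header_re);
    return ($is_sequence || $is_header) ? 1 : 0;
}

# Try a few first lines against both patterns.
for my $header ('>ape', '> ape', '>  ape', '> ') {
    printf "%-10s strict=%d relaxed=%d\n",
        "'$header'",
        possible_fasta($header, 1, $STRICT),
        possible_fasta($header, 1, $RELAXED);
}
```

With the original pattern, a first line of "> ape" is rejected; with the relaxed pattern it is accepted, while a bare "> " with nothing after the whitespace is still rejected by both.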

