[Bioperl-l] "Be forgiving in what you accept" and Bio::Tools::GuessSeqFormat

Fri Jul 22 12:15:34 EDT 2005

If you specified that the file was FASTA, I'm not sure how the parser would
work for pulling out primary_id, display_id, etc. for the sequence - have
you checked that the parser is flexible enough to pull these out of a sequence
description that has a space after the '>'?

It may be better to strip out these spaces prior to using them in bioperl?
But to be honest I wouldn't be bothered either way! :o)
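
For what it's worth, the pre-processing idea could look something like the
sketch below. It is only an illustration in Python, not anything that exists
in bioperl; the function name normalize_fasta_headers and the sample records
are made up for the example.

```python
def normalize_fasta_headers(lines):
    """Yield lines with any spaces or tabs directly after a leading '>' removed.

    Hypothetical helper for illustration; not part of bioperl.
    """
    for line in lines:
        if line.startswith(">"):
            # Drop whitespace between '>' and the identifier, keep the rest.
            yield ">" + line[1:].lstrip(" \t")
        else:
            # Sequence lines pass through unchanged.
            yield line


# Made-up sample: the first header has the problematic space after '>'.
sample = ["> seq1 example description\n", "ACGT\n", ">seq2\n", "TTGA\n"]
cleaned = list(normalize_fasta_headers(sample))
```

Only the header lines are touched, so the sequence data and the rest of each
description are left as they were.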

Nathan

-----Original Message-----
From: George Hartzell [mailto:hartzell at kestrel.alerce.com] 
Sent: 22 July 2005 16:36
To: n.haigh at sheffield.ac.uk
Cc: 'Brian Osborne'; 'bioperl-l'
Subject: RE: [Bioperl-l] "Be forgiving in what you accept" and
Bio::Tools::GuessSeqFormat

Nathan Haigh writes:
 > May I ask what software is producing this FASTA format file which has a
 > space immediately after the '>' in the description line?

I don't know what created it.  Wouldn't surprise me to find out it was
created in Microsoft Word....  It was given to me as an example input
file/test case.

 > Although I am not aware of a formal description of FASTA format, I have
 > never seen any files with a space immediately after '>'. Although I don't
 > object to relaxing this a little in bioperl, you may find that these files
 > are not compatible with other software.

Yeah, there is that.  On the other hand, if we keep the guesser strict,
then we should make the equivalent change and have the Bio::SeqIO object
fail on them even if it's told that they're Fasta (e.g. by -format or by
guessing based on filename).

I was just frustrated when stuff worked up until the moment that I
uploaded the file into a tool via the web (at which point it ended up
in an oddly named file and the guessing heuristic broke).

I'd vote for relaxing the constraint, but, hey....
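
To make the difference concrete, here is a rough sketch of a strict versus a
relaxed header check. It is written in Python purely for illustration and is
not the actual Bio::Tools::GuessSeqFormat code; the pattern names and the
looks_like_fasta_header function are assumptions invented for this example.

```python
import re

# Hypothetical patterns, for illustration only.
STRICT_HEADER = re.compile(r"^>\S")      # '>' followed immediately by a non-space
RELAXED_HEADER = re.compile(r"^>\s*\S")  # optional whitespace allowed after '>'


def looks_like_fasta_header(line, relaxed=True):
    """Return True if the line looks like a FASTA description line."""
    pattern = RELAXED_HEADER if relaxed else STRICT_HEADER
    return pattern.match(line) is not None
```

With the relaxed pattern a line such as "> seq1 some description" is accepted,
while the strict pattern rejects it; a bare sequence line such as "ACGT" is
rejected by both.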

g.