[Bioperl-l] "Be forgiving in what you accept" and Bio::Tools::GuessSeqFormat

Nathan Haigh n.haigh at sheffield.ac.uk
Fri Jul 22 12:15:34 EDT 2005


If you specified that the file was FASTA, I'm not sure how the parser would
work for pulling out primary_id, display_id, etc. for the sequence - have
you checked that the parser is flexible enough to pull these out of a sequence
description that has a space after the '>'?

It may be better to strip out these spaces before using the files in bioperl?
But to be honest I wouldn't be bothered either way! :o)
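
The pre-processing idea above can be sketched roughly as follows. This is a
minimal, language-neutral illustration in Python, not BioPerl code; the
function name normalize_fasta_headers is hypothetical, and it assumes the only
problem is spaces or tabs directly after the '>' on description lines.

```python
def normalize_fasta_headers(lines):
    """Yield lines with whitespace directly after a leading '>' removed.

    Hypothetical helper: only description lines (those starting with '>')
    are changed; sequence lines pass through untouched.
    """
    for line in lines:
        if line.startswith(">"):
            # "> seq1 some description" becomes ">seq1 some description"
            yield ">" + line[1:].lstrip(" \t")
        else:
            yield line


if __name__ == "__main__":
    records = ["> seq1 test sequence\n", "ACGTACGT\n", ">seq2\n", "TTGA\n"]
    for cleaned in normalize_fasta_headers(records):
        print(cleaned, end="")
```

The cleaned lines could then be written to a temporary file and handed to
whatever parser is in use.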

Nathan

-----Original Message-----
From: George Hartzell [mailto:hartzell at kestrel.alerce.com] 
Sent: 22 July 2005 16:36
To: n.haigh at sheffield.ac.uk
Cc: 'Brian Osborne'; 'bioperl-l'
Subject: RE: [Bioperl-l] "Be forgiving in what you accept" and
Bio::Tools::GuessSeqFormat


Nathan Haigh writes:
 > May I ask what software is producing this FASTA format file which has a
 > space immediately after the '>' in the description line?

I don't know what created it.  Wouldn't surprise me to find out it was
created in Microsoft Word....  It was given to me as an example input
file/test case.

 > Although I am not aware of a formal description of FASTA format, I have
 > never seen any files with a space immediately after '>'. Although I don't
 > object to relaxing this a little in bioperl, you may find that these
 > files are not compatible with other software.

Yeah, there is that.  On the other hand, if the guesser stays strict, then
we should make the equivalent change and have the Bio::SeqIO object fail
on such files even when it's told that they're Fasta (e.g. by -format or
by guessing based on filename).

I was just frustrated when stuff worked up until the moment that I
uploaded the file into a tool via the web (at which point it ended up
in an oddly named file and the guessing heuristic broke).

I'd vote for relaxing the constraint, but, hey....
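
To make the difference concrete, here is a rough sketch of a strict versus a
relaxed content-based check on the first line of a file. It is written in
Python for illustration only; the patterns and the name looks_like_fasta are
assumptions and are not the ones Bio::Tools::GuessSeqFormat actually uses.

```python
import re

# Strict: '>' must be followed immediately by a non-whitespace character.
STRICT_FASTA = re.compile(r"^>\S")
# Relaxed: allow optional spaces or tabs between '>' and the identifier.
RELAXED_FASTA = re.compile(r"^>[ \t]*\S")


def looks_like_fasta(first_line, relaxed=True):
    """Return True if first_line resembles a FASTA description line.

    Hypothetical check: with relaxed=False a space after '>' is rejected,
    with relaxed=True it is tolerated.
    """
    pattern = RELAXED_FASTA if relaxed else STRICT_FASTA
    return bool(pattern.match(first_line))


if __name__ == "__main__":
    line = "> seq1 test sequence"
    print(looks_like_fasta(line, relaxed=False))
    print(looks_like_fasta(line, relaxed=True))
```

With a check along these lines, guessing from content and being told the
format explicitly would give the same answer for such a file.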

g.



