[EMBOSS] seqret problem?

Peter Rice pmr at ebi.ac.uk
Tue Jul 20 11:18:22 UTC 2004


simon andrews (BI) wrote:

> Having looked at a few definitions of the fasta format, I can't find one
> which says that your sequence doesn't conform to the standard.  The
> semi-colon can be used in Fasta as a comment delimiter, so it could be
> argued that it would be correct to remove the Z1BPC completely in the
> output, but it should still parse OK.
> 
> The problem seems to be that seqret is interpreting your sequence as the
> PIR subvariant of the FastA format (Strange, since the test FastA
> sequence I just got from PIR didn't have a semi-colon in the description
> line) and its subsequent tests on the file then fail.

Fair enough. We do have to be careful about the automatic processing though.

What we can do is test the PIR/NBRF format ">P1;seqid" first (which 
we already do), and then accept it as a valid FASTA format. I suspect earlier 
versions of EMBOSS would allow this format - but not since we tried to 
make sense of the NCBI's variants of FASTA format. At that point we 
realised that treating a PIR first line as valid FASTA opened up 
various ways to read bad data.
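A minimal Python sketch of that ordering, assuming a PIR/NBRF entry starts with a `>XX;id` line (XX a two-character type code such as P1), has a free-text title line, and ends its sequence with `*`. The function name `guess_format` and the list of type codes are illustrative assumptions; this is not EMBOSS code, which is written in C.

```python
import re
from typing import List

# Two-character sequence-type codes seen in PIR/NBRF headers
# (illustrative list: P1 protein, F1 fragment, DL/DC DNA, RL/RC RNA,
# N3/N1 nucleic acid, XX unknown).
PIR_CODES = {"P1", "F1", "DL", "DC", "RL", "RC", "N3", "N1", "XX"}
PIR_HEADER = re.compile(r"^>([A-Z0-9]{2});(\S+)")


def guess_format(lines: List[str]) -> str:
    """Guess 'pir' or 'fasta' for one entry, testing PIR/NBRF first."""
    if not lines or not lines[0].startswith(">"):
        return "unknown"
    match = PIR_HEADER.match(lines[0])
    # A complete PIR/NBRF entry needs a header, a title line and a
    # sequence whose last line ends with '*'.
    if (match and match.group(1) in PIR_CODES and len(lines) >= 3
            and lines[-1].rstrip().endswith("*")):
        return "pir"
    # Otherwise accept the '>' line as FASTA, even if it looks PIR-like.
    return "fasta"
```

The fallback is the point: a PIR-looking id line followed directly by sequence is read as FASTA instead of being rejected.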

I need to check through the details to see where this happens.

We do have an alternative format, "-sf pearson", which reads fasta files 
with less checking, and I can change this format to let "PIR" fasta id 
lines through - but pearson is not tried by default when the format is 
auto-detected, because of the risk of detecting the wrong format.
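As a rough illustration of "less checking", here is a hypothetical Python reader (`read_pearson` is an assumed name, not EMBOSS's pearson parser) that treats any line starting with `>` as a new record and takes the first word as the id without validating it, so a PIR-style `>P1;name` id passes straight through. Lines before the first `>` are simply ignored.

```python
from typing import Iterable, Iterator, List, Optional, Tuple


def read_pearson(lines: Iterable[str]) -> Iterator[Tuple[str, str, str]]:
    """Yield (id, description, sequence) records with minimal checking."""
    seq_id: Optional[str] = None
    desc = ""
    chunks: List[str] = []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            # Emit the previous record before starting a new one.
            if seq_id is not None:
                yield seq_id, desc, "".join(chunks)
            # The id is the first whitespace-delimited word; no syntax check.
            parts = line[1:].split(None, 1)
            seq_id = parts[0] if parts else ""
            desc = parts[1] if len(parts) > 1 else ""
            chunks = []
        elif seq_id is not None:
            # Sequence line: drop internal spaces and keep the residues.
            chunks.append(line.replace(" ", ""))
    if seq_id is not None:
        yield seq_id, desc, "".join(chunks)
```

Being this permissive is exactly why such a reader is risky during auto-detection: almost any text with a `>` line would be accepted.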


Sadly, PIR is not a subformat of FASTA ... it was the format used for 
the NBRF/PIR sequence analysis package. It has other odd features, not 
all of which are strictly used by EMBOSS (because most can be safely 
ignored :-)
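For comparison, a hypothetical sketch of reading one complete NBRF/PIR entry, assuming the layout described above: a `>XX;id` line, then a free-text title line, then sequence lines terminated by `*`. The name `parse_pir_entry` is illustrative, and this ignores the other PIR features mentioned in the post.

```python
from typing import List, Tuple


def parse_pir_entry(lines: List[str]) -> Tuple[str, str, str, str]:
    """Return (type_code, id, title, sequence) for one PIR/NBRF entry."""
    if len(lines) < 2:
        raise ValueError("PIR entry needs a header line and a title line")
    header = lines[0].strip()
    # Header layout assumed: '>' + two-character type code + ';' + id.
    if not header.startswith(">") or header[3:4] != ";":
        raise ValueError("not a PIR/NBRF header: " + header)
    type_code, seq_id = header[1:3], header[4:]
    title = lines[1].strip()  # second line is a free-text title
    seq = "".join(part.strip().replace(" ", "") for part in lines[2:])
    if not seq.endswith("*"):
        raise ValueError("PIR sequence must end with '*'")
    return type_code, seq_id, title, seq[:-1]  # strip the terminator
```

The mandatory title line is what makes a PIR file differ from FASTA: a FASTA reader would take that title as sequence.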


regards,

Peter Rice

