[EMBOSS] seqret problem?
Peter Rice
pmr at ebi.ac.uk
Tue Jul 20 11:18:22 UTC 2004
simon andrews (BI) wrote:
> Having looked at a few definitions of the fasta format I can't find one
> which says that your sequence doesn't conform to the standard. The
> semi-colon can be used in Fasta as a comment delimiter, so it could be
> argued that it would be correct to remove the Z1BPC completely in the
> output, but it should still parse OK.
>
> The problem seems to be that seqret is interpreting your sequence as the
> PIR subvariant of the FastA format (Strange, since the test FastA
> sequence I just got from PIR didn't have a semi-colon in the description
> line) and its subsequent tests on the file then fail.
Fair enough. We do have to be careful about the automatic processing though.

What we can do is to test the PIR/NBRF format ">P1;seqid" first (which
we do), and then allow this as a valid FASTA format. I suspect earlier
versions of EMBOSS would allow this format - but not since we tried to
make sense of the NCBI's variants of FASTA format. At that point we
realised there were various ways to read bad data by assuming a PIR
first line is OK for FASTA format.

I need to check through the details to see where this happens.
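
The detection order described above (try NBRF/PIR first, then fall back to FASTA) could be sketched roughly as follows. This is not the EMBOSS code. The function name guess_format, the regular expression, and the rule that a PIR entry needs a description line and a sequence ending in "*" are assumptions made for illustration.

```python
import re

# Assumed shape of an NBRF/PIR ID line: ">", a two-character type code
# such as P1, a semicolon, then the sequence ID.
PIR_HEADER = re.compile(r"^>[A-Z][A-Z0-9];\S+")


def guess_format(lines):
    """Return "pir", "fasta" or "unknown" for the lines of one entry."""
    lines = [ln.rstrip("\n") for ln in lines if ln.strip()]
    if not lines or not lines[0].startswith(">"):
        return "unknown"
    if PIR_HEADER.match(lines[0]):
        # A PIR-looking ID line alone is not trusted. Also require a
        # description line plus sequence text terminated by "*".
        if len(lines) >= 3 and lines[-1].rstrip().endswith("*"):
            return "pir"
    # Anything else starting with ">" is accepted as plain FASTA,
    # including ID lines that merely contain a semicolon.
    return "fasta"


print(guess_format([">P1;seqid", "description line", "ACDEFG*"]))
print(guess_format([">P1;seqid some description", "ACDEFG"]))
```

The second call shows the case in question: a first line that looks like PIR but is followed by ordinary FASTA sequence text is still read as FASTA instead of failing.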
We do have an alternative format "-sf pearson" which reads fasta files
with less checking, and I can change this format to allow "PIR" fasta id
lines through - but it is not tested by default because of the problems
with auto-detecting the wrong format.
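
As an illustration of forcing the alternative format instead of relying on auto-detection, the sketch below builds a seqret command line with the input format set to pearson. The helper name and the file names input.fasta and output.fasta are hypothetical, and "-sf" above is taken to be the abbreviated form of the -sformat qualifier. The command is only assembled here, not run.

```python
def seqret_pearson_argv(infile, outfile):
    """Build (but do not run) a seqret command that forces pearson input."""
    return [
        "seqret",
        "-sequence", infile,     # input sequence file
        "-sformat", "pearson",   # skip auto-detection, read as loose FASTA
        "-outseq", outfile,      # output sequence file
    ]


# Hypothetical file names, for illustration only.
print(" ".join(seqret_pearson_argv("input.fasta", "output.fasta")))
```

The resulting list could be passed to subprocess.run on a machine where EMBOSS is installed.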
> The problem seems to be that seqret is interpreting your sequence as the
> PIR subvariant of the FastA format (Strange, since the test FastA
> sequence I just got from PIR didn't have a semi-colon in the description
> line) and its subsequent tests on the file then fail.
>
Sadly, PIR is not a subformat of FASTA ... it was the format used for
the NBRF/PIR sequence analysis package. It has other odd features, not
all of which are strictly used by EMBOSS, because most can be safely
ignored :-)
regards,
Peter Rice