[EMBOSS] FASTA format appears to get misrecognised as GCG

Jan Kim jttkim at googlemail.com
Wed Mar 11 19:15:41 UTC 2015

Dear All,

I've just had "water" in EMBOSS fail, and traced this back
to the regular expression "CHECK: [0-9].*\.\." matching the header
line of a FASTA file. The command

    water -asequence b.fasta  [...]  -auto

terminates with

    Warning: Sequence 'gcg::b.fasta:broken' has zero length, ignored
    Error: Unable to read sequence 'b.fasta'

As a minimal demo, any sequence with the header

    >broken CHECK: 0 ..

causes the problem, and expressly stating the format (via "fasta::b.fasta"
rather than just "b.fasta") fixes it.
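
The match itself can be reproduced outside EMBOSS. The following is a minimal Python sketch, assuming that Python's `re` module treats the quoted pattern the same way as the regex engine EMBOSS uses internally; the function name `looks_like_gcg_header` is made up for illustration and is not part of EMBOSS.

```python
import re

# Pattern quoted in this post. Assumption: Python's `re` interprets it the
# same way as the regex engine used inside EMBOSS.
GCG_CHECK = re.compile(r"CHECK: [0-9].*\.\.")


def looks_like_gcg_header(line: str) -> bool:
    """Return True if the quoted pattern matches anywhere in the line."""
    return GCG_CHECK.search(line) is not None


if __name__ == "__main__":
    # The first header is the minimal demo from above; the second is a
    # hypothetical harmless header for comparison.
    for header in (">broken CHECK: 0 ..", ">fine some description"):
        print(header, "->", looks_like_gcg_header(header))
```

A check like this could be run over the header lines of a FASTA file before handing it to an EMBOSS program, to flag files that might trip the autodetection.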

My speculation at this point is that somehow matching the regexp mentioned
above causes the autodetection to identify the format as GCG rather than
FASTA.

This doesn't exactly match my expectations based on the USA specs [1],
according to which EMBOSS expects FASTA by default and will try other
formats only if that doesn't work. (I have some inkling that this
default can be configured somewhere, but I haven't found anything
suspicious in /usr/local/share/EMBOSS and a quick scan didn't turn up
any stray .embossrc files either.)

As a bit of background, this happened in an "embedded script", and the
regexp was right in the sense that stuff from a GCG (or similar) formatted
file had found its way into the FASTA header. I hope I fixed my script
now by expressly stating the format; this posting is to solicit comments
regarding whether I've done something wrong / stupid (and possibly to
leave some hints regarding this matter in the mailing list archives...).
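
For scripts that build EMBOSS command lines, the workaround of stating the format can be wrapped in a small helper. This is only a sketch: the helper name `with_explicit_format` is hypothetical, and the test for an existing format prefix (looking for "::") is a simplification of the USA syntax described in [1].

```python
def with_explicit_format(usa: str, fmt: str = "fasta") -> str:
    """Prefix a sequence address with 'fmt::' unless it already names a format.

    Simplifying assumption: any '::' in the address means a format was given.
    """
    if "::" in usa:
        return usa
    return f"{fmt}::{usa}"


if __name__ == "__main__":
    # "b.fasta" is the file from the failing command quoted earlier.
    print(with_explicit_format("b.fasta"))
    print(with_explicit_format("gcg::other.seq"))
```

The returned string would then be passed as the value of -asequence in place of the bare file name.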

Best regards, Jan

[1] http://emboss.sourceforge.net/docs/themes/UniformSequenceAddress.html
 +- Jan T. Kim -------------------------------------------------------+
 |             email: jttkim at gmail.com                                |
 |             WWW:   http://www.jtkim.dreamhosters.com/              |
 *-----=<  hierarchical systems are for files, not for humans  >=-----*
