[EMBOSS] Space in USA "db:seqname" in list file causes unintended behavior
Peter Rice
ricepeterm at yahoo.co.uk
Sat Oct 13 07:40:15 UTC 2012
On 12/10/2012 22:27, Rozenbaum, Daniel (Biocceleration Inc) wrote:
> Hello everyone,
>
> We have encountered the following issue: if there's an erroneous (most likely unintentional) entry in a list file that looks like "db:<space character>seqname", EMBOSS doesn't issue an error/warning message, but treats this entry as "db:*".
>
> Might it be possible though to add some protection against potentially problematic consequences if such an error in the USA is made? In one such instance the resultant clustalw process ended up attempting to build a multiple alignment across the entire UniProt, which the server didn't handle well :-)
An interesting problem. List files have a long history, going back
before EMBOSS. They were also used in the GCG (Wisconsin) package, which
in turn adopted them from the VMS operating system, where they could be
used for mailing lists (sending to @list with a list of usernames, for
example).
In a list file, only the first token (word) is significant. The
remainder of the line is treated as a comment.
As you discovered, a space before the id leaves only the database name
as the first token, and a database name on its own is a valid input
representing all entries in the database.
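The first-token rule can be sketched in a few lines of Python. This is only an illustration of the behaviour described above, not the EMBOSS C code, and the function name is made up for the example:

```python
def first_token(line):
    """Return the first whitespace-delimited token of a list-file line.

    Hypothetical helper: everything after the first token is ignored,
    mirroring the "rest of the line is a comment" rule.
    """
    parts = line.split()
    return parts[0] if parts else ""


# The stray space splits the entry in two, so only "db:" is kept,
# and a bare database name means every entry in that database.
first_token("db: seqname")   # -> "db:"
first_token("db:seqname")    # -> "db:seqname"
```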
I think it is safe to assume that list files in practice have no
comments, so we can make a simple change for the next release:
list:: indicates a list file with only one token per line. Any
extraneous text will result in an error or warning message.
The same restriction will be applied to the VMS syntax @listfile.
A new list style can be added to allow comments so that any user with
them can still use their list files.
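As a rough sketch of the stricter one-token-per-line check, assuming it is treated as an error rather than a warning and that blank lines are simply skipped (neither is decided above), and using a hypothetical function name:

```python
def parse_strict_list(lines):
    """Read a list file that allows exactly one token per line.

    Hypothetical sketch, not the EMBOSS implementation. Blank lines are
    skipped, which is an assumption made for this example.
    """
    entries = []
    for lineno, line in enumerate(lines, start=1):
        tokens = line.split()
        if not tokens:
            continue
        if len(tokens) > 1:
            # "db: seqname" lands here instead of silently becoming "db:"
            raise ValueError(
                f"line {lineno}: extraneous text after {tokens[0]!r}"
            )
        entries.append(tokens[0])
    return entries
```

With this check an entry such as "db: seqname" is rejected with its line number, so the user can correct the list file before any database is read.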
Possibly a stricter comment style could be allowed in standard list::
files. We can check what other packages may have introduced, but
something like a perl-style #comment could be simple to add. The #
character has no special meaning in the EMBOSS query language.
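A "#" comment style could sit on top of the same strict check. The sketch below is only one possible reading of the idea: it assumes a comment runs from the first "#" to the end of the line, and the function name is hypothetical:

```python
def read_list_line(line):
    """Return the single token on a line, or None if the line is empty
    once a trailing '#' comment has been removed.

    Hypothetical helper, not an EMBOSS function.
    """
    content = line.split("#", 1)[0]  # keep only the text before the first '#'
    tokens = content.split()
    if not tokens:
        return None
    if len(tokens) > 1:
        raise ValueError(
            f"extraneous text after {tokens[0]!r}: put comments after '#'"
        )
    return tokens[0]
```

Here "db:seq1  # checked by hand" yields "db:seq1", while "db: seq1 # checked by hand" is still rejected because two tokens remain before the comment.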
With those changes in place your users would be saved from extra spaces
... but of course would still be caught by a newline creeping in to
start a new record after the database name (reading the entire database,
then reading the id as a possible filename). Users will get an error
message from that so long as the second part is not a valid filename or
database name.
regards,
Peter Rice
EMBOSS Team