[EMBOSS] Space in USA "db:seqname" in list file causes unintended behavior

Peter Rice ricepeterm at yahoo.co.uk
Sat Oct 13 07:40:15 UTC 2012


On 12/10/2012 22:27, Rozenbaum, Daniel (Biocceleration Inc) wrote:
> Hello everyone,
>
> We have encountered the following issue: if there's an erroneous (most likely unintentional) entry in a list file that looks like "db:<space character>seqname", EMBOSS doesn't issue an error/warning message, but treats this entry as "db:*".
>
> Might it be possible though to add some protection against potentially problematic consequences if such an error in the USA is made? In one such instance the resultant clustalw process ended up attempting to build a multiple alignment across the entire UniProt, which the server didn't handle well :-)

An interesting problem. List files have a long history, going back 
before EMBOSS. They were also used in the GCG (Wisconsin) package, which 
in turn adopted them from the VMS operating system, where they could be 
used for mailing lists (sending to @list with a list of usernames, for 
example).

In a list file, only the first token (word) is significant. The 
remainder of the line is treated as a comment.

As you discovered, a space before the id ends the first token at "db:", 
and "db:" on its own (or indeed just a database name) is a valid input 
representing all entries in the database.
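To make the failure concrete, here is a minimal Python model of that reading rule. It is an illustrative sketch only, not EMBOSS's own code; `legacy_list_entries` is a hypothetical name and "uniprot" is just an example database. The stray space leaves "uniprot:" as the only significant token on the second line, so P67890 is silently discarded as a comment.

```python
def legacy_list_entries(text):
    """Model of the lenient reading: only the first token on each line is kept."""
    entries = []
    for line in text.splitlines():
        tokens = line.split()  # split on any run of whitespace
        if tokens:  # blank lines contribute nothing
            entries.append(tokens[0])  # the rest of the line is treated as a comment
    return entries


listing = "uniprot:P12345\nuniprot: P67890\n"
print(legacy_list_entries(listing))  # ['uniprot:P12345', 'uniprot:']
```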

I think it is safe to assume that list files in practice have no 
comments, so we can make a simple change for the next release:

list:: indicates a list file with only one token per line. Any 
extraneous text will result in an error or warning message.

The same restriction will be applied to the VMS syntax @listfile.
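A minimal Python sketch of the proposed strict rule, assuming it raises an error (a warning would be the gentler variant). `strict_list_entries` and `ListFileError` are hypothetical names for illustration, not part of EMBOSS. Reporting the file and line number tells the user exactly which entry to fix.

```python
class ListFileError(ValueError):
    """Hypothetical error type for a malformed strict list file."""


def strict_list_entries(text, source="<list>"):
    """Sketch of the proposed list:: rule: exactly one token per non-blank line."""
    entries = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        tokens = line.split()
        if not tokens:
            continue  # blank lines carry no entry
        if len(tokens) > 1:
            extra = " ".join(tokens[1:])
            raise ListFileError(f"{source}:{lineno}: extraneous text after {tokens[0]!r}: {extra!r}")
        entries.append(tokens[0])
    return entries


try:
    strict_list_entries("uniprot:P12345\nuniprot: P67890\n", "ids.list")
except ListFileError as err:
    print(err)  # ids.list:2: extraneous text after 'uniprot:': 'P67890'
```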

A new list style can be added to allow comments so that any user with 
them can still use their list files.

Possibly a stricter comment style could be allowed in standard list:: 
files. We can check what other packages may have introduced, but 
something like a perl-style #comment could be simple to add. The # 
character has no special meaning in the EMBOSS query language.
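Because # is not part of the query syntax, a reader could simply cut each line at the first # and then apply the one-token rule to what remains. The sketch below assumes that behaviour; `commented_list_entries` is a hypothetical name, and it raises a plain ValueError for extraneous text.

```python
def commented_list_entries(text):
    """Sketch: drop a Perl-style '#' comment, then require one token per line."""
    entries = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        body = line.split("#", 1)[0]  # '#' is not part of USA syntax, so cut the line there
        tokens = body.split()
        if not tokens:
            continue  # blank or comment-only line
        if len(tokens) > 1:
            raise ValueError(f"line {lineno}: extraneous text {' '.join(tokens[1:])!r}")
        entries.append(tokens[0])
    return entries


listing = "# kinases to align\nuniprot:P12345  # first target\nuniprot:P67890\n"
print(commented_list_entries(listing))  # ['uniprot:P12345', 'uniprot:P67890']
```

A stray space is still caught: "uniprot: P67890" leaves two tokens before any #, so it is rejected rather than read as the whole database.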

With those changes in place your users would be saved from extra spaces 
... but of course would still be caught by a newline creeping in to 
start a new record after the database name (reading the entire database, 
then reading the id as a possible filename). Users will get an error 
message from that so long as the second part is not a valid filename or 
database name.
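The newline case can be modelled the same way. In this sketch `databases` stands in for the configured database names (a hypothetical set), and `classify_records` is a hypothetical helper that labels each record as it would be read: a database with no id means the whole database, and anything that is not a database reference is tried as a filename or reported as an error.

```python
def classify_records(text, databases):
    """Label each list-file record by how it would be interpreted (illustrative only)."""
    labels = []
    for line in text.splitlines():
        tokens = line.split()
        if not tokens:
            continue
        usa = tokens[0]
        db, sep, entry = usa.partition(":")
        if sep and db in databases:
            # "db:" and "db:*" both select every entry in the database
            kind = "whole database" if entry in ("", "*") else "id or pattern"
        elif usa in databases:
            kind = "whole database"  # a bare database name also means every entry
        else:
            kind = "filename or error"  # tried as a file, otherwise an error message
        labels.append((usa, kind))
    return labels


# A stray newline splits "uniprot:P12345" into two records.
print(classify_records("uniprot:\nP12345\n", {"uniprot"}))
# [('uniprot:', 'whole database'), ('P12345', 'filename or error')]
```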

regards,

Peter Rice
EMBOSS Team
