ACD file and emboss.default file syntax

Peter Rice pmr at ebi.ac.uk
Thu Feb 20 11:34:13 UTC 2003


I am cleaning up the parsing of both ACD files and the emboss.default 
files. This includes adding diagnostic messages to say what problems 
were found and to report the line number (and filename).

Showdb will carry out additional checks on the emboss.default and 
~/.embossrc files (valid sequence formats, for example). There is no 
need to run these every time the files are read.

At the same time, some of the syntax can be tightened. For example, ACD 
files allowed some strange characters that were never used (parentheses 
instead of quotes, "=" instead of ":"). These will be removed.

How far should this go? In particular, should white space be required 
after a ":" or around "[" and "]" characters?

There are also differences in the definitions of comments. In ACD files 
any text after a "#" is ignored. In emboss.default, comments must start 
at the beginning of the line. This seems preferable, as a "#" character 
could occasionally be useful in a definition.

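To make the difference concrete, here is a minimal Python sketch of the two comment conventions. It is illustrative only: the function names are hypothetical and not taken from the EMBOSS C sources, it assumes a comment "#" must be the very first character of the line, and it does not treat quoted strings specially.

```python
def strip_acd_comment(line: str) -> str:
    """Current ACD convention: everything from the first '#' is ignored."""
    return line.split("#", 1)[0]


def is_default_comment(line: str) -> bool:
    """emboss.default convention: only a '#' in column 1 starts a comment."""
    return line.startswith("#")


# Hypothetical emboss.default line whose value contains a '#'.
line = 'SET MYVAR "entry#1"'
print(is_default_comment(line))       # False: the whole line is a definition
print(repr(strip_acd_comment(line)))  # the ACD rule would truncate the value
```

Under the ACD rule the value is cut off at "#", while the emboss.default rule keeps it intact.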
For example, both of the following are valid ACD definitions:

#################################
# Full definition from acdpretty
#################################

integer: minlen  [
   required: "Y"
   minimum: "1"
   maximum: "50"
   default: "6"
   information: "Minimum length"
]

int:minlen [req:Y min:1 max:50 def:6 info:"Minimum length"] #compact

The first is preferred (and generated by -acdpretty). I would like to 
make it *required* so that other ACD parsers (e.g. for GUI definitions) 
can cope better.

The changes would be:

1. White space is required after "attribute:"

2. White space is required before and after "[" and "]"

3. Any "#" character at the start of a line is a comment and the line 
will be ignored. Any "#" within a line is part of the definition.
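A minimal Python sketch of a checker for the proposed rules 1 to 3, reporting each problem with the filename and line number as described at the start of this message. It is illustrative only: the function name and message wording are hypothetical, it assumes values never contain escaped quotes, and it treats ":", "[" and "]" inside double-quoted values as data.

```python
def check_acd_layout(text: str, filename: str = "<acd>") -> list[str]:
    """Report violations of proposed rules 1-3 as 'file:line: message'."""
    problems = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        if line.startswith("#"):          # rule 3: whole-line comment, skipped
            continue
        in_quote = False
        for i, ch in enumerate(line):
            if ch == '"':
                in_quote = not in_quote
                continue
            if in_quote:                  # ':' '[' ']' inside quotes are data
                continue
            # Treat the start and end of the line as white space.
            before = line[i - 1] if i > 0 else " "
            after = line[i + 1] if i + 1 < len(line) else " "
            if ch == ":" and not after.isspace():                 # rule 1
                problems.append(f"{filename}:{lineno}: no white space after ':'")
            elif ch in "[]" and not (before.isspace() and after.isspace()):  # rule 2
                problems.append(f"{filename}:{lineno}: no white space around '{ch}'")
    return problems


compact = 'int:minlen [req:Y min:1 max:50 def:6 info:"Minimum length"] #compact'
for message in check_acd_layout(compact, "example.acd"):
    print(message)
```

Run on the two definitions above, the acdpretty form produces no messages, while the compact one-line form fails rule 1 at every colon and rule 2 at both brackets. The trailing "#compact" is not flagged, since under rule 3 it is simply part of the definition.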


Extra questions are:

4. Should the ACD types (integer, string, ...) be specified in full? ACD 
can cope easily with unambiguous abbreviations, so I prefer to keep the 
short forms, but other parsers may have problems with them. These files 
are created by the developers, so we can update them. One option is to 
generate warning messages and to run acdpretty to fix them before 
committing the ACD files to CVS.
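Resolving unambiguous abbreviations amounts to prefix matching. A minimal Python sketch, assuming an exact match always wins and anything else must be a prefix of exactly one known name; the type list is an illustrative subset, not the full set of ACD types, and the function names are hypothetical. The warning helper shows the idea of accepting short forms but reporting them so they can be expanded (for example by acdpretty) before committing. The same function would work for attribute names such as "req" or "info" (question 6).

```python
# Illustrative subset of ACD data types; not the complete list.
ACD_TYPES = ["integer", "float", "string", "boolean", "list", "selection",
             "infile", "outfile", "sequence", "seqall", "seqset"]


def resolve_abbreviation(word: str, names: list[str]) -> str:
    """Expand word to the single name it abbreviates, or raise ValueError."""
    if word in names:
        return word                      # an exact match is never ambiguous
    matches = [name for name in names if name.startswith(word)]
    if len(matches) == 1:
        return matches[0]
    if not matches:
        raise ValueError(f"unknown name '{word}'")
    raise ValueError(f"ambiguous abbreviation '{word}' ({', '.join(matches)})")


def expand_with_warning(word: str, names: list[str], where: str) -> str:
    """Accept an abbreviation but warn, so it can be fixed before commit."""
    full = resolve_abbreviation(word, names)
    if full != word:
        print(f"{where}: warning: '{word}' abbreviates '{full}'")
    return full
```

With this subset, "int" expands to "integer", while "in" (integer or infile) and "seq" (sequence, seqall or seqset) are rejected as ambiguous.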

5. Should the emboss.default types (env, dbname) be specified in full? 
Parsing can cope easily with unambiguous abbreviations, and "db" in 
place of "dbname" is common. These files are created by the site 
administrators and by individual users, so we should avoid breaking 
their existing definitions. But note that we have synonyms (env/set), 
so we could allow "db" or "dbname" as alternatives.
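For emboss.default, an alternative to general prefix matching is an explicit synonym table, so existing "db" definitions keep working while other short forms are rejected. A minimal sketch: only the env/set and db/dbname pairs come from this message, the table and function name are hypothetical, and case-insensitive matching is an assumption.

```python
# Hypothetical synonym table: each accepted spelling maps to one canonical keyword.
DEFAULT_KEYWORDS = {
    "env": "env",
    "set": "env",        # existing synonym
    "dbname": "dbname",
    "db": "dbname",      # common short form, accepted as a synonym
}


def canonical_keyword(word: str) -> str:
    """Map an emboss.default keyword to its canonical form (case-insensitive, assumed)."""
    key = word.lower()
    if key not in DEFAULT_KEYWORDS:
        raise ValueError(f"unknown emboss.default keyword '{word}'")
    return DEFAULT_KEYWORDS[key]
```

Unlike prefix matching, this accepts exactly the listed spellings, so "dbn" is an error rather than an abbreviation.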

6. Should abbreviations of the ACD attribute names (required, 
information, ...) be allowed (see question 4)?

7. Should abbreviations of the database (and any other emboss.default) 
attribute names be allowed (see question 5)?


Peter Rice



