ACD file and emboss.default file syntax
Peter Rice
pmr at ebi.ac.uk
Thu Feb 20 11:34:13 UTC 2003
I am cleaning up the parsing of both ACD files and the emboss.default
files. This includes adding diagnostic messages to say what problems
were found and to report the line number (and filename).
Showdb will carry out additional checks on the emboss.default and
~/.embossrc files (valid sequence formats, for example). There is no
need to run these every time the files are read.
At the same time, some of the syntax can be tightened. For example, the
ACD parser accepted some alternative characters that were never used in
practice (parentheses instead of quotes, "=" instead of ":"). Support for
these will be removed.
How far should this go? In particular, should white space be required
after a ":" or around "[" and "]" characters?
There are also differences in the definitions of comments. In ACD files
any text after a "#" is ignored. In emboss.default comments must start
at the beginning of the line. This seems preferable as occasionally a
"#" character could be useful in a definition.
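The difference between the two comment rules can be sketched in a few lines of Python. This is only an illustration of the behaviour described above, not the EMBOSS parser: the function names are made up for the example, and the ACD variant deliberately ignores quoting, taking "any text after a #" literally.

```python
def strip_acd_comment(line: str) -> str:
    """ACD rule as described above: drop everything from the first '#'.

    Assumption: quoting is ignored, so a '#' inside a quoted value is
    also treated as the start of a comment.
    """
    return line.split("#", 1)[0].rstrip()


def strip_default_comment(line: str) -> str:
    """emboss.default rule: only a '#' in the first column starts a comment."""
    return "" if line.startswith("#") else line.rstrip()


# Hypothetical attribute value that contains a '#' character.
sample = 'information: "Field # 2"'
print(strip_acd_comment(sample))      # the value is cut off at the '#'
print(strip_default_comment(sample))  # the line is kept whole
```

Under the first rule the quoted value is truncated, which is the case the emboss.default rule avoids.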
For example, both of the following are valid ACD definitions:
#################################
# Full definition from acdpretty
#################################
integer: minlen [
  required: "Y"
  minimum: "1"
  maximum: "50"
  default: "6"
  information: "Minimum length"
]

int:minlen [req:Y min:1 max:50 def:6 info:"Minimum length"] #compact
The first is preferred (and generated by -acdpretty). I would like to
make it *required* so that other ACD parsers (e.g. for GUI definitions)
can cope better.
The changes would be:
1. White space is required after "attribute:"
2. White space is required before and after "[" and "]"
3. Any "#" character at the start of a line is a comment and the line
will be ignored. Any "#" within a line is part of the definition.
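To make the three proposed rules concrete, here is a rough sketch of a line checker in Python. It is an assumption-laden illustration rather than the planned implementation: the function name, the message wording, and the choice to mask quoted strings before checking are all invented for the example. It reports the filename and line number with each problem, in the spirit of the diagnostics mentioned earlier.

```python
import re

# Quoted strings are masked first so that ':' or '[' inside a value
# is not mistaken for syntax.
_QUOTED = re.compile(r'"[^"]*"')
# A name followed by ':' and then immediately by a non-space character.
_COLON = re.compile(r'[A-Za-z]\w*:(?=\S)')
# A bracket that touches a non-space character on either side.
_BRACKET = re.compile(r'(?<=\S)[\[\]]|[\[\]](?=\S)')


def check_acd_line(line, lineno, filename):
    """Return a list of diagnostics for one line under the proposed rules."""
    if line.startswith("#"):
        return []  # rule 3: '#' in the first column makes a comment line
    masked = _QUOTED.sub(
        lambda m: '"' + "x" * (len(m.group()) - 2) + '"', line)
    problems = []
    for m in _COLON.finditer(masked):  # rule 1
        problems.append(
            f"{filename}:{lineno}: white space required after '{m.group()}'")
    for m in _BRACKET.finditer(masked):  # rule 2
        problems.append(
            f"{filename}:{lineno}: white space required around '{m.group()}'")
    return problems


compact = 'int:minlen [req:Y min:1 max:50 def:6 info:"Minimum length"] #compact'
for message in check_acd_line(compact, 9, "example.acd"):
    print(message)
```

Every line of the full acdpretty-style definition passes this check unchanged, while the compact form is flagged once for each "name:" that is not followed by white space and once for each bracket that touches its neighbour.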
Extra questions are:
4. Should the ACD types (integer, string, ...) be specified in full? ACD
can cope easily with unambiguous abbreviations, so I prefer to keep the
short forms, but perhaps other parsers have problems with them. These
files are created by the developers, so we can update them. One option is
to generate warning messages, and to run acdpretty to fix them before
committing the ACD files to CVS.
5. Should the emboss.default types (env, dbname) be specified in full?
Parsing can cope easily with unambiguous abbreviations, and "db" in
place of "dbname" is common. These files are created by the site
administrators and by individual users, so we should avoid breaking
their existing definitions. Note that we already have synonyms (env/set),
so we could allow "db" or "dbname" as alternatives.
6. Should abbreviations of the ACD attribute names (required,
information, ...) be allowed (see question 4)?
7. Should abbreviations of the database (and any other emboss.default)
attribute names be allowed (see question 5)?
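For questions 4 to 7, "unambiguous abbreviation" matching with optional synonyms can be sketched as below. This is a hypothetical Python illustration, not the EMBOSS code: the function name is invented, and the name lists contain only the attributes and types that appear in this message, not the complete sets.

```python
def resolve(word, names, synonyms=None):
    """Expand `word` to a full name if it is an unambiguous prefix.

    `synonyms` maps an alternative spelling to its canonical name.
    Raises ValueError when the word is unknown or ambiguous.
    """
    synonyms = synonyms or {}
    candidates = set(names) | set(synonyms)
    if word in candidates:  # an exact match always wins
        return synonyms.get(word, word)
    matches = sorted(n for n in candidates if n.startswith(word))
    canonical = {synonyms.get(m, m) for m in matches}
    if not canonical:
        raise ValueError(f"unknown name '{word}'")
    if len(canonical) > 1:
        raise ValueError(
            f"ambiguous name '{word}': matches {', '.join(matches)}")
    return canonical.pop()


# Illustrative subsets only, taken from the examples in this message.
ACD_ATTRIBUTES = ["required", "minimum", "maximum", "default", "information"]
DEFAULT_TYPES = ["env", "dbname"]
DEFAULT_SYNONYMS = {"set": "env", "db": "dbname"}

print(resolve("info", ACD_ATTRIBUTES))
print(resolve("db", DEFAULT_TYPES, DEFAULT_SYNONYMS))
```

With these lists "min" and "max" resolve cleanly, while "m" is rejected because it matches both "minimum" and "maximum". A pretty-printer could use the same lookup to rewrite short forms in full, or to emit a warning instead of an error.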
Peter Rice