ACD file and emboss.default file syntax

Peter Ernst p.ernst at dkfz-heidelberg.de
Mon Feb 24 11:27:34 UTC 2003


On Thu, 20 Feb 2003, Peter Rice wrote:

> How far should this go? In particular, should white space be required
> after a ":" or around "[" and "]" characters?

How significant are newlines?

Usually a newline is just whitespace in ACD and can be treated like
a space character. The end of a block is defined by the ']'
character. However, the qualifiers "variable" and "endsection" are
exceptions: there the end of the block is defined by a newline
character.

A simple approach to this problem was to say: "a newline marks
the end of a block unless one of the opening characters like '['
appears on the same line". However, this doesn't work with all
existing ACD files, because some of them put the '[' on the line
after the block name.

  "good" definition:

 appl: hmmgen [
  documentation: "G.."
  ...
 ]

  "bad" definition: (found in DOMAINATRIX/.../hmmgen.acd)

 appl: hmmgen
[
  documentation: "G.."
 ...
]
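
To illustrate why the rule breaks, here is a minimal Python sketch of
it. The function name split_blocks_naive and the trimmed-down inputs
are hypothetical, chosen only for this illustration; it is not the
code of any existing ACD parser. It counts brackets naively and does
not handle '[' or ']' inside quoted strings.

```python
def split_blocks_naive(text):
    """Split ACD-like text into blocks using the naive rule:
    a newline ends a block unless a '[' opened on that line (or an
    earlier line of the same block) is still waiting for its ']'."""
    blocks = []
    current = []
    depth = 0  # number of '[' not yet closed by ']'
    for line in text.splitlines():
        stripped = line.strip()
        if not stripped:
            continue  # blank lines carry no content
        current.append(stripped)
        depth += stripped.count("[") - stripped.count("]")
        if depth <= 0:
            # Nothing is open at the end of this line: the block ends here.
            blocks.append(" ".join(current))
            current = []
            depth = 0
    if current:
        blocks.append(" ".join(current))  # unterminated trailing block
    return blocks


# Trimmed versions of the two layouts shown above.
good = 'appl: hmmgen [\n  documentation: "G.."\n]\n'
bad = 'appl: hmmgen\n[\n  documentation: "G.."\n]\n'

print(split_blocks_naive(good))
print(split_blocks_naive(bad))
```

With the "good" layout the whole definition comes back as one block.
With the "bad" layout the rule cuts the block after "appl: hmmgen", so
the bracketed body is detached from its application name.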




> There are also differences in the definitions of comments. In ACD files
> any text after a "#" is ignored. In emboss.default comments must start
> at the beginning of the line. This seems preferable as occasionally a
> "#" character could be useful in a definition.

Yes, it would be better if comments had to start at the beginning of
a line. However, existing parsers contain code to deal with the other
kind of comment (text after a "#" anywhere in a line) as well. In any
case, a change in the syntax definition for comments makes sense (for
future versions of parsers in GUIs).
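
The difference between the two comment rules can be sketched in
Python as follows. Both function names and the sample line are
hypothetical and serve only as an illustration of the rules as
described in the quoted text; neither is taken from EMBOSS itself.

```python
def strip_comment_anywhere(line):
    """ACD rule as described above: any text after a '#' is ignored."""
    return line.split("#", 1)[0].rstrip()


def strip_comment_line_start(line):
    """emboss.default rule as described above: a line is a comment
    only if the '#' is its first character."""
    return "" if line.startswith("#") else line.rstrip()


# Hypothetical definition that wants a literal '#' in its value.
sample = 'information: "Sequence #1"'

print(strip_comment_anywhere(sample))
print(strip_comment_line_start(sample))
print(repr(strip_comment_line_start("# a whole-line comment")))
```

Under the first rule the value is truncated at the "#"; under the
second rule the line survives intact, while a whole-line comment is
still dropped.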



> 4. Should the ACD types (integer, string, ...) be specified in full? ACD
> can cope easily with unambiguous abbreviations, [...]
> [...]
>
> 6. Should the ACD attribute names (required, information, ...) be
> abbreviated (see question 4)?

The problem is that the abbreviations used in existing ACD files are
not *globally* unambiguous but only *locally* unambiguous, i.e.

  the abbreviation MAX
  is used for MAXSEQS (in ALIGNMENT context),
          for MAXIMUM (e.g. INTEGER context) and
          for MAXLENGTH (e.g. STRING context).

With a LEX/YACC approach to parsing ACD files, it was difficult to
create a simple lexer: whenever the lexer found "max", it wasn't
clear whether the token MAXSEQS, MAXIMUM or MAXLENGTH was meant.
(The lexer had to know its context to be able to emit the right
token.)

Therefore less ambiguity would be welcome (even if this means: no
abbreviations in ACD files).
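
The local-versus-global ambiguity can be sketched in Python as
follows. The table ATTRIBUTES_BY_CONTEXT and the function
resolve_abbreviation are hypothetical and list only the three
attribute names mentioned above; real ACD types define more
attributes than this.

```python
# Assumption: one attribute per context, taken from the example above.
ATTRIBUTES_BY_CONTEXT = {
    "alignment": ["maxseqs"],
    "integer": ["maximum"],
    "string": ["maxlength"],
}


def resolve_abbreviation(abbrev, context=None):
    """Expand an abbreviated attribute name by prefix matching.

    With a context, only that type's attributes are candidates.
    Without one, every known attribute is a candidate, which is the
    situation a context-free lexer is in."""
    if context is None:
        candidates = [
            name
            for names in ATTRIBUTES_BY_CONTEXT.values()
            for name in names
        ]
    else:
        candidates = ATTRIBUTES_BY_CONTEXT[context]
    matches = sorted({n for n in candidates if n.startswith(abbrev.lower())})
    if len(matches) == 1:
        return matches[0]
    raise ValueError(f"{abbrev!r} is ambiguous or unknown: {matches}")


print(resolve_abbreviation("max", context="integer"))
print(resolve_abbreviation("max", context="string"))
try:
    resolve_abbreviation("max")
except ValueError as err:
    print(err)
```

With a context the abbreviation resolves to a single name; without
one, "max" matches all three names and has to be rejected, which is
why a simple lexer cannot pick the token on its own.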


Regards,
	Peter Ernst




