Genetic codes and other repeated ACD lists

Peter Rice pmr at ebi.ac.uk
Thu Apr 7 16:44:14 UTC 2005


I have found a way to avoid writing and maintaining lists like this one in ACD files:

   list: table  [
     additional: "Y"
     default: "0"
     minimum: "1"
     maximum: "1"
     header: "Genetic codes"
     values: "0:Standard; 1:Standard (with alternative initiation
              codons); 2:Vertebrate Mitochondrial; 3:Yeast Mitochondrial;
              4:Mold, Protozoan, Coelenterate Mitochondrial and
              Mycoplasma/Spiroplasma; 5:Invertebrate Mitochondrial; 6:Ciliate
              Macronuclear and Dasycladacean; 9:Echinoderm Mitochondrial;
              10:Euplotid Nuclear; 11:Bacterial; 12:Alternative Yeast Nuclear;
              13:Ascidian Mitochondrial; 14:Flatworm Mitochondrial;
              15:Blepharisma Macronuclear; 16:Chlorophycean Mitochondrial;
              21:Trematode Mitochondrial; 22:Scenedesmus obliquus;
              23:Thraustochytrium Mitochondrial"
     delimiter: ";"
     codedelimiter: ":"
     information: "Code to use"
     knowntype: "genetic code"
   ]
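
As a rough illustration of what the "delimiter" and "codedelimiter" attributes do to a "values" string like the one above, here is a minimal Python sketch. It is not the EMBOSS parser (that lives in the C library); the function name parse_list_values is made up for this example.

```python
def parse_list_values(values, delimiter=";", codedelimiter=":"):
    """Split an ACD-style list string into (code, description) pairs.

    Hypothetical helper for illustration only, not an EMBOSS function.
    """
    entries = []
    for item in values.split(delimiter):
        # Collapse the line breaks and indentation used to wrap long strings.
        item = " ".join(item.split())
        if not item:
            continue
        # Everything before the first codedelimiter is the code,
        # the remainder is the human-readable description.
        code, _, description = item.partition(codedelimiter)
        entries.append((code.strip(), description.strip()))
    return entries


codes = parse_list_values("0:Standard; 11:Bacterial")
for code, description in codes:
    print(code, description)
```

Only the first ":" in each entry is treated as the code separator, so a description may itself contain a colon.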


Using the "knowntype" attribute it is possible to delete the "values" attribute,
and to define a standard list using a "resource" definition in the
emboss.default (or .embossrc) file like this:

RESOURCE genetic_code [ type: "list" value: "0:Standard;11:Bacterial" ]

(for just 2 genetic codes)

or

RESOURCE genetic_code [ type: "list" value: "@EGC.index" ]

(for a list of all the genetic codes - this will read a datafile EGC.index 
which is new in CVS).
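
The lookup described above could be sketched roughly as follows in Python. This is only an outline under stated assumptions: the names resource_name and resolve_list_values are invented, the resources are held in a plain dictionary, and the file reader is passed in as a stand-in because the format of EGC.index is not described here.

```python
def resource_name(knowntype):
    # "genetic code" -> "genetic_code", matching the RESOURCE name used above.
    return knowntype.strip().replace(" ", "_")


def resolve_list_values(knowntype, resources, read_datafile):
    """Return the list string for a knowntype, or None if no resource exists.

    resources: dict of resource name -> value string (assumed representation).
    read_datafile: callable taking a file name and returning a list string
                   (a placeholder; the real file format is not assumed here).
    """
    value = resources.get(resource_name(knowntype))
    if value is None:
        return None  # caller falls back to whatever the ACD file provides
    if value.startswith("@"):
        # "@EGC.index" means: build the list from the named data file.
        return read_datafile(value[1:])
    return value  # a literal list such as "0:Standard;11:Bacterial"


requested = []

def fake_reader(name):
    # Stand-in reader that records the file name and returns a fixed list.
    requested.append(name)
    return "0:Standard;11:Bacterial"


literal = resolve_list_values(
    "genetic code", {"genetic_code": "0:Standard;11:Bacterial"}, fake_reader)
from_file = resolve_list_values(
    "genetic code", {"genetic_code": "@EGC.index"}, fake_reader)
missing = resolve_list_values("genetic code", {}, fake_reader)
```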

Other resource definitions could be commands to execute.

I have not yet decided whether to allow a value of "@EGC.index" in the ACD 
file itself. It could be a nice short cut, but I like using a "knowntype" to 
control the results.

There are some problems to solve:

1. the resource is tested in too many places - it should replace the "value" 
attribute when it is first used. Not hard to do.

2. there should be a clean way to define a default value for each knowntype - 
for example calling an ajTrn function to resolve the "genetic code" knowntype 
to a value. Functions can be defined for list knowntypes in ajacd.c.

3. anyone parsing the ACD file will wonder where the value has gone - perhaps 
acdpretty can be made to fill in missing values when an environment variable 
is set. Would that be acceptable to those who need it?
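
To make problems 1 and 3 concrete, here is a small Python sketch of the "substitute once" idea: the resource is copied into the attribute the first time it is needed, so later code only ever reads the attribute. The dictionaries and the name fill_missing_values are assumptions for illustration, and only literal resource values are handled.

```python
def fill_missing_values(attributes, resources):
    """Copy the resource list into attributes["values"] if it is absent.

    attributes: dict of ACD attribute name -> string (assumed representation).
    resources:  dict of resource name -> literal list string.
    Returns True if a value was filled in, False otherwise.
    """
    if "values" in attributes:
        return False  # explicit or already substituted: leave it alone
    knowntype = attributes.get("knowntype", "")
    value = resources.get(knowntype.strip().replace(" ", "_"))
    if value is None:
        return False  # no resource defined for this knowntype
    attributes["values"] = value
    return True


table = {"knowntype": "genetic code", "default": "0"}
resources = {"genetic_code": "0:Standard;11:Bacterial"}

first = fill_missing_values(table, resources)   # substitutes the list
second = fill_missing_values(table, resources)  # nothing left to do
```

The same step, run over every list in a file, is roughly what a fill-in mode for acdpretty would have to do before writing the ACD back out.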

Future uses for this:

1. standard list of genetic codes with descriptions

2. standard reading frame names

3. list of known codon usage files, matrices, etc. by specifying "?" as the value

4. a list of blast databases for a blastall wrapper :-)

5. replacing "string" qualifiers which have a knowntype with a selection that 
can display and test the list of acceptable values in ACD, to avoid a run-time 
failure
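
For item 5, the test against the list of acceptable values could look something like this Python sketch. It assumes the same "code:description" list format as above; check_list_value is a made-up name, not an existing EMBOSS function.

```python
def check_list_value(value, values, delimiter=";", codedelimiter=":"):
    """Return True if value is one of the codes in an ACD-style list string."""
    codes = set()
    for item in values.split(delimiter):
        if item.strip():
            # The code is the text before the first codedelimiter.
            codes.add(item.partition(codedelimiter)[0].strip())
    return value.strip() in codes


allowed = "0:Standard;11:Bacterial"
ok = check_list_value("11", allowed)   # a code that is in the list
bad = check_list_value("7", allowed)   # a code that is not in the list
```

Rejecting the value at this point means the user sees the problem when the qualifier is read, rather than as a failure later in the run.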

Comments please ....

Peter




More information about the emboss-dev mailing list