Genetic codes and other repeated ACD lists

Dr J.C. Ison jison at hgmp.mrc.ac.uk
Fri Apr 8 10:34:51 UTC 2005


Hi Peter

Comments below.

Cheers

Jon



Peter Rice wrote:
> 
> I have found a way to save writing and maintaining lists like these in ACD files:
> 
>    list: table  [
>      additional: "Y"
>      default: "0"
>      minimum: "1"
>      maximum: "1"
>      header: "Genetic codes"
>      values: "0:Standard; 1:Standard (with alternative initiation
>               codons); 2:Vertebrate Mitochondrial; 3:Yeast Mitochondrial;
>               4:Mold, Protozoan, Coelenterate Mitochondrial and
>               Mycoplasma/Spiroplasma; 5:Invertebrate Mitochondrial; 6:Ciliate
>               Macronuclear and Dasycladacean; 9:Echinoderm Mitochondrial;
>               10:Euplotid Nuclear; 11:Bacterial; 12:Alternative Yeast Nuclear;
>               13:Ascidian Mitochondrial; 14:Flatworm Mitochondrial;
>               15:Blepharisma Macronuclear; 16:Chlorophycean Mitochondrial;
>               21:Trematode Mitochondrial; 22:Scenedesmus obliquus;
>               23:Thraustochytrium Mitochondrial"
>      delimiter: ";"
>      codedelimiter: ":"
>      information: "Code to use"
>      knowntype: "genetic code"
>    ]
> 
> Using the "knowntype" attribute it is possible to delete the value attribute,
> and to define a standard list using a "resource" definition in the
> emboss.default (or .embossrc) file like this:
> 
> RESOURCE genetic_code [ type: "list" value: "0:Standard;11:Bacterial" ]
> 
> (for just 2 genetic codes)
> 
> or
> 
> RESOURCE genetic_code [ type: "list" value: "@EGC.index" ]
> 
> (for a list of all the genetic codes - this will read a datafile EGC.index
> which is new in CVS).
> 
> Other resource definitions could be commands to execute.

It'd be cleaner, more flexible and easier to maintain. If it's not a
requirement now, it will probably become one in the future.  I've two progs
that would benefit from it now.
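
For what it's worth, here is how I picture the lookup, as a rough Python sketch
rather than anything from ajacd.c. The RESOURCES dict is a hypothetical stand-in
for the RESOURCE definitions in emboss.default, and mapping the knowntype
"genetic code" to the resource name "genetic_code" by swapping spaces for
underscores is my assumption from your example, not a documented rule.

```python
def parse_list_values(values, delimiter=";", codedelimiter=":"):
    """Split an ACD-style list string into an ordered {code: description} dict."""
    entries = {}
    for item in values.split(delimiter):
        item = " ".join(item.split())  # collapse whitespace from wrapped lines
        if not item:
            continue
        code, _, description = item.partition(codedelimiter)
        entries[code.strip()] = description.strip()
    return entries


# Hypothetical in-memory stand-in for RESOURCE definitions in emboss.default.
RESOURCES = {
    "genetic_code": {"type": "list", "value": "0:Standard;11:Bacterial"},
}


def values_for_knowntype(knowntype, resources=RESOURCES):
    """Return the parsed list for a knowntype, or None if no list resource exists."""
    name = knowntype.replace(" ", "_")  # assumed naming convention
    resource = resources.get(name)
    if resource is None or resource["type"] != "list":
        return None
    return parse_list_values(resource["value"])


print(values_for_knowntype("genetic code"))
# {'0': 'Standard', '11': 'Bacterial'}
```

Reading "@EGC.index" from a data file would slot in just before the parse step;
I've left it out because I don't know the file's format.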

 
> I have not yet decided whether to allow a value of "@EGC.index" in the ACD
> file itself. It could be a nice short cut, but I like using a "knowntype" to
> control the results.

Could be confusing to allow that in the ACD file because the punter might 
think EGC existed, e.g. as a data item, in the file itself and get confused
when they can't find it.

 
> There are some problems to solve:
> 
> 1. the resource is tested in too many places - it should replace the "value"
> attribute when it is first used. Not hard to do.
> 
> 2. there should be a clean way to define a default value for each knowntype -
> for example calling an ajTrn function to resolve the "genetic code" knowntype
> to a value. Functions can be defined for list knowntypes in ajacd.c

Couldn't the default be specified in the same place / file as the values themselves?
Presumably the default value would be needed before run-time proper and could
be retrieved at the same time as the values are.
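
Something along these lines, as a Python sketch only. The "default" field on
the resource is hypothetical (your RESOURCE examples only show "type" and
"value"), and so is the function name.

```python
def codes_and_default(resource, delimiter=";", codedelimiter=":"):
    """Return (codes, default) from one resource, checking the default is valid."""
    codes = [
        item.partition(codedelimiter)[0].strip()
        for item in resource["value"].split(delimiter)
        if item.strip()
    ]
    default = resource["default"]  # hypothetical field kept next to the values
    if default not in codes:
        raise ValueError(f"default {default!r} is not one of {codes}")
    return codes, default


# Hypothetical resource carrying its own default alongside the list.
resource = {"type": "list", "value": "0:Standard;11:Bacterial", "default": "0"}
codes, default = codes_and_default(resource)
```

That way a default that isn't in the list is caught when the values are read,
rather than at run-time.
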

> 
> 3. anyone parsing the ACD file will wonder where the value has gone - perhaps
> acdpretty can be made to fill in missing values with an environment variable
> set. Would that be acceptable to those who need it?


I think it would be nice to support both "standard" lists (i.e. ones *with* a
"values" attribute) and the new style.  Perhaps something like:

      values: "@knowntype"  

to indicate that the values should be taken from the knowntype, *or*

      values: "0: Standard ... etc" as before.

Then the values attribute would always be there, with the ACD developer having 
the option to specify a standard list of values or to get the values from the 
knowntype.
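
In rough Python terms (a sketch of the idea, not ACD parser code) the rule would
be as below. The "@knowntype" marker is only my proposal above, the resources
dict is a hypothetical stand-in for whatever emboss.default provides, and the
space-to-underscore naming is an assumption.

```python
KNOWNTYPE_MARKER = "@knowntype"  # proposed marker, not an existing ACD feature


def resolve_values(attrs, resources):
    """Return the list string for an ACD list, inline or via its knowntype."""
    values = attrs.get("values", "")
    if values.strip() != KNOWNTYPE_MARKER:
        return values  # old style: the list is written out in the ACD file
    name = attrs["knowntype"].replace(" ", "_")  # assumed naming convention
    if name not in resources:
        raise ValueError(f"no resource defined for knowntype {attrs['knowntype']!r}")
    return resources[name]


resources = {"genetic_code": "0:Standard;11:Bacterial"}  # hypothetical
new_style = {"values": "@knowntype", "knowntype": "genetic code"}
old_style = {"values": "0:Standard; 2:Vertebrate Mitochondrial"}
```

Both styles go through the same call, so anyone parsing the ACD file always
finds a "values" attribute and can tell which style is in use.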


 
> Future uses for this:
> 
> 1. standard list of genetic codes with descriptions
> 
> 2. standard reading frame names
> 
> 3. list of known codon usage files, matrices, etc. by specifying "?" as the value
> 
> 4. a list of blast databases for a blastall wrapper :-)
> 
> 5. replacing "string" qualifiers which have a knowntype with a selection that
> can display and test the list of acceptable values in ACD, to avoid a run-time
> failure
> 
> Comments please ....
> 
> Peter

-- 
Jon C. Ison, PhD
Proteomics Applications Group
MRC Rosalind Franklin Centre for Genomics Research
Wellcome Trust Genome Campus, Hinxton, Cambridge, CB10 1SB, UK
Tel: +44 1223 494500  Fax: +44 1223 494512
E-mail: jison at rfcgr.mrc.ac.uk  Web: http://www.rfcgr.mrc.ac.uk


