Peter Rice pmr at ebi.ac.uk
Fri Apr 1 13:50:52 UTC 2005

Guy Bottu wrote:

> As for formats, it would of course be nice if EMBOSS programs could read and 
> write codon usage tables (and other data) in any format, just as they do 
> for sequences. Which formats should we support besides what EMBOSS uses 
> now? Is there such a thing as a "native" CUTG format (with one entry per 
> file)? I know about GCG format (not useful for us, but other people 
> certainly might want it). There is Staden format. Staden format also 
> supports files with 2 tables (codon usage in genes + trinucleotide frequency 
> in noncoding DNA); what to do with this? Only read the first? There is 
> also the format used by CODEHOP 
> (http://blocks.fhcrc.org/blocks/codehop.html).

CODEHOP format is minimal, but can be used. It appears to be derived from 
CUTG's "spsum" files (which I will also add as a format).

Other formats I know about (and will include):

codonusage database ftp://ftp.ebi.ac.uk/pub/databases/codonusage

transterm database ftp://ftp.ebi.ac.uk/pub/databases/transterm

GCG (with extra header comments to contain species and other information). 
Does anyone have an example from GCG, or from other sources that write 
"GCG format" files, so we can convert U -> T and handle any other 
non-standard data?

CUTG website format

SPSUM format (CUTG database .spsum files)

CODEHOP format http://blocks.fhcrc.org/blocks/codehop.html

Staden format: I have no example for this apart from one in the Staden 
src/seq_utils/genetics_codes.c source file - can someone send examples please? 
I would be happy reading an optional second file for some formats, although 
EMBOSS does not currently use the data the Staden format has.
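
To make the U -> T conversion mentioned for GCG-format files concrete, here is 
a minimal sketch in Python. It is not EMBOSS code: the function names and the 
codon -> count dictionary are hypothetical, and it assumes a table has already 
been parsed into memory, saying nothing about how any of the file formats 
above are laid out.

```python
from typing import Dict


def normalize_codon(codon: str) -> str:
    """Return the codon as upper-case DNA letters (U replaced by T)."""
    codon = codon.strip().upper().replace("U", "T")
    # Reject anything that is not exactly three unambiguous DNA bases.
    if len(codon) != 3 or any(base not in "ACGT" for base in codon):
        raise ValueError(f"not a valid codon: {codon!r}")
    return codon


def normalize_table(counts: Dict[str, int]) -> Dict[str, int]:
    """Re-key a codon -> count mapping with DNA codons, summing duplicates."""
    normalized: Dict[str, int] = {}
    for codon, count in counts.items():
        key = normalize_codon(codon)
        # An RNA and a DNA spelling of the same codon collapse into one entry.
        normalized[key] = normalized.get(key, 0) + count
    return normalized
```

For example, normalize_table({"UUU": 3, "TTT": 2, "gcu": 5}) merges the two 
spellings of the same codon and gives {"TTT": 5, "GCT": 5}. Summing duplicates 
is a design choice of this sketch; a real reader might prefer to treat a table 
that mixes U and T as an error.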


Peter Rice
