[EMBOSS] CODON USAGE TABLES
pmr at ebi.ac.uk
Fri Apr 1 13:50:52 UTC 2005
Guy Bottu wrote:
> As formats, it would of course be nice if EMBOSS programs could read and
> write codon usage tables (and other data) in any format, just as they do
> for sequences. Which formats should we support besides what EMBOSS uses
> now ? Is there such a thing as "native" CUTG format (with one entry a
> file) ?. I know about GCG format (not useful for us, but other people
> certainly might want it). There is Staden format. Staden format supports
> also files with 2 tables (codon usage in genes + trinucleotide frequency
> in noncoding DNA) ; what to do with this ? only read the first ? There is
> also the format used by CODEHOP
CODEHOP format is minimal, but can be used. It appears to be derived from
CUTG's "spsum" files (which I will also add as a format).
Other formats I know about (and will include):
codonusage database ftp://ftp.ebi.ac.uk/pub/databases/codonusage
transterm database ftp://ftp.ebi.ac.uk/pub/databases/transterm
GCG (with extra header comments to contain species and other information).
Does anyone have an example from GCG, or from other sources that write "GCG
format" files, so we can convert U -> T and handle any other non-standard data?
CUTG website format
SPSUM format (CUTG database .spsum files)
CODEHOP format http://blocks.fhcrc.org/blocks/codehop.html
Staden format: I have no example of this apart from the one in the Staden
src/seq_utils/genetics_codes.c source file. Can someone send examples, please?
I would be happy reading an optional second file for some formats, although
EMBOSS does not currently use the data the Staden format has.
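
To illustrate the U -> T conversion mentioned above for "GCG format" files,
here is a minimal sketch in Python. It is not EMBOSS code: the function names
are hypothetical, and it assumes the table has already been parsed into a
mapping of codon -> count, since I have no example file to show the real
layout.

```python
# Hypothetical helpers, not part of EMBOSS. Assumes a codon usage table
# has already been parsed into a dict of codon -> count.

def normalize_codon(codon: str) -> str:
    """Upper-case a codon and rewrite RNA 'U' as DNA 'T'."""
    return codon.upper().replace("U", "T")


def normalize_table(counts: dict[str, int]) -> dict[str, int]:
    """Return a new table keyed by DNA codons (A, C, G, T only).

    Counts for codons that become identical after conversion
    (for example 'UUU' and 'TTT') are summed.
    """
    out: dict[str, int] = {}
    for codon, n in counts.items():
        key = normalize_codon(codon)
        # Reject anything that is not a plain three-letter DNA codon.
        if len(key) != 3 or set(key) - set("ACGT"):
            raise ValueError(f"unexpected codon: {codon!r}")
        out[key] = out.get(key, 0) + n
    return out
```

Raising an error on unexpected codons is only one possible choice; a real
reader might prefer to skip or report them instead.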