codon usage tables

Peter Rice peter.rice at uk.lionbioscience.com
Tue Nov 26 10:40:04 UTC 2002


Michael Poidinger wrote:

> Do you (or anyone else) know the difference between related files?
> 
> such as
> Ehum and Ehuman
> Eeco, Eeco_h and Eecoli
> Emus, Emussp

The codon usage files were set up a long time ago, when it was not easy to 
find a good set of tables that were free to use. The first tables (if I 
recall correctly) came from the TRANSTERM database.

Short names (Eeco) are reformatted TRANSTERM codon usage tables with an E 
(EMBOSS) prefix and a .cut suffix to identify the format.

Names with _h (Eeco_h) are highly expressed genes (high Codon Adaptation 
Index values).

sp endings? Help! Ysp is "Yeast S.pombe" of course. I assume the others are 
for a genus (e.g. Mus sp. = Mus musculus and Mus domesticus) rather than 
a single species. Emussp.cut is a reformat of TRANSTERM's mussp.cod file. 
The EBI's FTP copy of TRANSTERM did not document exactly what these names 
mean. The original TRANSTERM documentation also leaves you to guess at the 
3-letter species codes. The TRANSTERM website seems to be only partly 
available.

Longer names (Eecoli) are added from elsewhere (I need to check on their 
origin) and only include a few genes (count the stop codons!) so I assume 
they are old and probably obsolete.

mt endings are mitochondrial genes.

cp endings are chloroplast genes.
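
To make the naming scheme above concrete, here is a rough Python sketch that 
splits a table name into its stem and the meaning of its ending. The function 
name describe_table and the matching rules are my own illustration, not part 
of EMBOSS, and the sp case stays ambiguous as described above.

```python
# Endings as described in this message. The matching rules are a heuristic
# assumption: a stem that merely happens to end in "sp", "mt" or "cp" would
# be misclassified.
SUFFIXES = {
    "_h": "highly expressed genes",
    "sp": "sp ending (probably genus-level, but ambiguous)",
    "mt": "mitochondrial genes",
    "cp": "chloroplast genes",
}


def describe_table(filename):
    """Return (stem, meaning) for a name such as 'Eeco_h.cut'."""
    name = filename
    if name.endswith(".cut"):
        name = name[: -len(".cut")]      # drop the format suffix
    if not name.startswith("E"):
        raise ValueError("expected an E (EMBOSS) prefix: %r" % filename)
    stem = name[1:]                      # drop the E prefix
    for suffix, meaning in SUFFIXES.items():
        # Require something in front of the suffix so the stem is never empty.
        if stem.endswith(suffix) and len(stem) > len(suffix):
            return stem[: -len(suffix)], meaning
    return stem, "no recognised ending"


if __name__ == "__main__":
    for table in ("Eeco.cut", "Eeco_h.cut", "Emussp.cut", "Eecoli.cut"):
        print(table, describe_table(table))
```

Note that this cannot tell a short TRANSTERM name (Eeco) from a longer one 
(Eecoli); that still needs a check of where each file came from.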

Time to review these tables, I suspect! How about replacing them with 
annotated tables from CUTG for selected species? We need to be careful 
about default table names in some programs, but they are easy to update.
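
As a starting point for such a review, the stop codon trick mentioned above 
can be sketched in Python: each complete coding sequence ends in one stop 
codon, so the summed stop codon counts roughly estimate how many genes went 
into a table. This assumes the .cut layout is five whitespace-separated 
columns (codon, amino acid, fraction, frequency per thousand, count) with '*' 
marking stops and '#' starting comment lines; please check that against a 
real file. The sample numbers are invented for illustration only.

```python
def count_stop_codons(lines):
    """Sum the count column over rows whose amino acid is '*' (stop)."""
    total = 0
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue                     # skip blank and comment lines
        fields = line.split()
        if len(fields) < 5:
            continue                     # not a data row in the assumed layout
        amino_acid, count = fields[1], fields[4]
        if amino_acid == "*":
            total += int(float(count))   # counts may be written as 3 or 3.0
    return total


# Invented sample rows in the assumed layout, not taken from any real table.
SAMPLE = """\
#Codon AA Fraction Frequency Number
GCA    A  0.250    10.000    20
TAA    *  0.500    1.500     3
TAG    *  0.167    0.500     1
TGA    *  0.333    1.000     2
"""

if __name__ == "__main__":
    print("approximate gene count:", count_stop_codons(SAMPLE.splitlines()))
```

A table that reports only a handful of stop codons was built from only a 
handful of genes, which is a good hint that it is a candidate for replacement.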

Peter

-- 
------------------------------------------------
Peter Rice, LION Bioscience Ltd, Cambridge, UK
peter.rice at uk.lionbioscience.com +44 1223 224723



