codon usage tables
Peter Rice
peter.rice at uk.lionbioscience.com
Tue Nov 26 10:40:04 UTC 2002
Michael Poidinger wrote:
> Do you (or anyone else) know the difference between related files?
>
> such as
> Ehum and Ehuman
> Eeco, Eeco_h and Eecoli
> Emus, Emussp
The codon usage files were set up a long time ago. At the time it was not
easy to find a good set of tables that were free to use. The first tables
(if I recall correctly) came from the TRANSTERM database.
Short names (Eeco) are reformatted TRANSTERM codon usage tables with an E
(EMBOSS) prefix and a .cut suffix to identify the format.
Names with _h (Eeco_h) are highly expressed genes (high Codon Adaptation
Index values).
sp endings? Help! Ysp is "Yeast S.pombe" of course. I assume the others are
for a genus (e.g. Mus sp. = Mus musculus and Mus domesticus) rather than
a single species. Emussp.cut is a reformat of TRANSTERM's mussp.cod file.
The EBI's FTP copy of TRANSTERM did not document exactly what these names
mean. The original TRANSTERM documentation also leaves you to guess at the
3-letter species codes. The TRANSTERM website seems to be only partly
available.
Longer names (Eecoli) are added from elsewhere (I need to check on their
origin) and only include a few genes (count the stop codons!) so I assume
they are old and probably obsolete.
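
The "count the stop codons" check can be sketched in Python. Each coding
sequence ends in one stop codon, so the summed stop codon counts in a table
give a rough estimate of how many genes went into it. This is only a sketch:
it assumes a whitespace-separated layout with the codon in the first column
and the raw count in the last, it assumes comment lines start with "#", and
the sample rows are invented for illustration rather than taken from any
real .cut file. The function name is hypothetical.

```python
# Stop codons of the standard genetic code. Mitochondrial tables use
# different stops, so this set would need changing for "mt" files.
STOP_CODONS = {"TAA", "TAG", "TGA"}


def estimate_gene_count(cut_text):
    """Sum the counts of stop codons in a codon usage table.

    Assumed layout (not verified against every .cut file): one codon per
    line, codon in the first column, raw count in the last column.
    """
    total = 0
    for line in cut_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        fields = line.split()
        codon = fields[0].upper().replace("U", "T")  # accept RNA spelling
        if codon in STOP_CODONS:
            total += int(float(fields[-1]))
    return total


# Invented rows for illustration only.
SAMPLE = """\
# codon aa fraction per-thousand count
GCA A 0.250 12.500 5
TAA * 0.500 5.000 2
TAG * 0.250 2.500 1
TGA * 0.250 2.500 1
"""

print(estimate_gene_count(SAMPLE))
```

A table whose total comes out at only a handful of genes is probably too
small to give reliable codon frequencies.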
mt endings are mitochondrial genes
cp endings are chloroplast genes
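
The naming conventions described above can be summarised as a small Python
heuristic. This is a rough sketch, not anything EMBOSS itself does: the
function name and category labels are made up, the rule that a 3-letter base
name is a TRANSTERM species code is an assumption drawn from the description
above, and the "sp" reading is the guess given in this message rather than a
documented fact.

```python
def classify_cut_name(filename):
    """Guess the origin of a codon usage file from its name alone."""
    name = filename
    if name.endswith(".cut"):
        name = name[: -len(".cut")]  # drop the format suffix
    if name.startswith("E"):
        name = name[1:]  # drop the EMBOSS prefix

    if name.endswith("_h"):
        return "highly expressed genes"
    if len(name) == 3:
        # Checked before the endings so that "ysp" (S. pombe) stays a
        # plain species code rather than being read as an "sp" ending.
        return "TRANSTERM species code"
    if name.endswith("mt"):
        return "mitochondrial genes"
    if name.endswith("cp"):
        return "chloroplast genes"
    if name.endswith("sp"):
        return "genus-level table (assumed)"
    return "longer name, added from elsewhere"


for example in ("Eeco.cut", "Eeco_h.cut", "Emussp.cut", "Eecoli.cut"):
    print(example, "->", classify_cut_name(example))
```

The order of the checks matters, and any real file whose name happens to end
in one of these letter pairs by coincidence would be misclassified.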
Time to review these tables I suspect!!! How about replacing them with
annotated tables from CUTG for selected species? We need to be careful
about default table names in some programs, but they are easy to update.
Peter
--
------------------------------------------------
Peter Rice, LION Bioscience Ltd, Cambridge, UK
peter.rice at uk.lionbioscience.com +44 1223 224723