[EMBOSS] CAI Tables
Peter Rice
pmr at ebi.ac.uk
Tue Dec 20 10:18:17 UTC 2005
Kevin Brown wrote:
> I've searched the archives and the Net trying to find more information about
> the list of highly-expressed genes that CAI used to create the codon
> tables for the various species. Does anyone know where these tables
> came from?
Well, it is a long story ...
The original CAI (Codon Adaptation Index) was calculated for S. cerevisiae and
used a set of 24 genes (ribosomal proteins, for example) that were known to be
highly expressed.
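For anyone wondering how such a reference set turns into a CAI value, here is a minimal Python sketch of the Sharp and Li style calculation. It is not the EMBOSS cai source (cai reads a precomputed codon usage file). The function names relative_adaptiveness and cai are hypothetical. It assumes the standard genetic code, gives codons absent from the reference set a pseudocount of 0.5 (an assumption; implementations differ), and skips Met, Trp and stop codons. Each codon's relative adaptiveness w is its reference count divided by the count of the most-used synonymous codon. The CAI of a gene is the geometric mean of w over its codons.

```python
import math
from collections import Counter

BASES = "TCAG"
# Standard genetic code (NCBI table 1); codons ordered TTT, TTC, TTA, TTG, TCT, ...
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {a + b + c: AMINO[16 * i + 4 * j + k]
        for i, a in enumerate(BASES)
        for j, b in enumerate(BASES)
        for k, c in enumerate(BASES)}

def codons(seq):
    """Split a coding sequence into codons, dropping any incomplete tail."""
    seq = seq.upper()
    return [seq[i:i + 3] for i in range(0, len(seq) - 2, 3)]

def relative_adaptiveness(reference_genes, pseudocount=0.5):
    """w = count / count of the most frequent synonymous codon in the reference set.
    Codons never seen in the reference set get `pseudocount` (assumed 0.5)."""
    counts = Counter(c for g in reference_genes for c in codons(g) if c in CODE)
    adjusted = {codon: counts[codon] or pseudocount for codon in CODE}
    best = {}
    for codon, aa in CODE.items():
        best[aa] = max(best.get(aa, 0), adjusted[codon])
    return {codon: adjusted[codon] / best[aa] for codon, aa in CODE.items()}

def cai(gene, w):
    """Geometric mean of w over the gene's codons, skipping Met, Trp and stops."""
    logs = [math.log(w[c]) for c in codons(gene)
            if c in CODE and CODE[c] not in "*MW"]
    return math.exp(sum(logs) / len(logs)) if logs else float("nan")
```

With a toy reference set of "GCTGCTGCC", GCT is the preferred alanine codon (w = 1) and GCC gets w = 0.5, so cai("GCTGCC", w) is the square root of 0.5, about 0.707. The real tables differ only in being built from the genuine set of highly expressed genes.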
Many years ago, when I wrote a program called "codfish" to calculate codon
usage for S. pombe (fission yeast), I created a table from the few S. pombe
genes in the same set that had already been sequenced.
On arriving at the Sanger Centre, where they were sequencing a lot of S.
cerevisiae, I needed the codon usage table for the original CAI - and found
that when I used the current gene sequences I got the wrong answer.
After some tweaking I was able to reconstruct the original versions of the
S. cerevisiae sequences and could reproduce the "standard" CAI values. The
differences were minor - mainly missed short 5' exons.
The tables Eyeast_cai.cut and Eschpo_cai.cut are the result of these two
efforts. For historical reasons there are also copies of these tables with no
headers and different names; these are "obsolete" and will disappear in EMBOSS
4.0.0.
As for "codfish" ... it implemented an algorithm from Frank Wright to
calculate the effective number of codons. Frank is a vegan (does not eat
codfish), so in EMBOSS we renamed it "chips" :-)
Hope that helps!
Peter