Peter Rice pmr at ebi.ac.uk
Tue Dec 20 10:18:17 UTC 2005

Kevin Brown wrote:

> I've searched the archives and the Net trying to find more information on
> the list of highly-expressed genes that CAI used to create the codon
> tables for the various species.  Does anyone know where these tables
> came from?

Well, it is a long story ...

The original CAI (Codon Adaptation Index) was calculated for S. cerevisiae and 
used a set of 24 genes (ribosomal proteins, for example) that were known to be 
highly expressed.
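The idea behind CAI can be sketched in a few lines of Python, following Sharp and Li's definition. Each codon gets a relative adaptiveness w: its count in the reference set of highly expressed genes divided by the count of the most-used synonymous codon. A gene's CAI is the geometric mean of w over its codons. This is a minimal sketch, not the EMBOSS cai code. The function names, the toy reference "gene" in the usage line, and the 0.5 pseudocount for codons never seen in the reference set are all assumptions made for illustration.

```python
import math
from collections import Counter

# Standard genetic code (NCBI table 1), codons enumerated in TCAG order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODONS = [a + b + c for a in BASES for b in BASES for c in BASES]
CODON_TABLE = dict(zip(CODONS, AMINO_ACIDS))


def codons_of(seq):
    """Split a coding sequence into in-frame codons, skipping ambiguous ones."""
    seq = seq.upper().replace("U", "T")
    for i in range(0, len(seq) - 2, 3):
        codon = seq[i:i + 3]
        if codon in CODON_TABLE:
            yield codon


def relative_adaptiveness(reference_genes, pseudocount=0.5):
    """w = count(codon) / count(most-used synonym) in the reference set."""
    counts = Counter()
    for gene in reference_genes:
        counts.update(codons_of(gene))
    families = {}
    for codon, aa in CODON_TABLE.items():
        families.setdefault(aa, []).append(codon)
    weights = {}
    for aa, family in families.items():
        # Skip stop codons and single-codon amino acids (Met, Trp).
        if aa == "*" or len(family) < 2:
            continue
        top = max(counts[c] for c in family)
        if top == 0:
            continue  # amino acid absent from the reference set
        for c in family:
            # Assumption: unseen codons get a pseudocount so log(w) stays finite.
            weights[c] = max(counts[c], pseudocount) / top
    return weights


def cai(gene, weights):
    """Geometric mean of w over the gene's codons that have a weight."""
    logs = [math.log(weights[c]) for c in codons_of(gene) if c in weights]
    if not logs:
        raise ValueError("no scorable codons in gene")
    return math.exp(sum(logs) / len(logs))


# Toy reference set (hypothetical): GCT used 3 times, GCC once.
w = relative_adaptiveness(["GCTGCTGCTGCC"])
print(round(cai("GCTGCC", w), 3))  # 0.577
```

The real tables were built from the 24-gene reference set, so getting the reference sequences exactly right matters. Even small differences in those sequences change the w values and therefore every CAI computed from them.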

Many years ago, when I wrote a program called "codfish" to calculate codon 
usage for S. pombe (fission yeast), I created a table from the few S. pombe 
genes in the same set that had already been sequenced.

On arriving at the Sanger Centre, where they were sequencing a lot of S. 
cerevisiae, I needed the codon usage table for the original CAI - and found 
that when I used the current gene sequences I got the wrong answer.

After some tweaking, I was able to reconstruct the original versions of the 
S. cerevisiae sequences and reproduce the "standard" CAI values. The 
differences were minor, mainly missed short 5' exons.

The tables Eyeast_cai.cut and Eschpo_cai.cut are the results of these two 
efforts. For historical reasons there are also copies of these tables without 
headers and under different names; these are "obsolete" and will disappear from EMBOSS.

As for "codfish" ... it implemented an algorithm from Frank Wright to 
calculate the effective number of codons. Frank is a vegan (does not eat 
codfish), so in EMBOSS we renamed it "chips" :-)
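For anyone curious what "effective number of codons" (Nc) means, here is a minimal Python sketch of Wright's (1990) formula. It is not the codfish or chips source. For each amino acid with n observed codons and codon frequencies p_i, it computes the homozygosity F = (n * sum(p_i^2) - 1) / (n - 1). It then averages F within each degeneracy class (2-, 3-, 4- and 6-fold) and combines the classes as Nc = 2 + 9/F2 + 1/F3 + 5/F4 + 3/F6, capped at 61. Several choices here are assumptions of this sketch: the function names, the rule of skipping amino acids seen fewer than twice, estimating a missing Ile class as the mean of F2 and F4, and raising an error when a class cannot be estimated.

```python
from collections import Counter

# Standard genetic code (NCBI table 1), codons enumerated in TCAG order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = dict(
    zip((a + b + c for a in BASES for b in BASES for c in BASES), AMINO_ACIDS)
)

# Synonymous codon families; the class sizes (9, 1, 5, 3 amino acids)
# give the numerators in Wright's formula.
FAMILIES = {}
for codon, aa in CODON_TABLE.items():
    if aa != "*":
        FAMILIES.setdefault(aa, []).append(codon)


def homozygosity(counts, family):
    """Wright's F = (n * sum(p_i^2) - 1) / (n - 1) for one amino acid."""
    n = sum(counts[c] for c in family)
    if n < 2:
        return None  # assumption: too few observations, skip this amino acid
    sum_sq = sum((counts[c] / n) ** 2 for c in family)
    return (n * sum_sq - 1) / (n - 1)


def effective_number_of_codons(seq):
    seq = seq.upper().replace("U", "T")
    counts = Counter(seq[i:i + 3] for i in range(0, len(seq) - 2, 3))
    f_by_class = {}
    for family in FAMILIES.values():
        k = len(family)
        if k == 1:
            continue  # Met and Trp contribute the fixed "2 +" term
        f = homozygosity(counts, family)
        if f is not None:
            f_by_class.setdefault(k, []).append(f)
    avg = {k: sum(v) / len(v) for k, v in f_by_class.items()}
    if 3 not in avg and 2 in avg and 4 in avg:
        avg[3] = (avg[2] + avg[4]) / 2  # assumption: fallback when Ile is absent
    if any(avg.get(k, 0) <= 0 for k in (2, 3, 4, 6)):
        raise ValueError("not enough codons to estimate every degeneracy class")
    nc = 2 + 9 / avg[2] + 1 / avg[3] + 5 / avg[4] + 3 / avg[6]
    return min(nc, 61.0)
```

A gene that uses exactly one codon for every amino acid gets Nc = 20, the minimum. Nc rises toward 61 as usage among synonymous codons becomes more even.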

Hope that helps!


More information about the EMBOSS mailing list