[EMBOSS] question on 'codcmp'

Derek Gatherer d.gatherer at vir.gla.ac.uk
Thu Nov 20 17:01:36 UTC 2003


Hi

This is a very interesting question.  I don't think there is any way to 
say whether the difference is statistically significant just from looking 
at the output, as it is essentially a descriptive statistic about the 
difference between two codon usage vectors of length 64 (or is it 59, 
leaving out stop, Trp and Met?).  If you have a whole lot of sequences and 
codcmp results for all the possible pairwise comparisons, then the 
resulting distance matrix can be used to build a tree based on codon usage.
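
On the 64 versus 59 question, the quoted output below gives enough to check the arithmetic. This is only a sketch: it assumes the "Mean" and "Root Mean" lines are the sums divided by the vector length, and the function name summarise is made up for illustration.

```python
import math

# Values copied from the quoted codcmp output.
sum_squared = 0.036
sum_diff = 1.038
reported = (0.024, 0.016)  # (Root Mean Squared Difference, Mean Difference)

def summarise(n_codons):
    """Return (RMS difference, mean difference) to 3 places for a given vector length."""
    rms = math.sqrt(sum_squared / n_codons)
    mean = sum_diff / n_codons
    return round(rms, 3), round(mean, 3)

for n in (64, 59):
    print(n, summarise(n), "matches" if summarise(n) == reported else "does not match")
```

Under that assumption the printed figures are consistent with a length of 64 and not with 59, although the rounding in the output means this is an indication and not proof.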

However, if you generate a series of random sequences, measure their codon 
usage and then run codcmp between each of your test sequences and all the 
random sequences, you get a background distribution of codcmp values.  You 
could then use a z-test to see whether the result between the two test 
sequences falls in the top or bottom 5% of that background.
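
A minimal sketch of that comparison, using only the Python standard library. The function names are hypothetical and the background values are placeholders invented for illustration, not real codcmp results; only the observed value (the Root Mean Squared Difference from the quoted output) comes from this thread.

```python
from statistics import NormalDist, mean, stdev

def z_test(observed, background):
    """Return (z, two-sided p) for observed against a background sample.

    Assumes the background values are roughly normally distributed.
    """
    mu = mean(background)
    sigma = stdev(background)
    z = (observed - mu) / sigma
    p_two_sided = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_two_sided

def fraction_at_or_below(observed, background):
    """Empirical alternative that makes no normality assumption."""
    return sum(b <= observed for b in background) / len(background)

# PLACEHOLDER background: stands in for codcmp values of test-vs-random runs.
background = [0.030, 0.034, 0.028, 0.032, 0.031, 0.029, 0.033, 0.035]
observed = 0.024  # Root Mean Squared Difference from the quoted output

z, p = z_test(observed, background)
print(f"z = {z:.2f}, two-sided p = {p:.4f}")
print("fraction of background at or below observed:",
      fraction_at_or_below(observed, background))
```

Note that a two-sided test at the 5% level puts 2.5% in each tail; if "top or bottom 5%" is meant per tail, compare the p-value against 0.10 instead. In practice you would want far more than eight background values.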

This would assume that the codcmp results were normally distributed, but 
you could check that too, either by plotting them or by using a normality 
test such as Shapiro-Wilk.  You could use shuffleseq to base your random 
sequences on the test sequences, which would ensure the randomised 
background had the same nucleotide content.
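
To make the shuffling idea concrete, here is a rough Python stand-in. It is not a reimplementation of the EMBOSS programs: the helper names are hypothetical, and comparing the per-codon fraction of all codons in frame 1 is an assumption about what to measure.

```python
import math
import random
from collections import Counter
from itertools import product

CODONS = ["".join(c) for c in product("ACGT", repeat=3)]  # all 64 codons

def shuffle_sequence(seq, rng):
    """Permute the letters of seq, so nucleotide composition is unchanged."""
    letters = list(seq)
    rng.shuffle(letters)
    return "".join(letters)

def codon_fractions(seq):
    """Fraction of each of the 64 codons, reading seq in frame from position 0."""
    usable = len(seq) - len(seq) % 3
    codons = [seq[i:i + 3] for i in range(0, usable, 3)]
    counts = Counter(codons)
    return [counts[c] / len(codons) for c in CODONS]

def rms_difference(a, b):
    """Root mean squared difference between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)) / len(a))

def shuffled_background(seq, n, seed=0):
    """RMS differences between seq and n shuffled copies of itself."""
    rng = random.Random(seed)
    ref = codon_fractions(seq)
    return [rms_difference(ref, codon_fractions(shuffle_sequence(seq, rng)))
            for _ in range(n)]

# Toy input for illustration only; substitute a real coding sequence.
toy = "ATGGCCAAGTTTGGCACC" * 10
print(shuffled_background(toy, 5))
```

The list this returns is the kind of background you could plot as a histogram to judge normality by eye before trusting a z-test.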

Cheers
Derek

At 09:59 20/11/2003 -0600, Makedonka Dautova wrote:
>Hi,
>
>I have a question regarding the output of codcmp (Codon usage table
>comparison) and was hoping that you can help me.
>
>After running the program we received the following output file:
>
>---------------------------------------------------------------------------
># CODCMP codon usage table comparison
># anc_can.FULLSPECIES.cuspoutput vs anc_cey.FULLSPECIES.cuspoutput
>
>Sum Squared Difference = 0.036
>Mean Squared Difference = 0.001
>Root Mean Squared Difference = 0.024
>Sum Difference         = 1.038
>Mean Difference         = 0.016
>Codons not appearing   = 0
>---------------------------------------------------------------------------
>
>How do I interpret this? How do I know if the difference is statistically
>significant or not?
>
>Thank you in advance,
>Makedonka Mitreva.
>
>**************************************
>Makedonka Dautova Mitreva, Ph.D.
>
>Project Leader, Parasitic Nematode ESTs
>Genome Sequencing Center,
>Department of Genetics,
>Washington University School of Medicine,
>Box 8501
>4444 Forest Park Boulevard,
>St. Louis, MO 63108
>
>mdautova at watson.wustl.edu
>Tel. + 314-286-1118
>Fax. + 314-286-1810
>
>http://nematode.net/
>**************************************



