[EMBOSS] relative abundance/word bias application
Eliot Bush
bush at HMC.Edu
Thu Jul 26 16:44:00 UTC 2007
hello,
I'm wondering whether people would be interested in including an
application in EMBOSS to calculate the relative abundance/bias of words.
The measure I have in mind is that used by Karlin and others (for
example in Burge, C. et al. PNAS 1992). It is the frequency of a
particular word divided by its expected frequency, which is computed from
the frequencies of all its subwords, including gapped subwords. This gives
the bias at a particular word size with the effects of smaller word sizes
removed.
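For illustration, the dinucleotide and trinucleotide cases are usually written as follows. Here f denotes an observed frequency, and f_{XNZ} is the frequency of X and Z separated by any single base (the gapped subword). The symbols rho and gamma follow the notation commonly used in Karlin's papers. Check the exact form against Burge et al. (1992) before relying on it.

```latex
\rho_{XY} = \frac{f_{XY}}{f_X \, f_Y},
\qquad
\gamma_{XYZ} = \frac{f_{XYZ}}{\dfrac{f_{XY}\, f_{YZ}\, f_{XNZ}}{f_X\, f_Y\, f_Z}}
             = \frac{f_{XYZ}\, f_X\, f_Y\, f_Z}{f_{XY}\, f_{YZ}\, f_{XNZ}}
```

In the trinucleotide case the expected frequency multiplies the three two-letter subword frequencies (XY, YZ and the gapped XNZ). It then divides out the single-letter frequencies, each of which appears in two of those pairs.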
For small word sizes there are formulas one can use, but as the word size
grows these become unwieldy. I've been working on some code that can
calculate this measure for words of up to 10 or 11 bp in a reasonable
amount of time. If there is interest, I would be happy to contribute it.
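Closed-form formulas can be avoided at larger word sizes. One way to do this, not necessarily the approach taken in the code mentioned above, is to compute the expected frequencies numerically. The expected table is the maximum-entropy distribution that matches the frequencies of every (k-1)-letter subword, gapped ones included, and it can be fitted by iterative proportional fitting (IPF). Everything in this sketch is an assumption for illustration: the function names (kmer_counts, expected_maxent, relative_abundance), the use of NumPy, and counting overlapping words on a single strand. Some published analyses also count the reverse complement. For dinucleotides the sketch reproduces f_XY / (f_X f_Y), with f_X and f_Y taken as the position marginals of the dinucleotide table. For longer words the maximum-entropy expectation is not in general identical to the closed-form product formulas used at small word sizes.

```python
import numpy as np

ALPHABET = "ACGT"
INDEX = {base: i for i, base in enumerate(ALPHABET)}


def kmer_counts(seq, k):
    """Count overlapping k-mers on one strand as an array of shape (4,) * k.

    Words containing anything other than A, C, G or T are skipped.
    """
    counts = np.zeros((4,) * k)
    seq = seq.upper()
    for start in range(len(seq) - k + 1):
        word = seq[start:start + k]
        if all(base in INDEX for base in word):
            counts[tuple(INDEX[base] for base in word)] += 1
    return counts


def expected_maxent(counts, n_iter=200, tol=1e-12):
    """Expected k-mer frequencies given all (k-1)-letter subwords, fitted by IPF.

    Dropping position i of a k-mer gives one (k-1)-letter subword; dropping an
    interior position gives a gapped subword. The fitted table is the
    maximum-entropy distribution whose marginals match all of these.
    """
    k = counts.ndim
    p = counts / counts.sum()
    # Observed subword frequencies: one marginal table per dropped position.
    targets = [p.sum(axis=i, keepdims=True) for i in range(k)]
    q = np.full(p.shape, 1.0 / p.size)  # start from the uniform table
    for _ in range(n_iter):
        q_prev = q.copy()
        for i, target in enumerate(targets):
            current = q.sum(axis=i, keepdims=True)
            # Rescale so the marginal over axis i matches the target (0/0 -> 0).
            ratio = np.divide(target, current,
                              out=np.zeros_like(target), where=current > 0)
            q = q * ratio
        if np.max(np.abs(q - q_prev)) < tol:
            break
    return q


def relative_abundance(seq, k):
    """Observed / expected frequency for every k-mer; NaN where expected is 0."""
    counts = kmer_counts(seq, k)
    if counts.sum() == 0:
        raise ValueError("no complete A/C/G/T words of length k in sequence")
    p = counts / counts.sum()
    q = expected_maxent(counts)
    return np.divide(p, q, out=np.full(p.shape, np.nan), where=q > 0)
```

For example, `relative_abundance(seq, 2)[INDEX["C"], INDEX["G"]]` gives the CpG value for a sequence `seq`. The table for k = 10 has 4^10 (about a million) cells, which fits easily in memory. The pure-Python counting loop, however, would need to be vectorized for genome-sized input.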
Eliot