[Dynamite] Score module
Ian Holmes
ihh@fruitfly.org
Sun, 5 Mar 2000 19:45:17 -0800 (PST)
I think we also need a Score module (or some such name).
One concern I have is about how we represent scores; I have been bitten
quite hard with Handel/kimono, assuming that it will be OK to represent
scores as ints rounded to the nearest 1/1000 bit (cf HMMER), then hitting
overflow problems when you start adding together the scores for all 6,000
sequences in the yeast genome for example...
On the other hand it clearly makes sense to keep scores as integers within
a DP routine. The solution I am trying within Handel/kimono is to convert
to (floating point) log-likelihoods for large sums.
Here's the IDL:
module Score {
// the following typedefs are included for clarity
typedef float Probability; // probability space
typedef float BitScore; // log to base 2
typedef float NatScore; // log to base e
typedef int IntScore; // internal score (or "integer score" if you prefer)
// NB the IntScore-to-BitScore conversion factor is not explicitly exposed by the following interface.
interface Scheme {
// conversions that are independent of internal format (i.e. between probabilities, bits & nats)
attribute BitScore bits_infinity; // largest value that any .*2bits() call will return
attribute NatScore nats_infinity; // largest value that any .*2nats() call will return
BitScore prob2bits (Probability p);
BitScore nats2bits (NatScore n);
NatScore prob2nats (Probability p);
NatScore bits2nats (BitScore b);
Probability bits2prob (BitScore b);
Probability nats2prob (NatScore n);
// conversions that aren't independent of internal format
attribute IntScore ints_infinity; // largest value that any .*2ints() call will return
// NB the three infinities (bits_infinity, nats_infinity, ints_infinity) do NOT have to be "in sync"
// i.e. there is no guarantee that ints2bits(ints_infinity) == bits_infinity, etc.
BitScore ints2bits (IntScore i);
NatScore ints2nats (IntScore i);
Probability ints2prob (IntScore i); // implementation detail: we may want to use a lookup table for this
IntScore prob2ints (Probability p);
IntScore bits2ints (BitScore b);
IntScore nats2ints (NatScore n);
// fast math functions
IntScore prob_sum (IntScore i1, IntScore i2); // == prob2ints (ints2prob(i1) + ints2prob(i2))
// implementation detail: should use a lookup table, to speed up the Forward algorithm
// In my codebase, I also have an algorithm for doing a maximum-accuracy probability-space sum over
// an arbitrarily long list of IntScores by repeatedly extracting the lowest two scores, summing them,
// and re-inserting the result into the (sorted) list.
// This is safer than just cycling through the list and summing, since if the difference between the
// two scores i1 & i2 in prob_sum() exceeds the size of the lookup table then roundoff error will occur.
// I don't think we need to worry about this yet -- but just so it's documented... -ihh
};
}
--
Ian Holmes .... Howard Hughes Medical Institute .... ihh@fruitfly.org