[Biojava-l] Distribution

Matthew Pocock mrp@sanger.ac.uk
Tue, 17 Apr 2001 16:54:15 +0100


Hi.

While writing the Distribution tutorial for the bootcamp, I noticed that 
Distribution didn't actually define a probability density function, 
because it does some trickery when handling ambiguity symbols. The 
correct behavior is to sum the probability of each atomic symbol that 
matches the ambiguity symbol and return that sum. This makes the 
semantics of getWeight like - give me the probability that we observe 
one of this set of symbols - rather than - give me the probability that 
we observe one of this set of symbols given some null model. I think 
the old behavior is a throwback to the days before null models really 
existed. 
Anyway, for DP with odds ratios the sum should give the expected result.
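
To make the summing rule concrete, here is a minimal sketch. It is not the 
BioJava Distribution interface: the class AmbiguityWeightSketch, its 
char-based symbols and the oddsRatio helper are all made up for illustration, 
and an ambiguity symbol is represented simply as a string listing each atomic 
symbol it matches once.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for a distribution over a finite alphabet.
// This is NOT the BioJava API; names and types are assumptions.
final class AmbiguityWeightSketch {
    private final Map<Character, Double> atomicWeights = new HashMap<>();

    // Set the probability of a single atomic symbol.
    void setWeight(char atomicSymbol, double probability) {
        atomicWeights.put(atomicSymbol, probability);
    }

    // Probability of observing any one of the atomic symbols matched by an
    // ambiguity symbol: the plain sum of their weights, with no null model
    // involved. Each matched symbol is assumed to be listed exactly once.
    double getWeight(String matchedAtomicSymbols) {
        double sum = 0.0;
        for (char symbol : matchedAtomicSymbols.toCharArray()) {
            sum += atomicWeights.getOrDefault(symbol, 0.0);
        }
        return sum;
    }

    // Odds ratio against a null model, as one might use it in DP:
    // both numerator and denominator are sums over the same symbol set.
    static double oddsRatio(AmbiguityWeightSketch dist,
                            AmbiguityWeightSketch nullModel,
                            String matchedAtomicSymbols) {
        return dist.getWeight(matchedAtomicSymbols)
             / nullModel.getWeight(matchedAtomicSymbols);
    }
}
```

With weights a=0.1, c=0.2, g=0.3, t=0.4, an ambiguity symbol matching a and g 
gets weight 0.4, and the symbol matching all four gets 1.0. Against a uniform 
null model the odds ratio for the a-or-g symbol is 0.4 / 0.5.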

One up-side to this is that it makes Distribution play much better with 
infinite sets like doubles - integrating Distribution over a range is 
exactly what is expected now when handling an ambiguity symbol over 
doubles that matches an interval (e.g. given the ambiguity symbol 
[-Infinity, 10.0] we would integrate the associated probability density 
function up to 10.0 from -Infinity, which is the normal meaning of 
p(10.0) in stats, i.e. the cumulative probability P(X <= 10.0)).
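
The interval case can be sketched the same way. Again this is only an 
illustration, not BioJava code: IntervalWeightSketch, the Gaussian density, 
the use of Simpson's rule and the tailCutoff parameter (a finite value 
substituted for -Infinity, below which the density is assumed negligible) 
are all assumptions of mine.

```java
import java.util.function.DoubleUnaryOperator;

// Hypothetical sketch: the weight of an interval-valued ambiguity symbol
// over doubles is the integral of the density across that interval.
final class IntervalWeightSketch {

    // Density of a normal distribution with the given mean and
    // standard deviation.
    static DoubleUnaryOperator gaussian(double mean, double sd) {
        return x -> {
            double z = (x - mean) / sd;
            return Math.exp(-0.5 * z * z) / (sd * Math.sqrt(2.0 * Math.PI));
        };
    }

    // Composite Simpson's rule on [a, b]; the step count is rounded up
    // to an even number, as the rule requires.
    static double simpson(DoubleUnaryOperator f, double a, double b, int steps) {
        int n = (steps % 2 == 0) ? steps : steps + 1;
        double h = (b - a) / n;
        double sum = f.applyAsDouble(a) + f.applyAsDouble(b);
        for (int i = 1; i < n; i++) {
            sum += (i % 2 == 0 ? 2.0 : 4.0) * f.applyAsDouble(a + i * h);
        }
        return sum * h / 3.0;
    }

    // Weight of the interval [lo, hi]. An infinite lower bound is replaced
    // by tailCutoff, which the caller must choose far enough into the tail.
    static double getWeight(DoubleUnaryOperator pdf, double lo, double hi,
                            double tailCutoff) {
        double a = Double.isInfinite(lo) ? tailCutoff : lo;
        return simpson(pdf, a, hi, 2000);
    }
}
```

For example, with a Gaussian of mean 10.0 the symbol [-Infinity, 10.0] should 
come out at about one half, since it covers exactly the lower half of a 
symmetric density.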

If anybody disagrees, be vocal. The change is in CVS, but won't be 
back-ported to 1.1 ever.

Matthew