[Bioperl-l] SiteMatrix changes
Hilmar Lapp
hlapp at gmx.net
Thu Aug 31 18:05:43 UTC 2006
On Aug 31, 2006, at 1:26 PM, skirov wrote:
>> If you're going to do the correction, you always do it, not just when
>> one of the positions contains 0. I imagine you were detecting 0 as an
>> indicator that no correction had been done, but it's possible no
>> correction has been done even if none of the positions has a 0.
>
> Well, if none of the positions is 0, no correction is really
> necessary.
That's wrong. See my previous email for why.
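To make the objection concrete, here is a minimal Python sketch contrasting the two policies in the quoted exchange: adding pseudo-counts only when some count is 0, versus always adding them. The function names and the pseudo-count of 1 per event are assumptions for illustration; this is not BioPerl's SiteMatrix code.

```python
def add_pseudocounts(counts, pseudocount):
    """Add the same pseudo-count to every event."""
    return {event: n + pseudocount for event, n in counts.items()}


def normalise(counts):
    """Divide each count by the total so the estimates sum to 1."""
    total = sum(counts.values())
    return {event: n / total for event, n in counts.items()}


def always_corrected(counts, pseudocount=1):
    """Apply the uniform pseudo-count prior unconditionally."""
    return normalise(add_pseudocounts(counts, pseudocount))


def corrected_only_if_zero(counts, pseudocount=1):
    """Hypothetical conditional policy: correct only if some count is 0."""
    if 0 in counts.values():
        counts = add_pseudocounts(counts, pseudocount)
    return normalise(counts)


# Two hypothetical five-sample outcomes in which B was observed exactly once.
with_zero = {"A": 4, "B": 1, "C": 0}
without_zero = {"A": 3, "B": 1, "C": 1}

print(corrected_only_if_zero(with_zero)["B"], corrected_only_if_zero(without_zero)["B"])
# 0.25 0.2
print(always_corrected(with_zero)["B"], always_corrected(without_zero)["B"])
# 0.25 0.25
```

Under the conditional policy the estimate for B, seen once in both samples, changes only because C happened to go unobserved in one of them. With the prior always applied, B's estimate depends only on its own count and the sample size.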
As a trivial example, assume we have 3 events A, B, and C. I sample 5
times and observe:
     A    B    C
N    2    2    1
According to this table, the frequency estimate for C would be 0.2,
and 0.4 for each of the other two. The true frequencies may, however,
be f(A) = 0.5, f(B) = 0.48, and f(C) = 0.02.
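The uncorrected estimate above is simply each count divided by the number of samples. A minimal Python sketch, with a hypothetical function name, that reproduces it and shows that an event never observed gets an estimate of exactly 0:

```python
def relative_frequencies(counts):
    """Plain (maximum-likelihood) estimate: each count divided by the total."""
    total = sum(counts.values())
    return {event: n / total for event, n in counts.items()}


observed = {"A": 2, "B": 2, "C": 1}  # the five-sample table above
print(relative_frequencies(observed))
# {'A': 0.4, 'B': 0.4, 'C': 0.2}

# An unobserved event is estimated as impossible.
print(relative_frequencies({"A": 3, "B": 2, "C": 0})["C"])
# 0.0
```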
If I sample 100 times, I might get
     A    B    C
N   50   47    3
By applying a uniform prior in the form of pseudo-counts (adding one
count to each event) to the first table, I get:
     A    B    C
N    3    3    2
with fhat(C) = 2/8 = 0.25 and 3/8 = 0.375 for the other two, and it
becomes clear that I don't have enough data yet.
Applying the same pseudo-count prior to the second table yields
     A    B    C
N   51   48    4
with fhat(C) = 4/103, or about 0.04, compared to 3/100 = 0.03 without
the pseudo-count correction. The more data I have, the less the prior
matters, but it is applied regardless of whether any count is 0.
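A minimal Python sketch of the pseudo-count correction used in these tables, assuming a uniform prior of one pseudo-count per event. The function name is hypothetical and this is not BioPerl's implementation. It reproduces the corrected version of the first table:

```python
def pseudocount_frequencies(counts, pseudocount=1):
    """Frequency estimate with a uniform prior expressed as pseudo-counts.

    The same pseudo-count is added to every event, whether or not any
    observed count is zero, and the result is renormalised to sum to 1.
    """
    adjusted = {event: n + pseudocount for event, n in counts.items()}
    total = sum(adjusted.values())
    return {event: n / total for event, n in adjusted.items()}


small_sample = {"A": 2, "B": 2, "C": 1}
print(pseudocount_frequencies(small_sample))
# {'A': 0.375, 'B': 0.375, 'C': 0.25}
```

Because the pseudo-counts add a fixed amount to the total, their share shrinks as the sample grows, so a large sample is pulled only slightly toward the uniform distribution.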
You don't apply a prior distribution *after* you have seen the data.
-hilmar
--
===========================================================
: Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net :
===========================================================