[Bioperl-l] SiteMatrix changes

Hilmar Lapp hlapp at gmx.net
Thu Aug 31 18:05:43 UTC 2006


On Aug 31, 2006, at 1:26 PM, skirov wrote:

>> If you're going to do the correction, you always do it, not just when
>> one of the positions contains 0. I imagine you were detecting 0 as an
>> indicator that no correction had been done, but it's possible no
>> correction has been done even if none of the positions has a 0.
>
> Well, if none of the positions is 0, no correction is really  
> necessary.

That's wrong. Refer to my previous email as to why.

As a trivial example, assume we have 3 events A, B, and C. I sample 5  
times and observe:

    A  B  C
N   2  2  1

According to this table, the frequency estimate for C would be 0.2,  
and 0.4 for the other two. The true frequencies may, however, be f(A)  
= 0.5, f(B) = 0.48, and f(C) = 0.02.

If I sample roughly 100 times, I might get

     A   B  C
N   51  47   3

By applying a uniform prior in the form of pseudo-counts to the first  
table, I get:

    A  B  C
N   3  3  2

with fhat(C) = 0.25 and 0.375 for the other two. The estimate for C
moves from 0.2 to 0.25 after adding a single pseudo-count per event,
which makes it clear that I don't have enough data yet.

Applying the pseudo-count prior to the second table yields

    A  B  C
N  52 48  4

with fhat(C) = 4/104, or about 0.04, compared to 3/101, or about 0.03,
without the pseudo-count correction.
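
For anyone who wants to check the arithmetic, here is a minimal Python
sketch of the two examples above. It is only an illustration and not the
SiteMatrix code: the function name frequency_estimates and its
pseudo_count parameter are made up for this example, and it assumes a
uniform prior of one pseudo-count per event.

```python
def frequency_estimates(counts, pseudo_count=0):
    """Estimate frequencies after adding pseudo_count to every event.

    Hypothetical helper for illustration; pseudo_count=0 gives the
    plain observed frequencies.
    """
    adjusted = {event: n + pseudo_count for event, n in counts.items()}
    total = sum(adjusted.values())
    return {event: n / total for event, n in adjusted.items()}


# The two observation tables from the example.
small = {"A": 2, "B": 2, "C": 1}
large = {"A": 51, "B": 47, "C": 3}

for name, counts in (("small", small), ("large", large)):
    raw = frequency_estimates(counts)
    # The correction is applied unconditionally, whether or not a count is 0.
    smoothed = frequency_estimates(counts, pseudo_count=1)
    print(name, round(raw["C"], 3), round(smoothed["C"], 3))
```

With the small table the estimate for C changes from 0.2 to 0.25; with
the large table it changes only from about 0.03 to about 0.04, so the
prior matters less as the data accumulate.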

You don't apply a prior distribution *after* you have seen the data.

	-hilmar

-- 
===========================================================
: Hilmar Lapp  -:-  Durham, NC  -:-  hlapp at gmx dot net :
===========================================================







