[Biopython] Statistical similarity in microarray data

Tue Feb 16 13:37:57 UTC 2010

This isn't strictly a biopython question, but I hoped I might find some 
expertise here.

I need to compare two microarrays for similarity. Each file is a set of 
spots and their corresponding values. By ordering the values by the spot 
id and discarding points that are missing from either set, I can compare 
the two experiments. We are trying to show that samples using a new 
method correlate with the old method.
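The alignment step described above could look something like the following minimal sketch. It assumes each file has already been parsed into a Python dict mapping spot id to value. The helper name align_by_spot and the sample spot ids are hypothetical, and None or NaN values are treated as missing in the same way as absent spots.

```python
import math


def align_by_spot(old, new):
    """Pair up values for spots present (and non-missing) in both dicts.

    Hypothetical helper: `old` and `new` map spot id -> value.
    Returns the shared ids plus two equal-length value lists in the same order.
    """
    def present(value):
        # Treat None and NaN as missing measurements.
        return value is not None and not (isinstance(value, float) and math.isnan(value))

    # The sort only has to be consistent between the two lists;
    # lexical order ("spot10" before "spot2") is fine for that.
    shared = sorted(
        spot for spot in old.keys() & new.keys()
        if present(old[spot]) and present(new[spot])
    )
    return shared, [old[s] for s in shared], [new[s] for s in shared]


old = {"spot1": 1.2, "spot2": 3.4, "spot3": float("nan"), "spot5": 0.7}
new = {"spot1": 1.1, "spot2": 3.9, "spot3": 2.0, "spot4": 5.5}
ids, x, y = align_by_spot(old, new)
print(ids, x, y)  # ['spot1', 'spot2'] [1.2, 3.4] [1.1, 3.9]
```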

Up until recently, we were using a Pearson correlation (from 
scipy.stats), but its significance test assumes the data is normally 
distributed, which it probably isn't. The correlations were a little 
unreliable.
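For comparison, here is a small sketch of the Pearson calculation next to Spearman's rank correlation (scipy.stats.spearmanr). Spearman is a swapped-in, rank-based alternative that the post does not mention; it only assumes a monotonic relationship rather than a linear one. The data are synthetic: new is a hypothetical exact but non-linear function of old, not real array values.

```python
import numpy as np
from scipy import stats

# Synthetic data: "new" is an exact, strictly increasing but non-linear
# function of "old" (a hypothetical stand-in for real spot intensities).
old = np.arange(1, 101, dtype=float)
new = np.exp(old / 10.0)

# Pearson measures linear association, so it is pulled well below 1 here.
r, p_pearson = stats.pearsonr(old, new)
# Spearman correlates the ranks, so any strictly monotonic relation gives 1.
rho, p_spearman = stats.spearmanr(old, new)

print(f"Pearson r = {r:.3f}, Spearman rho = {rho:.3f}")
```

If the two methods agree spot by spot but differ by a non-linear calibration, the rank-based coefficient is less affected by that; whether it suits real array data is still a judgment call.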

After a bit of digging, I tried using a Wilcoxon test (also from 
scipy.stats), but this seems to give high correlations for things it 
shouldn't, such as files from different samples. It also seems to lack 
precision. I get p-values of exactly 0 quite often; even 1e-80 would 
reassure me that something is really happening underneath.
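A sketch that may help explain both symptoms, using synthetic lognormal data (hypothetical, not real arrays). scipy.stats.wilcoxon is the signed-rank test on paired differences: it asks whether the differences are centred on zero, not whether the two sets of values track each other. Separately, a p-value printed as 0 is most likely floating-point underflow: a double cannot hold a positive value much below about 5e-324. For larger samples the test uses a normal approximation, and the z value of 40 below is an arbitrary illustration of how far into the tail that happens.

```python
import numpy as np
from scipy import stats

# Hypothetical data: lognormal intensities standing in for real spot values.
rng = np.random.default_rng(0)
old = rng.lognormal(mean=5.0, sigma=1.0, size=500)

# Case 1: a "new method" that tracks the old one perfectly but reads 2x higher.
scaled = 2.0 * old
# Case 2: the same values paired with the wrong spots (order reversed),
# so spot-by-spot agreement is lost but the differences cancel out.
mispaired = old[::-1]

_, p_scaled = stats.wilcoxon(old, scaled)        # tiny: every difference < 0
_, p_mispaired = stats.wilcoxon(old, mispaired)  # near 1: no shift detected
r_scaled, _ = stats.pearsonr(old, scaled)        # 1 up to rounding

# A two-sided normal-approximation p-value is 2 * P(Z > |z|).
z = 40.0
p_direct = 2.0 * stats.norm.sf(z)  # true value ~1e-349, underflows
# Working on the log scale keeps the magnitude.
log10_p = (np.log(2.0) + stats.norm.logsf(z)) / np.log(10.0)

print(p_direct)  # 0.0
print(log10_p)
```

Here the perfectly correlated pair is "rejected" and the mispaired one is not, which is the opposite of what a similarity measure should do. Reporting log p-values (or the test statistic itself) avoids the underflow.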

Does anybody have any experience with this type of statistical work?

Cheers,

Peter