[Biopython] Statistical similarity in microarray data
Peter Saffrey
pzs at dcs.gla.ac.uk
Tue Feb 16 13:37:57 UTC 2010
This isn't strictly a Biopython question, but I hoped I might find some
expertise here.
I need to compare two microarrays for similarity. Each file is a set of
spots and their corresponding values. By ordering the values by the spot
id and discarding points that are missing from either set, I can compare
the two experiments. We are trying to show that samples using a new
method correlate with the old method.
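As a rough illustration of the pairing step described above, here is a minimal sketch. It assumes each file has already been parsed into a dict mapping spot id to value; the function name `pair_by_spot_id`, the spot ids, and the values are all made up for the example.

```python
def pair_by_spot_id(old, new):
    """Align two {spot_id: value} dicts on the spot ids present in both."""
    # Keep only spots measured in both experiments, in a stable order.
    shared = sorted(old.keys() & new.keys())
    xs = [old[s] for s in shared]
    ys = [new[s] for s in shared]
    return shared, xs, ys


# Hypothetical data: spot_002 and spot_003 each appear in only one file.
old_method = {"spot_001": 7.2, "spot_002": 5.1, "spot_004": 9.8}
new_method = {"spot_001": 7.0, "spot_003": 4.4, "spot_004": 10.1}

ids, xs, ys = pair_by_spot_id(old_method, new_method)
print(ids, xs, ys)
```

The two returned lists have the same length and the same ordering, so they can be passed directly to a paired statistical function.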
Up until recently, we were using a Pearson correlation (from
scipy.stats), but this assumes the data is normally distributed, which it
probably isn't. The correlations were a little unreliable.
After a bit of digging, I tried using a Wilcoxon test (also from
scipy.stats), but this seems to give high correlations where it
shouldn't, such as for files from different samples. It also seems to lack
precision. I get p-values of 0 quite a lot; even 1e-80 would reassure me
that something is really happening underneath.
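For reference, the two scipy.stats calls mentioned above look roughly like this on paired arrays. This is only a sketch: the arrays `old` and `new` are randomly generated stand-ins, not real microarray values. Note that `scipy.stats.wilcoxon` is a signed-rank test on the paired differences, so it returns a test statistic and a p-value rather than a correlation coefficient.

```python
import numpy as np
from scipy import stats

# Made-up paired values for illustration only (not real microarray data).
rng = np.random.default_rng(0)
old = rng.normal(loc=8.0, scale=2.0, size=200)
new = old + rng.normal(loc=0.0, scale=0.5, size=200)

# Pearson correlation: returns the coefficient r and a two-sided p-value.
r, p_r = stats.pearsonr(old, new)

# Wilcoxon signed-rank test on the paired differences: returns a test
# statistic and a p-value, not a correlation coefficient.
w, p_w = stats.wilcoxon(old, new)

print(f"pearson  r={r:.3f}  p={p_r:.3g}")
print(f"wilcoxon W={w:.1f}  p={p_w:.3g}")
```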
Does anybody have any experience with this type of statistical work?
Cheers,
Peter