[Bioperl-l] Using Bio::PopGen::Statistics with missing data

Warren W. Kretzschmar wkretzsch at gmail.com
Mon Feb 15 22:34:36 UTC 2010


Hi,
I have two questions. First, will all of the Bio::PopGen::Statistics
functions work with missing data in both the ingroup and the outgroup
populations?

Here is some sample data in CSV format (in the outgroup, missing genotypes are empty fields):

Ingroup:

SAMPLE,chr6_24659643,chr6_24659708,chr6_24664915,chr6_24667221,chr6_24667260,chr6_24672519,chr6_24672523,chr6_24677020,chr6_24684610,chr6_24684764,chr6_24684785,chr6_24686388,chr6_24686434,chr6_24691838,chr6_24696805,chr6_24696863,chr6_24696878,chr6_24704356,chr6_24704412,chr6_24704457,chr6_24704515,chr6_24704652,chr6_24704721,chr6_24704742,fchr1_35673249,fchr1_35673250,fchr1_35679239,fchr1_35679296,fchr1_35689919,fchr1_35698567,fchr1_35700870,fchr1_35744884,fchr1_35744941,fchr1_35792606,fchr6_24659713,fchr6_24664917,fchr6_24667334,fchr6_24671655,fchr6_24672514,fchr6_24672523,fchr6_24684620,fchr6_24684674,fchr6_24686387,fchr6_24688078,fchr6_24690529,fchr6_24691897,fchr6_24696850,fchr6_24696911,fchr6_24696963,fchr6_24704111,fchr6_24704285,fchr6_24704362,fchr6_24704817
NA20856,T T,T T,C C,T T,A A,A A,T T,C C,C C,T T,C C,T T,G G,C C,C C,T T,T T,C C,A A,G G,G G,A A,C C,G G,A A,A A,T T,C C,A A,T T,A A,G G,A A,G G,T T,A A,T T,C C,G G,T T,C C,G G,C C,G G,T T,G G,T T,C C,C C,T T,A A,C C,G G
NA20853,T T,T T,C C,T T,A A,A A,T T,C C,C C,T T,C C,T T,G G,C C,C C,C C,T T,C C,A A,G G,G G,A A,C C,G G,A A,A A,T T,C C,A A,T T,A A,G G,A A,G G,T T,A A,T T,C C,G G,T T,C C,G G,C C,G G,T T,G G,T T,C C,C C,T T,A A,C C,G G
NA20849,C T,C T,C C,T T,A G,A A,T T,C C,C C,T T,C C,C T,G G,C C,C C,T T,T T,C C,A A,G G,G G,A A,C C,G G,A A,A A,T T,C C,A A,T T,A A,G G,A A,G G,T T,A A,T T,C C,G G,T T,C C,G G,C C,G G,T T,G G,T T,C C,C C,T T,A A,C C,G G


Outgroup:

SAMPLE,chr6_24659643,chr6_24659708,chr6_24664915,chr6_24667221,chr6_24667260,chr6_24672519,chr6_24672523,chr6_24677020,chr6_24684610,chr6_24684764,chr6_24684785,chr6_24686388,chr6_24686434,chr6_24691838,chr6_24696805,chr6_24696863,chr6_24696878,chr6_24704356,chr6_24704412,chr6_24704457,chr6_24704515,chr6_24704652,chr6_24704721,chr6_24704742,fchr1_35673249,fchr1_35673250,fchr1_35679239,fchr1_35679296,fchr1_35689919,fchr1_35698567,fchr1_35700870,fchr1_35744884,fchr1_35744941,fchr1_35792606,fchr6_24659713,fchr6_24664917,fchr6_24667334,fchr6_24671655,fchr6_24672514,fchr6_24672523,fchr6_24684620,fchr6_24684674,fchr6_24686387,fchr6_24688078,fchr6_24690529,fchr6_24691897,fchr6_24696850,fchr6_24696911,fchr6_24696963,fchr6_24704111,fchr6_24704285,fchr6_24704362,fchr6_24704817
OUT0000,T T,,C C,T T,A A,G G,,C C,C C,T T,C C,T T,G G,C C,C C,C C,T T,C C,A A,G G,G G,A A,C C,G G,C C,C C,C C,T T,G G,C C,G G,T T,G G,A A,G G,G G,C C,T T,A A,C C,A A,A A,T T,A A,C C,A A,C C,A A,T T,C C,G G,T T,A A
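A minimal sketch, in Python rather than BioPerl, of how rows in the layout above could be checked for missing genotypes before they are handed to Bio::PopGen::Statistics. It assumes a SAMPLE column followed by one column per marker, genotypes written as two space-separated alleles, and a missing genotype written as an empty field. The helper `missing_markers` is hypothetical and not part of Bio::PopGen.

```python
import csv
import io


def missing_markers(csv_text):
    """Return {sample: [marker, ...]} for markers whose genotype field is empty.

    Hypothetical helper. Assumes a SAMPLE header column and genotypes
    written as two space-separated alleles (e.g. "C T").
    """
    reader = csv.reader(io.StringIO(csv_text))
    markers = next(reader)[1:]  # drop the SAMPLE column name
    missing = {}
    for row in reader:
        if not row:
            continue  # skip blank lines
        # Pad short rows so that trailing markers count as missing.
        genotypes = row[1:] + [""] * (len(markers) - len(row) + 1)
        missing[row[0]] = [m for m, g in zip(markers, genotypes) if not g.strip()]
    return missing


example = (
    "SAMPLE,chr6_24659643,chr6_24659708,chr6_24664915,chr6_24667221,"
    "chr6_24667260,chr6_24672519,chr6_24672523\n"
    "OUT0000,T T,,C C,T T,A A,G G,\n"
)
print(missing_markers(example))
# {'OUT0000': ['chr6_24659708', 'chr6_24672523']}
```

On the first seven outgroup columns this flags the same two empty fields visible in the OUT0000 row.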


The second question is this: if I pass the outgroup in as a population
(i.e., a population containing a single homozygous diploid individual),
I get a completely different fu_and_li_F value than when I pass in only
a haploid individual as the outgroup. In the former case I also get a
'missing value' error, which comes from
Bio::PopGen::Statistics->derived_mutations($ingroup_pop,$outgroup_pop)
returning no external mutations. Because I get no errors when I pass
in a haploid individual as the outgroup, I'm assuming that this is the
correct way to use fu_and_li_F. If so, it would be helpful to state this
in the Bio::PopGen::Statistics POD. As currently written, the POD could
lead one to assume that fu_and_li_F treats a homozygous diploid
individual the same way it treats a haploid individual, which I don't
think is the case.
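A minimal sketch, in Python rather than BioPerl, of reducing a homozygous diploid outgroup to one allele per site, i.e. the haploid form that ran without errors. The helper `to_haploid` is hypothetical and not part of Bio::PopGen. Missing sites are kept as None, and heterozygous calls raise an error because they cannot be collapsed to a single allele.

```python
def to_haploid(genotypes):
    """Collapse diploid genotype strings such as "C C" to single alleles.

    Hypothetical helper: empty strings (missing data) become None;
    heterozygous calls such as "C T" raise ValueError.
    """
    haploid = []
    for g in genotypes:
        alleles = g.split()
        if not alleles:
            haploid.append(None)  # missing genotype stays missing
        elif len(set(alleles)) == 1:
            haploid.append(alleles[0])  # homozygous: keep one copy
        else:
            raise ValueError(f"heterozygous outgroup genotype: {g!r}")
    return haploid


print(to_haploid(["T T", "", "C C", "G G"]))
# ['T', None, 'C', 'G']
```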

Thanks,
Warren Kretzschmar
