[Biopython-dev] biopython-dev

Fri Jan 5 10:35:59 UTC 2007

Hi Ralph,

Thanks for the info, let me see if I can sum up what I have and what I
am planning to do...

I currently work with microsatellite and SNP data (already isolated
ones, not retrieved from sequences that I have). I have code (parsers,
controllers... it varies from case to case; the quality also varies)
related to GenePop, fdist2, SimCoal2, Arlequin. I also have
preliminary code to work with HapMap and the UCSC table browser.
I have code implementing some statistics like Fst (Cockerham and Weir),
expected/observed heterozygosity, ...
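
To make the heterozygosity statistics concrete, here is a minimal sketch for a single diploid locus. It is not the code referred to in this message; the function names and the genotype representation (a list of allele pairs) are assumptions made only for illustration. The expected value is gene diversity, 1 - sum(p_i^2), with an optional small-sample correction of 2n/(2n-1) for n individuals.

```python
from collections import Counter


def observed_heterozygosity(genotypes):
    """Fraction of individuals whose two alleles differ.

    genotypes: list of (allele, allele) tuples for one diploid locus
    (hypothetical representation, chosen for this sketch).
    """
    het = sum(1 for a, b in genotypes if a != b)
    return het / len(genotypes)


def expected_heterozygosity(genotypes, unbiased=True):
    """Gene diversity 1 - sum(p_i^2) computed from allele frequencies.

    With unbiased=True the value is multiplied by 2n/(2n-1), where
    2n is the number of sampled gene copies.
    """
    alleles = [allele for genotype in genotypes for allele in genotype]
    copies = len(alleles)  # 2n gene copies for n diploid individuals
    counts = Counter(alleles)
    h = 1.0 - sum((c / copies) ** 2 for c in counts.values())
    if unbiased:
        h *= copies / (copies - 1)
    return h


# Made-up microsatellite genotypes (allele sizes in base pairs).
sample = [(120, 120), (120, 124), (124, 128), (120, 124)]
h_obs = observed_heterozygosity(sample)
h_exp = expected_heterozygosity(sample)
```

The sample data above are invented; real input would come from a parser for a format such as GenePop.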
In the medium term I will be quite interested in the sequence-based
statistics (Tajima's D, Fu and Li's D, and e.g. the new statistic in the
Voight 2006 paper). Also, linkage disequilibrium is very high on my
priority list.
I have been thinking quite a bit about the representation of markers and
populations (especially in a genomic context). For example, I have noticed
that you use a couple of arrays, one with names, the other with
sequences, to represent population data. I am currently scratching my
head over representation on a genomic scale (i.e., multi-marker, mainly
because of LD). But I think this will come smoothly when I really
start to do LD studies...
This is all in the context of detecting selection, disentangling
selection from population structure, and, hopefully in the near future,
studying coevolution in host/parasite systems (diseases...).

I have set aside some time to ensure that all the code that I am writing
can be reused by the community. It is my plan to build and maintain
this code over the next few years (I am funded until 2010 with a PhD
grant).

Regards,
Tiago

On 1/4/07, Ralph Haygood <rhaygood at duke.edu> wrote:
> Tiago,
>
> Yes, I do still read biopython-dev.  But at the moment, I have even
> less time than usual, because I'm at a conference.  If there's
> something you want to ask me, go ahead, but unless the answer is
> trivial, it may take me several days.
>
> You're right that my stuff is very sequence oriented.  In fact, it's
> very alignment oriented.  It can analyze simple insertion/deletion as
> well as single-nucleotide variation.  Here's a typical use case, to
> give you the flavor:
>
> alignment = phylip_file_to_alignment("sm50PromoterSpurAfra.phy")
> populations = {'Spur': range(20), 'Afra': [20]}
> statistics = Statistics(alignment, populations)
> print "ungapped length: %d" % statistics.ungapped_length()
> print "K SNPs: %d" % statistics.get_K('Spur')
> print "K simple indels: %d" % statistics.get_K_simple_indel('Spur')
> print "theta_W SNPs: %g" % statistics.get_theta_W('Spur')
> print "theta_W simple indels: %g" % statistics.get_theta_W_simple_indel('Spur')
> print "pi SNPs: %g" % statistics.get_pi('Spur')
> print "pi simple indels: %g" % statistics.get_pi_simple_indel('Spur')
> print "D_T SNPs: %g" % statistics.get_D_T('Spur')
> print "D_T simple indels: %g" % statistics.get_D_T_simple_indel('Spur')
> print "D_FL SNPs: %g" % statistics.get_D_FL('Spur', 'Afra')
> print "D_FL simple indels: %g" % statistics.get_D_FL_simple_indel('Spur', 'Afra')
> etc.
>
> Spur is Strongylocentrotus purpuratus and Afra is Allocentrotus
> fragilis, two closely related species of sea urchin.  In this example,
> I have 20 sequences of a certain region from Spur and one from Afra,
> so I'm analyzing the population genetics of the region within Spur,
> with Afra as an outgroup for doing things like inferring which allele
> is ancestral at a polymorphism within Spur.  K is the number of
> polymorphisms, theta_W is Watterson's estimator of 4 x effective
> population size x neutral mutation rate, pi is the average number of
> pairwise differences between alleles, D_T is Tajima's D, D_FL is Fu
> and Li's D (which requires an outgroup), etc.  The software can do
> more elaborate things like permutation tests for assessing whether a
> statistic differs between two alignments, which might be something
> like known transcription factor binding sites versus other nucleotide
> sites in a promoter.  The canned software DnaSP can't do that, which
> is one of the reasons why I wrote my stuff.
>
> Ralph
>
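
For readers unfamiliar with the statistics named in the quoted message, the following is a minimal sketch of the number of segregating sites, pi, Watterson's theta_W, and Tajima's D for a set of aligned, ungapped sequences. It is not Ralph's implementation and does not use his Statistics class; the function names are assumptions for illustration, values are per region rather than per site, and indels and outgroups are not handled. The constants follow the standard formulas from Tajima (1989).

```python
from itertools import combinations
from math import sqrt


def segregating_sites(seqs):
    """Number of alignment columns with more than one distinct state."""
    return sum(1 for column in zip(*seqs) if len(set(column)) > 1)


def pi(seqs):
    """Average number of pairwise differences between sequences."""
    pairs = list(combinations(seqs, 2))
    diffs = sum(sum(x != y for x, y in zip(a, b)) for a, b in pairs)
    return diffs / len(pairs)


def theta_w(seqs):
    """Watterson's estimator: S divided by the harmonic number a1."""
    a1 = sum(1.0 / i for i in range(1, len(seqs)))
    return segregating_sites(seqs) / a1


def tajima_d(seqs):
    """Tajima's D: (pi - theta_W) scaled by its estimated standard deviation."""
    n = len(seqs)
    s = segregating_sites(seqs)
    if s == 0:
        raise ValueError("Tajima's D is undefined without segregating sites")
    a1 = sum(1.0 / i for i in range(1, n))
    a2 = sum(1.0 / i ** 2 for i in range(1, n))
    b1 = (n + 1) / (3.0 * (n - 1))
    b2 = 2.0 * (n * n + n + 3) / (9.0 * n * (n - 1))
    c1 = b1 - 1.0 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1 ** 2
    e1 = c1 / a1
    e2 = c2 / (a1 ** 2 + a2)
    return (pi(seqs) - s / a1) / sqrt(e1 * s + e2 * s * (s - 1))


# Toy alignment (invented data): four sequences, four sites.
toy = ["AAAA", "AAAT", "AATT", "AATT"]
toy_s = segregating_sites(toy)
toy_pi = pi(toy)
toy_theta = theta_w(toy)
toy_d = tajima_d(toy)
```

In this sketch a site with three or more states still counts as one segregating site, which is one of several conventions; a production implementation would need to state which one it uses.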

-- 
Good judgment comes from experience.
Experience comes from bad judgment.
- Unknown author