Fwd: [Bioperl-l] Bio::PopGen::Statistics discrete calculations

Tue Mar 2 04:33:13 EST 2004

> >I'm interested in calculating some of the statistics in
> >Bio::PopGen::Statistics in a discrete, position-wise, way.
> >
> >For example,
> >
> >I've done a quick implementation of the method "pi" that returns the
> >value of the statistic for each polymorphic position instead of the
> >mean of all of them. It's the "pi_discrete" method:
> >
> >http://www.ebi.ac.uk/cvs/cvspublic/wallace/bioperl-live/Bio/PopGen/Statistics.pm?rev=1.4&content-type=text/x-cvsweb-markup
> >
> 
> 
> Do you want to just calculate each site's heterozygosity?  Otherwise, 
> I'm not quite sure what 'pi' means in this context.  The pi 
> calculation currently implemented simply sums up each site's 
> heterozygosity.

Well, in pi_discrete I use the name of the marker as the hash key. For
MSA files loaded with Bio::PopGen::Utilities::aln_to_population, that
name is the position in the alignment:

       for( my $i = 0; $i < $aln->length; $i++ ) {
#          my $nm = "Site-$i";
           my $nm = sprintf "Site-%09d", $i;

Apart from that, yes*, it's just heterozygosity in sliding windows of
width 1 that advance by 1 position each time.

*pi is a poor example of the statistic_discrete methods.
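
To make the per-site idea concrete, here is a minimal standalone sketch
in plain Perl. It is not the code in Statistics.pm: the function name
site_heterozygosity and the plain-string input are made up for
illustration, and I am assuming the usual unbiased estimator
n/(n-1) * (1 - sum of squared allele frequencies). Gaps and ambiguity
codes are not treated specially.

```perl
use strict;
use warnings;

# Hypothetical helper, not part of Bio::PopGen::Statistics.
# Input:  array ref of equal-length aligned sequences (plain strings).
# Output: hash ref mapping "Site-%09d" marker names to the per-site
#         heterozygosity, for polymorphic sites only.
sub site_heterozygosity {
    my ($seqs) = @_;
    my $n = scalar @$seqs;
    die "need at least two sequences\n" if $n < 2;

    my %h;
    for my $i ( 0 .. length( $seqs->[0] ) - 1 ) {
        # Count each allele observed in column $i.
        my %count;
        $count{ substr( $_, $i, 1 ) }++ for @$seqs;
        next if keys(%count) < 2;    # skip monomorphic sites

        my $sum_sq = 0;
        $sum_sq += ( $_ / $n )**2 for values %count;

        # Assumed estimator: n/(n-1) * (1 - sum of squared frequencies).
        $h{ sprintf "Site-%09d", $i } = $n / ( $n - 1 ) * ( 1 - $sum_sq );
    }
    return \%h;
}

my $h = site_heterozygosity( [qw(ACGT ACGA ATGA ATGA)] );
printf "%s\t%.4f\n", $_, $h->{$_} for sort keys %$h;
```

Because the marker names are zero-padded, a plain string sort on the
keys returns them in alignment order.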

> >I would also like to have methods to do Sliding Window calculations for
> >these statistics. Something like:
> >
> >"Calculate tajima_D for windows of 100 polymorphisms that slide 10
> >positions"
> >
> >Any comments about the logic, naming, interest of these methods?
> 
> 
> Sounds great--I'm sure it would get used.  The only precaution I
> would give is that some of the formats for inputting the polymorphism
> data (e.g. prettybase files) do not have to include monomorphic
> sites.  This would really mess with any kind of sliding window.

I would say there are two kinds of windows in that respect:

"Real position windows": the window refers to positions in the
alignment (whether mono- or polymorphic).

"Polymorphic pos. windows": the window refers to, e.g., the first
10 polymorphisms in the prettybase file. Obviously, with this second
option, the "real" width of the window will differ from window to
window. But if the marker name in the prettybase file encodes the
position, like "Site-$i", then one can roughly calculate the
midpoint of those polymorphisms.
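
The two kinds of windows could be sketched roughly like this. All the
names here (site_position, position_windows, polymorphism_windows) and
the layout of the returned hashes are assumptions for illustration, not
existing BioPerl methods. The input is a hash of per-site values keyed
by "Site-NNN" marker names, and each window simply sums the values that
fall in it; a statistic such as tajima_D would have to be computed from
the sites of each window instead.

```perl
use strict;
use warnings;

# Hypothetical helpers; names and return layout are assumptions.

# Recover the alignment position from a name like "Site-000000042".
sub site_position {
    my ($name) = @_;
    my ($pos) = $name =~ /^Site-(\d+)$/
        or die "cannot parse position from '$name'\n";
    return $pos + 0;
}

# "Real position" windows: fixed width in alignment coordinates.
# Sites missing from %$stat (e.g. monomorphic ones) contribute 0.
sub position_windows {
    my ( $stat, $aln_len, $width, $step ) = @_;
    my %by_pos;
    $by_pos{ site_position($_) } = $stat->{$_} for keys %$stat;

    my @windows;
    for ( my $start = 0 ; $start + $width <= $aln_len ; $start += $step ) {
        my $end = $start + $width - 1;
        my $sum = 0;
        for my $p ( $start .. $end ) {
            $sum += $by_pos{$p} if exists $by_pos{$p};
        }
        push @windows, { start => $start, end => $end, sum => $sum };
    }
    return \@windows;
}

# "Polymorphic position" windows: fixed number of markers per window,
# so the real width varies; the midpoint comes from the marker names.
sub polymorphism_windows {
    my ( $stat, $width, $step ) = @_;
    my @names = sort { site_position($a) <=> site_position($b) } keys %$stat;

    my @windows;
    for ( my $i = 0 ; $i + $width <= @names ; $i += $step ) {
        my @in  = @names[ $i .. $i + $width - 1 ];
        my $sum = 0;
        $sum += $stat->{$_} for @in;
        my $first = site_position( $in[0] );
        my $last  = site_position( $in[-1] );
        push @windows, {
            first    => $first,
            last     => $last,
            midpoint => ( $first + $last ) / 2,
            sum      => $sum,
        };
    }
    return \@windows;
}

my %h = (
    'Site-000000001' => 0.5,
    'Site-000000004' => 0.25,
    'Site-000000009' => 0.5,
);
for my $w ( @{ polymorphism_windows( \%h, 2, 1 ) } ) {
    printf "%d-%d\tmid=%.1f\tsum=%.2f\n", @$w{qw(first last midpoint sum)};
}
```

With input that omits monomorphic sites, position_windows only works if
the alignment length is known from somewhere else, which is why it is
passed in explicitly here.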

And about the prettybase format, I would like to have a converter in
bioperl from Bio::AlignIO alignments to prettybase "polymorphisms" or
markers.

Where would be the best place for that? Maybe in Bio::PopGen::Utilities?
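
As a rough idea of what such a converter might do, here is a standalone
sketch that works on plain strings rather than Bio::AlignIO objects.
The function name aln_to_prettybase_lines is hypothetical. I am
assuming prettybase lines are tab-delimited as SITE, SAMPLE, ALLELE1,
ALLELE2, and that a haploid sequence is written with its base repeated
in both allele columns; both points should be checked against the
prettybase parser before relying on this.

```perl
use strict;
use warnings;

# Hypothetical converter; not an existing BioPerl function.
# Input:  hash ref of sample name => aligned sequence (equal lengths).
# Output: list of tab-delimited lines, one per sample per polymorphic
#         site, in the assumed order SITE, SAMPLE, ALLELE1, ALLELE2.
sub aln_to_prettybase_lines {
    my ($aln) = @_;
    my @samples = sort keys %$aln;
    die "empty alignment\n" unless @samples;
    my $len = length $aln->{ $samples[0] };

    my @lines;
    for my $i ( 0 .. $len - 1 ) {
        # Collect the distinct bases in column $i.
        my %seen;
        $seen{ substr( $aln->{$_}, $i, 1 ) } = 1 for @samples;
        next if keys(%seen) < 2;    # monomorphic sites are left out

        my $site = sprintf "Site-%09d", $i;
        for my $s (@samples) {
            my $base = substr( $aln->{$s}, $i, 1 );
            # Assumption: haploid data, so the base is repeated.
            push @lines, join "\t", $site, $s, $base, $base;
        }
    }
    return @lines;
}

print "$_\n" for aln_to_prettybase_lines( { ind1 => 'ACGT', ind2 => 'ATGT' } );
```

Keeping the position in the marker name is what would later allow the
midpoint of a window of polymorphisms to be recovered.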

Feedback is welcome,

Thanks in advance,

    Albert.

> 
> 
> 
> 
> cheers,
> Matt
> 
> 
> 
> >Thanks,
> >
> >     Albert.
> >
> >--
> >Albert Vilella Bertran    avilella_at_ebi_ac_uk
> >EMBL Outstation, European Bioinformatics Institute
> >Wellcome Trust Genome Campus, Hinxton
> >Cambs. CB10 1SD, United Kingdom
> >Phone: +44 (0)1223 494 448   FAX: +44 (0)1223 494 468
> >
> >----- End forwarded message -----
>