[BioPython] [PopGen] a random Haplotype Sets generator

Bruce Southey bsouthey at gmail.com
Fri Nov 14 15:17:08 UTC 2008


[snip]
>   
>> Some other comments
>>
>> Perhaps I misunderstood the situation but the major problem that I have is
>> that the locations are treated as independent so your model assumes unlinked
>> loci. I just don't find this a useful scenario.
>>     
>
> This depends on which parameters you pass to the HaplotypesGenerator
> init function.
> I would prefer to create a basic module that generates sequences given
> the frequencies and alleles in every position, and other functions to
> create its parameters.
>   
Well, this depends on what you mean by haplotype (e.g. 
http://en.wikipedia.org/wiki/Haplotype).  I agree, but you need to 
capture how close the positions are, i.e. linkage / linkage disequilibrium. 
Simulating independent positions in a required format is useful, but it 
is just a special case of simulating dependent positions.
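
To illustrate the "special case" point, here is a minimal sketch, not the
module under discussion. It assumes biallelic loci coded 0/1, and the names
sample_haplotype, freqs and break_probs are hypothetical. A hidden uniform
value is carried along the haplotype and redrawn between adjacent loci with
a given probability; this is a toy dependence model chosen for brevity, not
a coalescent or a proper recombination model.

```python
import random


def sample_haplotype(freqs, break_probs, rng=random):
    """Return a list of 0/1 alleles, one per locus.

    freqs[i]       -- frequency of allele 1 at locus i
    break_probs[i] -- probability that locus i+1 is drawn independently
                      of locus i (1.0 = unlinked, 0.0 = fully linked)

    Hypothetical helper for illustration only.
    """
    if len(break_probs) != len(freqs) - 1:
        raise ValueError("need one break probability per adjacent pair")
    # Hidden uniform value shared by linked loci; it stays uniform at
    # every locus, so each locus keeps its own allele frequency.
    u = rng.random()
    hap = [1 if u < freqs[0] else 0]
    for p, b in zip(freqs[1:], break_probs):
        if rng.random() < b:
            # Break the dependence: draw a fresh hidden value.
            u = rng.random()
        hap.append(1 if u < p else 0)
    return hap


if __name__ == "__main__":
    freqs = [0.3, 0.3, 0.6, 0.1]
    # All break probabilities at 1.0 gives the independent-loci case.
    print(sample_haplotype(freqs, [1.0, 1.0, 1.0]))
    # Smaller values make neighbouring loci correlated.
    print(sample_haplotype(freqs, [0.05, 0.05, 0.5]))
```

Setting every break probability to 1.0 recovers the independent sampler, so
the independent case needs no separate code path.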


> I forgot to say it in the first mail, but if you want to use more
> sophisticated scenarios - like populations that have suffered a
> bottleneck or have a particular history - there are already better
> tools available to do that; we should think about how to integrate this
> module with them.
> Maybe I should rename this module as 'SimpleHaplotypesSampler'.
>   
Perhaps IndependentLociSampler. :)
>   
>> You assume that the user knows exactly which locations and frequency to
>> change. Often you just want a random frequency and random location. In that
>> case you need to randomly select locations and frequencies based on some
>> function. But I do not find the mode=='random' of paramsGenerator sufficient
>> to address this. Further, you might want a random sequence of some length
>> but not want all locations to change.
>>     
>
> ok, but consider that these are haplotypes and not sequences, so you
> most likely need to have regions that are more conserved and others
> that change more.
> This is a good question, about which models to implement, but I would
> need to find a better way to represent frequencies first, and then
> think about which models to implement.
>   
Really, the implementation requires some representation of the genetic 
map. After all, if two positions are very close, recombination between 
the two loci should be rare. I do not know a nice way to represent this, 
even with genetic marker simulation (something I do know about). I have 
not used simcoal, as my work has moved away from genetic markers. Perhaps 
you need to see how simcoal and similar packages do it.
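
One common way to turn a genetic map into per-interval recombination
fractions is Haldane's map function, r = (1 - exp(-2d)) / 2 with d in
Morgans. The sketch below is only an illustration of that formula; the
function names are hypothetical, and it says nothing about how simcoal
itself represents the map.

```python
import math


def haldane_recomb_fraction(d_morgans):
    """Haldane's map function: map distance (Morgans) -> recombination fraction."""
    if d_morgans < 0:
        raise ValueError("map distance must be non-negative")
    return 0.5 * (1.0 - math.exp(-2.0 * d_morgans))


def adjacent_recomb_fractions(map_positions_cm):
    """Recombination fraction for each pair of adjacent loci.

    map_positions_cm -- locus positions in centiMorgans, in map order.
    Hypothetical helper for illustration only.
    """
    fractions = []
    for left, right in zip(map_positions_cm, map_positions_cm[1:]):
        if right < left:
            raise ValueError("positions must be in increasing order")
        # Convert centiMorgans to Morgans before applying the map function.
        fractions.append(haldane_recomb_fraction((right - left) / 100.0))
    return fractions


if __name__ == "__main__":
    # Two tightly linked loci followed by a distant one.
    print(adjacent_recomb_fractions([0.0, 1.0, 50.0]))
```

Closely spaced loci get a fraction near zero, and the value approaches 0.5
(independent loci) as the map distance grows.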

I do understand the usefulness of simulating independent loci, but I 
also find it a very simple special case of what should be done.  I think 
you need to develop some outline of what you want to achieve, one that 
can change as the work progresses. Also, not everything needs to get 
done at once; other people can contribute if they want to, but the 
general framework needs to be in place.

Bruce



