[Biopython-dev] plink phasing and others

Peter Cock p.j.a.cock at googlemail.com
Mon Apr 16 10:26:30 UTC 2012

2012/4/16 Tiago Antão <tiagoantao at gmail.com>:
> Hi,
> During the last few months I have been in a hell hole writing code
> like mad. Maybe some of this code is of interest to share.
> I currently have:
> 1. Code to parse plink output. Pretty trivial stuff, but I bet lots of
> people are doing this
> 2. Code to process admixture results. Admixture is far less used than STRUCTURE
> 3. Code to deal with phasing formats. Beagle, PHASE and shapeit
> 4. PCA
> 5. Some gene ontology stuff
> My GO stuff is pretty specific, so I guess it might not be of interest.
> All the other components deal with fairly widely used tools.
> Admixture and PCA are standard popgen analyses. The admixture code could
> probably be changed to also support STRUCTURE. I am not sure, but PCA
> might only work on Linux.
> Plink and phasing are of more general interest. These would live
> outside Bio.PopGen.
> None of this code has any unusual requirements, with one
> exception: admixture and PCA require matplotlib.
> So that people have an idea of the impact of these tools, here
> are the numbers of Scholar citations:
> plink - 3315
> smartpca - 1673
> admixture - 57
> structure - 7448
> beagle - >300
> fastphase - 1935
> Unfortunately there is little code to do automated analysis using these tools.
> I could start migrating some of this code to biopython (would have to
> write documentation, and comment the code better ;) )

Sounds good. The GO stuff would/should be more general than just
PopGen, and I know other people are looking at this on branches.

When you said PCA, that was principal component analysis, right?
What are you adding on top of NumPy/SciPy/matplotlib?
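
For comparison, a plain PCA needs very little beyond NumPy. Below is a
minimal sketch, assuming a samples-by-SNPs matrix of 0/1/2 allele counts.
The function name and the toy data are made up for illustration, and this
is ordinary mean-centred PCA via SVD, not smartpca's normalisation or
anything from Tiago's code:

```python
import numpy as np


def pca(genotypes, n_components=2):
    """Project samples onto the top principal components via SVD.

    `genotypes` is assumed to be a (samples x SNPs) array of allele
    counts; this layout is an assumption made for the example.
    """
    X = np.asarray(genotypes, dtype=float)
    X = X - X.mean(axis=0)  # centre each SNP column
    # Thin SVD: rows of Vt are the principal axes, S the singular values.
    U, S, Vt = np.linalg.svd(X, full_matrices=False)
    scores = U[:, :n_components] * S[:n_components]  # sample coordinates
    explained = S**2 / np.sum(S**2)  # fraction of variance per component
    return scores, explained[:n_components]


# Hypothetical toy data: 4 samples, 4 SNPs.
toy = np.array([[0, 1, 2, 0],
                [0, 1, 2, 1],
                [2, 1, 0, 2],
                [2, 0, 0, 2]])
scores, explained = pca(toy)
```

Plotting scores[:, 0] against scores[:, 1] with matplotlib would then give
the usual PCA scatter plot.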

