[Biopython-dev] plink phasing and others

Tiago Antão tiagoantao at gmail.com
Mon Apr 16 09:35:21 UTC 2012


During the last few months I have been in a hell hole writing code
like mad. Some of this code may be worth sharing.

I currently have:

1. Code to parse plink output. Pretty trivial stuff, but I bet lots of
people are doing this
2. Code to process admixture results. Admixture is far less used than STRUCTURE
3. Code to deal with phasing formats: Beagle, PHASE and shapeit
4. PCA
5. Some gene ontology stuff
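
To give an idea of what item 1 involves, here is a minimal sketch. It is
not the code I actually have. It assumes the plink report is a
whitespace-delimited table with a single header row (as .frq files are);
the function name parse_plink_table and the sample rows are made up for
illustration.

```python
import io


def parse_plink_table(handle):
    """Yield one dict per data row of a whitespace-delimited report.

    Assumption: the first non-empty line is the header and every
    following non-empty line has the same number of columns.
    """
    header = None
    for line in handle:
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        if header is None:
            header = fields  # first non-empty line names the columns
            continue
        yield dict(zip(header, fields))


# Made-up sample in the layout of a plink .frq report
example = io.StringIO(
    " CHR  SNP   A1   A2   MAF  NCHROBS\n"
    "   1  rs1    A    G  0.25       40\n"
    "   1  rs2    C    T  0.10       40\n"
)
rows = list(parse_plink_table(example))
```

Values are left as strings; converting columns such as MAF to float is
left to the caller, since different reports carry different columns.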

My GO stuff is pretty specific, so I guess it might not be of interest.
All the other components deal with fairly widely used tools.
Admixture and PCA are standard popgen analyses. The admixture code could
probably be changed to also support STRUCTURE. I am not sure, but the PCA
code might only work on Linux.
Plink and phasing are of more general interest. These would be out of

None of this code has any unusual requirements, with one
exception: the admixture and PCA code require matplotlib.

So that people have an idea of the impact of these tools, here are
their Google Scholar citation counts:
plink - 3315
smartpca - 1673
admixture - 57
structure - 7448
beagle - >300
fastphase - 1935

Unfortunately there is little code to do automated analysis using these tools.

I could start migrating some of this code to biopython (would have to
write documentation, and comment the code better ;) )

"Liberty for wolves is death to the lambs" - Isaiah Berlin
