[Biopython-dev] GSoC genomic variant proposal

Brad Chapman chapmanb at 50mail.com
Sun Apr 1 19:13:56 UTC 2012


Lenna;
Thanks for the introduction and glad to hear about your interest in the
variant project. I'm looking forward to seeing your proposal.

The workflow for the variant project involves a biologist querying a VCF
or GVF file with variants from an experiment. They should be able to
easily subset and filter by file components:

- Variant type: Homozygous/Heterozygous variants
- Metrics: depth, strand bias, allele frequency, etc.
- Variants annotated in coding regions causing amino acid changes

They should also be able to rapidly subset by chromosomal region.
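
A rough sketch of what that kind of filtering might look like in plain
Python is below. This is only an illustration under assumptions: the
function names (parse_record, is_heterozygous, min_depth, in_region) are
hypothetical and not part of Biopython, the example records are made up,
and only the first sample column is examined.

```python
# Hypothetical helpers for filtering VCF data lines; not a Biopython API.

def parse_record(line):
    """Split one tab-delimited VCF data line into the fields used below."""
    cols = line.rstrip("\n").split("\t")
    info = {}
    for item in cols[7].split(";"):
        key, _, value = item.partition("=")
        info[key] = value if value else True  # flag entries carry no value
    # Pair the FORMAT keys with the first sample's values, if present.
    sample = {}
    if len(cols) > 9:
        sample = dict(zip(cols[8].split(":"), cols[9].split(":")))
    return {"chrom": cols[0], "pos": int(cols[1]), "ref": cols[3],
            "alt": cols[4], "info": info, "sample": sample}

def is_heterozygous(record):
    """True when the first sample's GT names two different called alleles."""
    alleles = record["sample"].get("GT", "").replace("|", "/").split("/")
    return len(alleles) == 2 and "." not in alleles and alleles[0] != alleles[1]

def min_depth(record, threshold):
    """True when the INFO DP value is at least the threshold."""
    return int(record["info"].get("DP", 0)) >= threshold

def in_region(record, chrom, start, end):
    """True when the record lies on chrom within [start, end], 1-based."""
    return record["chrom"] == chrom and start <= record["pos"] <= end

# Made-up example records: CHROM POS ID REF ALT QUAL FILTER INFO FORMAT SAMPLE
VCF_LINES = [
    "chr1\t100\t.\tA\tG\t50\tPASS\tDP=30;AF=0.5\tGT\t0/1",
    "chr1\t200\t.\tC\tT\t50\tPASS\tDP=5;AF=1.0\tGT\t1/1",
    "chr2\t300\t.\tG\tA\t50\tPASS\tDP=40;AF=0.5\tGT\t0|1",
]

records = [parse_record(line) for line in VCF_LINES]
hets = [r for r in records
        if is_heterozygous(r)
        and min_depth(r, 10)
        and in_region(r, "chr1", 1, 1000)]
```

Keeping each criterion as a small predicate lets a user combine them
freely, which is one possible shape for the filtering API.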

My suggestion would be to leverage external tools as much as possible to
do file manipulation and focus on an API that lets users filter and
extract information already contained in the INFO field.
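
As one illustration of delegating file manipulation to an external tool,
the sketch below shells out to tabix for region subsetting. It assumes
tabix is installed and that the VCF file has been bgzip-compressed and
indexed; the function names and the file name in the usage comment are
hypothetical.

```python
import subprocess

def tabix_command(vcf_gz, chrom, start, end):
    """Build the tabix argument list for a chrom:start-end region query."""
    return ["tabix", vcf_gz, "{0}:{1}-{2}".format(chrom, start, end)]

def fetch_region(vcf_gz, chrom, start, end):
    """Run tabix and return the matching VCF data lines as a list.

    Assumes tabix is on the PATH and an index exists next to vcf_gz.
    """
    result = subprocess.run(tabix_command(vcf_gz, chrom, start, end),
                            capture_output=True, text=True, check=True)
    return result.stdout.splitlines()

# Usage (requires tabix and an indexed file, so it is not run here):
# lines = fetch_region("variants.vcf.gz", "chr1", 1000, 2000)
```

The library code then only needs to parse and filter the lines the
external tool returns, rather than reimplementing indexed access.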

Hope this is helpful as a place to get started. We can provide
additional feedback once you have your proposal ready. Thanks again,
Brad

> Hi all,
> 
> I realize time is short, but I am still in the planning phase of my
> GSoC proposal! I wanted to take a moment to formally introduce myself
> to the dev list.
> 
> I am affiliated with Purdue University, located in Indiana, USA and
> best known for engineering (Neil Armstrong is a famous graduate). I
> hold a bachelor of arts in biology from Mount Holyoke College in
> Massachusetts. I have extensive wet lab experience with genetics; I'm
> currently working in a lab genotyping mice (the research is intestinal
> lipid metabolism). In August, I begin a PhD in interdisciplinary life
> science at Purdue, and I anticipate that my research will fall
> somewhere in the field of bioinformatics/computational biology. I hope
> to use biopython extensively!
> 
> In my spare time, other than programming, I enjoy ballroom dance,
> science fiction novels, board games, and sailing.
> 
> I've been programming for about 6 years and using python for 4; other
> languages with which I'm familiar include Perl/CGI, HTML/CSS, PHP, SQL
> (primarily MySQL and SQLite), and C++/C. I place a high value on
> object oriented design and execution.
> 
> I understand the basics of formal grammar and have some experience
> with lex/flex as well as PLY (python lex/yacc). My work so far with
> biopython has been on the CIF parsing module. One of my primary goals
> for the genomic variants project would be to implement as much
> polymorphism and abstraction as possible, for the benefit of both
> users and future developers.
> 
> I'm working on a proposal for the genomic variants project, and while
> I understand the basics of molecular biology and genetics, I lack
> firsthand experience with the type of workflow that would occur in the
> context of genomic variants. If anyone can supply a few examples, it
> would be greatly appreciated.
> 
> I hope to have a proposal draft ready for feedback by Monday.
> 
> Regards,
> 
> Lenna Peterson
> github.com/lennax
> _______________________________________________
> Biopython-dev mailing list
> Biopython-dev at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/biopython-dev