[Biopython-dev] GSoC genomic variant proposal

Brad Chapman chapmanb at 50mail.com
Tue Apr 3 14:53:33 UTC 2012


Lenna;
Thanks for getting this together, that's a great start. I left some
specific comments, but my general suggestion is to get more detailed
about the code specifics. During the summer, you use the weekly timeline
as a todo list, so having lots of detail makes the process much
easier. Instead of a general item like "Implement X", you want
"Implement X by extending the API from last week to support get_Y using
a sqlite3 index table. Test cases A, B, C and D to avoid...".
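
To make that level of detail concrete, here is a minimal sketch of what a
"sqlite3 index table" todo item might boil down to. The table name
variant_index, its columns, and the helpers build_index and get_offsets
are all hypothetical, chosen only for illustration; nothing here is an
existing Biopython API.

```python
import sqlite3


def build_index(conn, records):
    """Store (chrom, pos, file_offset) rows and index them by position.

    Hypothetical schema: file_offset would be the byte offset of the
    variant line in the original file.
    """
    conn.execute(
        "CREATE TABLE IF NOT EXISTS variant_index "
        "(chrom TEXT, pos INTEGER, file_offset INTEGER)"
    )
    conn.executemany("INSERT INTO variant_index VALUES (?, ?, ?)", records)
    conn.execute(
        "CREATE INDEX IF NOT EXISTS idx_chrom_pos "
        "ON variant_index (chrom, pos)"
    )
    conn.commit()


def get_offsets(conn, chrom, start, end):
    """Return file offsets of variants with start <= pos <= end on chrom."""
    rows = conn.execute(
        "SELECT file_offset FROM variant_index "
        "WHERE chrom = ? AND pos BETWEEN ? AND ? ORDER BY pos",
        (chrom, start, end),
    )
    return [row[0] for row in rows]


# Toy data: three variants with made-up byte offsets.
conn = sqlite3.connect(":memory:")
build_index(conn, [("chr1", 100, 0), ("chr1", 250, 57), ("chr2", 100, 113)])
print(get_offsets(conn, "chr1", 50, 200))
```

A timeline entry written against a sketch like this can name the exact
query, the index columns and the edge cases to test (empty region,
unknown chromosome, inclusive bounds).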

Having these kinds of checklist todos makes it easy to get started
each week and ensures everything is on track. The additional benefit for
selection is that it helps convince reviewers you've thought about the
technical details and foreseen any potential problems.

Hope this helps,
Brad

> Hi Brad,
> 
> Thank you so much for your suggestions. My initial evaluation of the
> strengths of existing software has led me to strongly agree with your
> recommendation to focus on the usability of the API.
> 
> I submit this draft of my proposal to the dev list for feedback:
> 
> https://docs.google.com/document/d/116FDQLtNnYWnm0kojad4YmQrM3cjOO8D2Vr82aW6xyA/edit
> 
> 
> Lenna
> 
> 
> On Sun, Apr 1, 2012 at 3:13 PM, Brad Chapman <chapmanb at 50mail.com> wrote:
> >
> > Lenna;
> > Thanks for the introduction and glad to hear about your interest in the
> > variant project. I'm looking forward to seeing your proposal.
> >
> > The workflow for the variant project involves a biologist querying a VCF
> > or GVF file with variants from an experiment. They should be able to
> > easily subset and filter by file components:
> >
> > - Variant type: Homozygous/Heterozygous variants
> > - Metrics: depth, strand bias, allele frequency...
> > - Variants annotated in coding regions causing amino acid changes
> >
> > As well as rapid subsetting by chromosomal region.
> >
> > My suggestion would be to leverage external tools as much as possible to
> > do file manipulation and focus on an API that lets users filter and
> > extract information already contained in the INFO field.
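
The filtering workflow described above (genotype, depth, chromosomal
region) can be sketched in plain Python. This is only an illustration
under stated assumptions: the functions parse_info, parse_record,
is_heterozygous and filter_variants are hypothetical names, the parser
is hand-rolled and handles only a single-sample, tab-separated VCF, and
the example records are made up.

```python
def parse_info(info):
    """Turn 'DP=14;AF=0.5;DB' into {'DP': '14', 'AF': '0.5', 'DB': True}."""
    fields = {}
    for item in info.split(";"):
        if not item or item == ".":
            continue
        key, sep, value = item.partition("=")
        fields[key] = value if sep else True  # flags carry no value
    return fields


def parse_record(line):
    """Parse one VCF data line into a dict (first sample only)."""
    cols = line.rstrip("\n").split("\t")
    record = {
        "chrom": cols[0],
        "pos": int(cols[1]),
        "ref": cols[3],
        "alt": cols[4].split(","),
        "info": parse_info(cols[7]),
        "genotype": None,
    }
    if len(cols) > 9:
        # FORMAT keys (column 9) describe the sample values (column 10).
        sample = dict(zip(cols[8].split(":"), cols[9].split(":")))
        record["genotype"] = sample.get("GT")
    return record


def is_heterozygous(gt):
    """True for diploid calls with two different alleles, e.g. 0/1."""
    if gt is None:
        return False
    alleles = gt.replace("|", "/").split("/")
    return len(alleles) == 2 and "." not in alleles and alleles[0] != alleles[1]


def filter_variants(lines, min_depth=0, heterozygous=None, region=None):
    """Yield records passing the depth, genotype and region filters.

    region is an optional (chrom, start, end) tuple with inclusive bounds.
    With heterozygous=False, missing genotypes also pass.
    """
    for line in lines:
        if line.startswith("#") or not line.strip():
            continue  # skip header and blank lines
        rec = parse_record(line)
        if int(rec["info"].get("DP", 0)) < min_depth:
            continue
        if heterozygous is not None:
            if is_heterozygous(rec["genotype"]) != heterozygous:
                continue
        if region is not None:
            chrom, start, end = region
            if rec["chrom"] != chrom or not start <= rec["pos"] <= end:
                continue
        yield rec


# Made-up example records, not real data.
VCF_LINES = [
    "##fileformat=VCFv4.1",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tsample1",
    "chr1\t100\t.\tA\tG\t50\tPASS\tDP=14;AF=0.5\tGT\t0/1",
    "chr1\t200\t.\tC\tT\t50\tPASS\tDP=3;AF=1.0\tGT\t1/1",
    "chr2\t300\t.\tG\tA\t50\tPASS\tDP=20;AF=0.5\tGT\t0/1",
]
hits = list(
    filter_variants(
        VCF_LINES, min_depth=10, heterozygous=True, region=("chr1", 1, 1000)
    )
)
print([(rec["chrom"], rec["pos"]) for rec in hits])
```

A real implementation would likely delegate parsing and region lookup
to external tools, as suggested above, and keep only the filtering
layer.
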
> >
> > Hope this is helpful as a place to get started. We can provide
> > additional feedback once you have your proposal ready. Thanks again,
> > Brad
> >
> >> Hi all,
> >>
> >> I realize time is short, but I am still in the planning phase of my
> >> GSoC proposal! I wanted to take a moment to formally introduce myself
> >> to the dev list.
> >>
> >> I am affiliated with Purdue University, located in Indiana, USA and
> >> best known for engineering (Neil Armstrong is a famous graduate). I
> >> hold a bachelor of arts in biology from Mount Holyoke College in
> >> Massachusetts. I have extensive wet lab experience with genetics; I'm
> >> currently working in a lab genotyping mice (the research is intestinal
> >> lipid metabolism). In August, I begin a PhD in interdisciplinary life
> >> science at Purdue, and I anticipate that my research will fall
> >> somewhere in the field of bioinformatics/computational biology. I hope
> >> to use biopython extensively!
> >>
> >> In my spare time, other than programming, I enjoy ballroom dance,
> >> science fiction novels, board games, and sailing.
> >>
> >> I've been programming for about 6 years and using python for 4; other
> >> languages with which I'm familiar include Perl/CGI, HTML/CSS, PHP, SQL
> >> (primarily MySQL and SQLite), and C++/C. I place a high value on
> >> object oriented design and execution.
> >>
> >> I understand the basics of formal grammar and have some experience
> >> with lex/flex as well as PLY (python lex/yacc). My work so far with
> >> biopython has been on the CIF parsing module. One of my primary goals
> >> for the genomic variants project would be to implement as much
> >> polymorphism and abstraction as possible, for the benefit of both
> >> users and future developers.
> >>
> >> I'm working on a proposal for the genomic variants project, and while
> >> I understand the basics of molecular biology and genetics, I lack
> >> firsthand experience with the type of workflow that would occur in the
> >> context of genomic variants. If anyone can supply a few examples, it
> >> would be greatly appreciated.
> >>
> >> I hope to have a proposal draft ready for feedback by Monday.
> >>
> >> Regards,
> >>
> >> Lenna Peterson
> >> github.com/lennax
> >> _______________________________________________
> >> Biopython-dev mailing list
> >> Biopython-dev at lists.open-bio.org
> >> http://lists.open-bio.org/mailman/listinfo/biopython-dev


