[Biopython-dev] GSoC genomic variant proposal

Lenna Peterson arklenna at gmail.com
Thu Apr 5 00:04:30 UTC 2012

On Tue, Apr 3, 2012 at 10:53 AM, Brad Chapman <chapmanb at 50mail.com> wrote:
> Lenna;
> Thanks for getting this together, that's a great start. I left some
> specific comments but my general suggestion is to get more detailed
> about the code specifics. During the summer, you use the weekly timeline
> as a todo list so having lots of details makes the process so much
> easier. Instead of seeing a general item like: "Implement X" you want
> "Implement X by extending API from last week to support get_Y using
> sqlite3 index table. Test cases A, B, C and D to avoid...".
> Having these kinds of checklist todos helps make it easy to get started
> each week and ensure everything is on track. The additional benefit for
> selection is that it helps convince reviewers you've thought about the
> technical details and foreseen any potential problems.
> Hope this helps,
> Brad

Hi all,

I'm linking to a revision of my GSoC proposal:


Thank you to everyone for your feedback.


I didn't realize Biopython has never been tested on IronPython. As I
have no familiarity with .NET or Windows, I'll have to rescind my
offer to test it. Sorry to get your hopes up!


I've revised the prose sections and almost completely rewritten the
timeline. This version provides more information about my background,
a more detailed description of the overall project, and more specific


I've tried to go into as much detail as my knowledge of VCF and GVF
structure allows. I laid out more specific backend and frontend
structures for the data. I've revised the unit
tests to be more specific and less dependent on interaction with other
modules, and I've tried to anticipate some cases that may produce
unexpected behavior. I also highlighted specific places where the
design should be generalizable.


I hope my revised project description is more focused. Regarding CNV
etc., I did not mean to specifically exclude them by mentioning SNPs,
and I've reworded that paragraph to be more general. I get the
impression that CNV and other structural variants are considerably
more complex to represent and manipulate. I'd be more than happy to
read more about breakpoint theory etc. and to prototype any specific
workflows you might suggest.

