[Biopython-dev] Merging the GFF3 and VCF branches

Eric Talevich eric.talevich at gmail.com
Wed Jun 3 18:08:34 UTC 2015


Biopythoneers,

I am interested in improving Biopython's support for genomic data,
specifically by merging the existing GFF3 and VCF branches.

Where we last left off, Brad's GFF branch was available on a fork:
http://biopython.org/wiki/GFF_Parsing
https://github.com/chapmanb/bcbb/tree/master/gff

When this branch was submitted to Biopython, in 2009 or so, there was a
subtle conflict with the way nested annotations were represented as
SeqFeatures in Biopython. Peter tested several possible resolutions to this
issue on branches, the last of which appears to be f_loc5:
https://github.com/peterjc/biopython/tree/f_loc5

For GSoC 2012, Lenna developed a VCF parser and genomic coordinate mapper
compatible with Peter's SeqFeature updates (actually the f_loc4 branch, I
guess?) and Brad's GFF parser:
http://biopython.org/wiki/Google_Summer_of_Code#Representation_and_manipulation_of_genomic_variants
http://arklenna.tumblr.com/post/29808300789/and-the-summer-ends
https://github.com/lennax/biopython/

What would it take to merge all of this once-recent work into Biopython?
Are the SeqFeature CompoundLocation changes satisfactory and ready to merge
into the mainline? Are we willing to make this compatibility break? If not,
should we instead add another class/module to support the new behavior
(BetterSeqFeature)?

Happy to help,
Eric