<div dir="ltr"><div><div><div><div><div><div>Biopythoneers,<br><br></div>I am interested in improving Biopython&#39;s support for genomic data, namely through merging the existing GFF3 and VCF branches.<br><br></div>Where we last left off, Brad&#39;s GFF branch was available on a fork:<br><a href="http://biopython.org/wiki/GFF_Parsing">http://biopython.org/wiki/GFF_Parsing</a><br><a href="https://github.com/chapmanb/bcbb/tree/master/gff">https://github.com/chapmanb/bcbb/tree/master/gff</a><br><br>When this branch was submitted to Biopython, in 2009 or so, there was a subtle conflict with the way nested annotations were represented as SeqFeatures in Biopython. Peter tested several possible resolutions to this issue on branches, the last of which appears to be f_loc5:<br><a href="https://github.com/peterjc/biopython/tree/f_loc5">https://github.com/peterjc/biopython/tree/f_loc5</a><br><br></div>For GSoC 2012, Lenna developed a VCF parser and genomic coordinate mapper compatible with Peter&#39;s SeqFeature updates (actually the f_loc4 branch, I guess?) and Brad&#39;s GFF parser:<br><a href="http://biopython.org/wiki/Google_Summer_of_Code#Representation_and_manipulation_of_genomic_variants">http://biopython.org/wiki/Google_Summer_of_Code#Representation_and_manipulation_of_genomic_variants</a><br><a href="http://arklenna.tumblr.com/post/29808300789/and-the-summer-ends">http://arklenna.tumblr.com/post/29808300789/and-the-summer-ends</a><br><a href="https://github.com/lennax/biopython/">https://github.com/lennax/biopython/</a><br><br></div>What would it take to merge all of this once-recent work into Biopython? Are the SeqFeature CompoundLocation changes satisfactory and ready to merge into the mainline? Are we willing to make this compatibility break? If not, should we instead add another class/module to support the new behavior (BetterSeqFeature)?<br><br></div>Happy to help,<br></div>Eric<br></div>