[Biopython-dev] Merging the GFF3 and VCF branches

Brad Chapman chapmanb at 50mail.com
Thu Jun 4 09:54:12 UTC 2015


Eric;
Thanks for looking at this. +1 on getting Lenna's work in; I'll let
her comment on how it compares to the current state of VCF support in
pysam and PyVCF. For GFF, I'd actually rather see
integration/collaboration with Ryan's gffutils:

https://github.com/daler/gffutils

It uses SQLite to organize the data and is much better engineered than
my GFF work. He took all my pathological test cases and made them work,
and it also has initial Biopython integration:

https://github.com/daler/gffutils/blob/master/gffutils/biopython_integration.py
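
To give a rough idea of what organizing GFF data in SQLite buys you, here
is a minimal sketch using only the Python standard library. It is not
gffutils' actual schema or API: the table layout, the column names and the
helpers load_gff() and children() are made up for illustration, and it
assumes every feature line carries an ID attribute.

```python
import sqlite3

# Four made-up GFF3 lines (tab-separated, nine columns each).
GFF_TEXT = (
    "chr1\texample\tgene\t100\t900\t.\t+\t.\tID=gene1\n"
    "chr1\texample\tmRNA\t100\t900\t.\t+\t.\tID=mrna1;Parent=gene1\n"
    "chr1\texample\texon\t100\t300\t.\t+\t.\tID=exon1;Parent=mrna1\n"
    "chr1\texample\texon\t600\t900\t.\t+\t.\tID=exon2;Parent=mrna1\n"
)


def parse_attributes(column):
    """Split a GFF3 attribute column into a dict of key -> value."""
    items = [item for item in column.strip().split(";") if item]
    return dict(item.split("=", 1) for item in items)


def load_gff(text):
    """Load GFF3 text into an in-memory SQLite database (hypothetical schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE features (id TEXT PRIMARY KEY, seqid TEXT, "
        "featuretype TEXT, start_pos INTEGER, end_pos INTEGER, strand TEXT)"
    )
    conn.execute("CREATE TABLE relations (parent TEXT, child TEXT)")
    for line in text.splitlines():
        if not line or line.startswith("#"):
            continue  # skip blank lines, comments and directives
        seqid, _source, ftype, start, end, _score, strand, _phase, attrs = (
            line.split("\t")
        )
        attributes = parse_attributes(attrs)
        feature_id = attributes["ID"]  # assumption: every feature has an ID
        conn.execute(
            "INSERT INTO features VALUES (?, ?, ?, ?, ?, ?)",
            (feature_id, seqid, ftype, int(start), int(end), strand),
        )
        # GFF3 allows several comma-separated parents per feature.
        for parent in attributes.get("Parent", "").split(","):
            if parent:
                conn.execute(
                    "INSERT INTO relations VALUES (?, ?)", (parent, feature_id)
                )
    return conn


def children(conn, parent_id, featuretype):
    """Return (id, start, end) of direct children of a given feature type."""
    rows = conn.execute(
        "SELECT f.id, f.start_pos, f.end_pos FROM features f "
        "JOIN relations r ON r.child = f.id "
        "WHERE r.parent = ? AND f.featuretype = ? ORDER BY f.start_pos",
        (parent_id, featuretype),
    )
    return rows.fetchall()


conn = load_gff(GFF_TEXT)
exons = children(conn, "mrna1", "exon")
```

With the parent/child links in their own table, nested annotations become
a join rather than a tree you have to walk in memory.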

The main work would be to take the scripts in bcbio-gff that folks find
useful, like the GFF/GenBank conversion through SeqIO, and port them
over. This is something I have wanted to do for a while but never got
around to. What does everyone think?
Brad

> Biopythoneers,
>
> I am interested in improving Biopython's support for genomic data, namely
> through merging the existing GFF3 and VCF branches.
>
> Where we last left off, Brad's GFF branch was available on a fork:
> http://biopython.org/wiki/GFF_Parsing
> https://github.com/chapmanb/bcbb/tree/master/gff
>
> When this branch was submitted to Biopython, in 2009 or so, there was a
> subtle conflict with the way nested annotations were represented as
> SeqFeatures in Biopython. Peter tested several possible resolutions to this
> issue on branches, the last of which appears to be f_loc5:
> https://github.com/peterjc/biopython/tree/f_loc5
>
> For GSoC 2012, Lenna developed a VCF parser and genomic coordinate mapper
> compatible with Peter's SeqFeature updates (actually the f_loc4 branch, I
> guess?) and Brad's GFF parser:
> http://biopython.org/wiki/Google_Summer_of_Code#Representation_and_manipulation_of_genomic_variants
> http://arklenna.tumblr.com/post/29808300789/and-the-summer-ends
> https://github.com/lennax/biopython/
>
> What would it take to merge all of this once-recent work into Biopython?
> Are the SeqFeature CompoundLocation changes satisfactory and ready to merge
> into the mainline? Are we willing to make this compatibility break? If not,
> should we instead add another class/module to support the new behavior
> (BetterSeqFeature)?
>
> Happy to help,
> Eric

