<div dir="ltr"><div class="gmail_extra"><div class="gmail_quote">On Thu, Jun 4, 2015 at 3:44 AM, Peter Cock <span dir="ltr"><<a href="mailto:p.j.a.cock@googlemail.com" target="_blank">p.j.a.cock@googlemail.com</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">This would be great to have merged - pathological test cases<br>
and interconversion too :)<br>
<br>
Did we settle on a plan for parent/child relationships in<br>
SeqFeature objects (beyond deprecating sub_features<br>
which has been replaced with CompoundLocations)?<br>
<span><font color="#888888"><br>
Peter<br></font></span></blockquote><div><br></div><div>The last thread I see on this topic is from the end of summer 2012:<br><a href="http://mailman.open-bio.org/pipermail/biopython-dev/2012-July/018979.html" target="_blank">http://mailman.open-bio.org/pipermail/biopython-dev/2012-July/018979.html</a> (start of thread)<br><a href="http://mailman.open-bio.org/pipermail/biopython-dev/2012-September/019101.html" target="_blank">http://mailman.open-bio.org/pipermail/biopython-dev/2012-September/019101.html</a> (last message)<br><br></div><div>I'm a bit confused, because a CompoundLocation class already exists in Bio/SeqFeature.py, and git blame says it was written in late 2011 -- Peter's Time Machine in action? Does the f_loc5 branch modify that existing class, then?<br></div><div><br></div><div>The threads above also mention a deprecation process. To begin that process, I suppose we first need to decide what sub_features is being deprecated in favor of. Then we can add the new functionality, trigger a DeprecationWarning from the old-and-tired sub_features attribute, and include some shim to keep things working approximately the way they used to?<br><br>Even if a perfectly smooth transition isn't possible, I think it's worthwhile to make a gentle break so that Biopython can correctly handle modern file formats for genomic features/annotations.<br></div><div><br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><span><font color="#888888">
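</font></span></blockquote><div>To make the shim idea concrete, here is a rough sketch in plain Python of a sub_features attribute that warns but keeps working. Everything named here (Feature, kind, parts) is made up for illustration and is not Biopython's actual SeqFeature API; the list of (start, end) tuples only stands in for whatever replacement we settle on, presumably the location.<br></div>
<pre>
```python
import warnings


class Feature:
    """Hypothetical stand-in for a SeqFeature-like class (not Biopython's)."""

    def __init__(self, kind, parts):
        self.kind = kind
        # Assumed new-style storage: a flat list of (start, end) tuples,
        # standing in for a compound location on the real class.
        self.parts = list(parts)

    @property
    def sub_features(self):
        """Deprecated read-only view: one child Feature per part."""
        warnings.warn(
            "sub_features is deprecated; use parts instead",
            DeprecationWarning,
            stacklevel=2,  # attribute the warning to the caller's line
        )
        return [Feature(self.kind, [part]) for part in self.parts]


# Old-style code keeps running, but now triggers a DeprecationWarning.
feature = Feature("CDS", [(0, 10), (20, 30)])
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    children = feature.sub_features
```
</pre>
<div>A read-only property is the simplest version of the shim. Code that assigns to sub_features would also need a setter, which is where the "approximately" comes in.<br></div><div><br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><span><font color="#888888">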
</font></span><div><div><br>
On Thu, Jun 4, 2015 at 10:54 AM, Brad Chapman <<a href="mailto:chapmanb@50mail.com" target="_blank">chapmanb@50mail.com</a>> wrote:<br>
><br>
> Eric;<br>
> Thanks for looking at this. +1 on getting Lenna's work in and I'll let<br>
> her comment on that compared to the current state of VCF support in<br>
> pysam and PyVCF. For GFF, I'd actually rather see<br>
> integration/collaboration with Ryan's gffutils:<br>
><br>
> <a href="https://github.com/daler/gffutils" target="_blank">https://github.com/daler/gffutils</a><br>
><br>
> It uses sqlite to organize the data and is much better engineered than<br>
> my GFF work. He took all my pathological test cases and made them work,<br>
> and it also has initial biopython integration:<br>
><br>
> <a href="https://github.com/daler/gffutils/blob/master/gffutils/biopython_integration.py" target="_blank">https://github.com/daler/gffutils/blob/master/gffutils/biopython_integration.py</a><br>
><br></div></div></blockquote><div><br></div><div>Ryan is a superstar. I see gffutils is MIT-licensed, too, so maybe we can just copy a relevant chunk of the code?<br><br></div><div> </div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div><div>
> The main work would be to take some of the scripts in bcbio-gff that<br>
> folks find useful, like the GFF/GenBank conversion through SeqIO, and<br>
> port these over. This has been something I wanted to do for a while but<br>
> never got done. What does everyone think?<br>
> Brad<br></div></div></blockquote><div><br></div><div>These?:<br><a href="https://github.com/chapmanb/bcbb/tree/master/gff/Scripts/gff">https://github.com/chapmanb/bcbb/tree/master/gff/Scripts/gff</a><br><br></div><div>I like that plan. The main goal in my mind is to provide a sensible substrate in Biopython for integrating the "tabix" family of formats, using SeqFeature as a core object and making it a little more useful, rather than trying to provide a full-featured environment or high-performance I/O. I think Lenna's work was headed in this direction, so I'd also like to focus on merging that functionality and seeing what else falls out of it.<br><br></div><div>-Eric<br><br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div><div>
><br>
>> Biopythoneers,<br>
>><br>
>> I am interested in improving Biopython's support for genomic data, namely<br>
>> through merging the existing GFF3 and VCF branches.<br>
>><br>
>> Where we last left off, Brad's GFF branch was available on a fork:<br>
>> <a href="http://biopython.org/wiki/GFF_Parsing" target="_blank">http://biopython.org/wiki/GFF_Parsing</a><br>
>> <a href="https://github.com/chapmanb/bcbb/tree/master/gff" target="_blank">https://github.com/chapmanb/bcbb/tree/master/gff</a><br>
>><br>
>> When this branch was submitted to Biopython, in 2009 or so, there was a<br>
>> subtle conflict with the way nested annotations were represented as<br>
>> SeqFeatures in Biopython. Peter tested several possible resolutions to this<br>
>> issue on branches, the last of which appears to be f_loc5:<br>
>> <a href="https://github.com/peterjc/biopython/tree/f_loc5" target="_blank">https://github.com/peterjc/biopython/tree/f_loc5</a><br>
>><br>
>> For GSoC 2012, Lenna developed a VCF parser and genomic coordinate mapper<br>
>> compatible with Peter's SeqFeature updates (actually the f_loc4 branch, I<br>
>> guess?) and Brad's GFF parser:<br>
>> <a href="http://biopython.org/wiki/Google_Summer_of_Code#Representation_and_manipulation_of_genomic_variants" target="_blank">http://biopython.org/wiki/Google_Summer_of_Code#Representation_and_manipulation_of_genomic_variants</a><br>
>> <a href="http://arklenna.tumblr.com/post/29808300789/and-the-summer-ends" target="_blank">http://arklenna.tumblr.com/post/29808300789/and-the-summer-ends</a><br>
>> <a href="https://github.com/lennax/biopython/" target="_blank">https://github.com/lennax/biopython/</a><br>
>><br>
>> What would it take to merge all of this once-recent work into Biopython?<br>
>> Are the SeqFeature CompoundLocation changes satisfactory and ready to merge<br>
>> into the mainline? Are we willing to make this compatibility break? If not,<br>
>> should we instead add another class/module to support the new behavior<br>
>> (BetterSeqFeature)?<br>
>><br>
>> Happy to help,<br>
>> Eric<br>
>> _______________________________________________<br>
>> Biopython-dev mailing list<br>
>> <a href="mailto:Biopython-dev@mailman.open-bio.org" target="_blank">Biopython-dev@mailman.open-bio.org</a><br>
>> <a href="http://mailman.open-bio.org/mailman/listinfo/biopython-dev" target="_blank">http://mailman.open-bio.org/mailman/listinfo/biopython-dev</a><br>
</div></div></blockquote></div><br></div></div>