<div dir="ltr"><div class="gmail_extra"><div class="gmail_quote">On Thu, Jun 4, 2015 at 3:44 AM, Peter Cock <span dir="ltr">&lt;<a href="mailto:p.j.a.cock@googlemail.com" target="_blank">p.j.a.cock@googlemail.com</a>&gt;</span> wrote:<br><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">This would be great to have merged - pathological test cases<br>
and interconversion too :)<br>
<br>
Did we settle on a plan for parent/child relationships in<br>
SeqFeature objects (beyond deprecating sub_features<br>
which has been replaced with CompoundLocations)?<br>
<span><font color="#888888"><br>
Peter<br></font></span></blockquote><div><br></div><div>The last thread I see on this topic is from the end of summer 2012:<br><a href="http://mailman.open-bio.org/pipermail/biopython-dev/2012-July/018979.html" target="_blank">http://mailman.open-bio.org/pipermail/biopython-dev/2012-July/018979.html</a> (thread)<br><a href="http://mailman.open-bio.org/pipermail/biopython-dev/2012-September/019101.html" target="_blank">http://mailman.open-bio.org/pipermail/biopython-dev/2012-September/019101.html</a> (final message)<br><br></div><div>I&#39;m a bit confused because the CompoundLocation class already exists in Bio/SeqFeature.py, and git blame says it was written in late 2011 -- Peter&#39;s Time Machine in action? Does the f_loc5 branch modify the existing CompoundLocation class, then?<br></div><div><br></div><div>The threads above also mention a deprecation process. I suppose that to begin that process we need to decide what sub_features is being deprecated in favor of, implement the replacement, and then have the old-and-tired sub_features attribute trigger a DeprecationWarning, along with some shim to keep things working approximately the way they used to?<br><br>Even if a perfectly smooth transition isn&#39;t possible, I think it&#39;s worthwhile to make a gentle break so that Biopython can correctly handle modern file formats for genomic features/annotations.<br></div><div><br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><span><font color="#888888">
</font></span><div><div><br>
On Thu, Jun 4, 2015 at 10:54 AM, Brad Chapman &lt;<a href="mailto:chapmanb@50mail.com" target="_blank">chapmanb@50mail.com</a>&gt; wrote:<br>
&gt;<br>
&gt; Eric;<br>
&gt; Thanks for looking at this. +1 on getting Lenna&#39;s work in and I&#39;ll let<br>
&gt; her comment on that compared to the current state of VCF support in<br>
&gt; pysam and PyVCF. For GFF, I&#39;d actually rather see<br>
&gt; integration/collaboration with Ryan&#39;s gffutils:<br>
&gt;<br>
&gt; <a href="https://github.com/daler/gffutils" target="_blank">https://github.com/daler/gffutils</a><br>
&gt;<br>
&gt; It uses sqlite to organize the data and is much better engineered than<br>
&gt; my GFF work. He took all my pathological test cases and made them work,<br>
&gt; and it also has initial biopython integration:<br>
&gt;<br>
&gt; <a href="https://github.com/daler/gffutils/blob/master/gffutils/biopython_integration.py" target="_blank">https://github.com/daler/gffutils/blob/master/gffutils/biopython_integration.py</a><br>
&gt;<br></div></div></blockquote><div><br></div><div>Ryan is a superstar. I see gffutils is MIT-licensed, too, so maybe we can just copy a relevant chunk of the code?<br><br></div><div> </div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div><div>
&gt; The main work would be to take some of the scripts in bcbio-gff that<br>
&gt; folks find useful, like the GFF/GenBank conversion through SeqIO, and<br>
&gt; port these over. This has been something I wanted to do for a while but<br>
&gt; never got done. What does everyone think?<br>
&gt; Brad<br></div></div></blockquote><div><br></div><div>These?<br><a href="https://github.com/chapmanb/bcbb/tree/master/gff/Scripts/gff">https://github.com/chapmanb/bcbb/tree/master/gff/Scripts/gff</a><br><br></div><div>I like that plan. The main goal, in my mind, is to provide a sensible substrate in Biopython for integrating the &quot;tabix&quot; family of formats, using SeqFeature as a core object and making it a little more useful, rather than trying to provide a full-featured environment or high-performance I/O. I think Lenna&#39;s work was headed in this direction, so I&#39;d also like to focus on merging that functionality and seeing what else falls out of it.<br><br></div><div>-Eric<br><br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div><div>
&gt;<br>
&gt;&gt; Biopythoneers,<br>
&gt;&gt;<br>
&gt;&gt; I am interested in improving Biopython&#39;s support for genomic data, namely<br>
&gt;&gt; through merging the existing GFF3 and VCF branches.<br>
&gt;&gt;<br>
&gt;&gt; Where we last left off, Brad&#39;s GFF branch was available on a fork:<br>
&gt;&gt; <a href="http://biopython.org/wiki/GFF_Parsing" target="_blank">http://biopython.org/wiki/GFF_Parsing</a><br>
&gt;&gt; <a href="https://github.com/chapmanb/bcbb/tree/master/gff" target="_blank">https://github.com/chapmanb/bcbb/tree/master/gff</a><br>
&gt;&gt;<br>
&gt;&gt; When this branch was submitted to Biopython, in 2009 or so, there was a<br>
&gt;&gt; subtle conflict with the way nested annotations were represented as<br>
&gt;&gt; SeqFeatures in Biopython. Peter tested several possible resolutions to this<br>
&gt;&gt; issue on branches, the last of which appears to be f_loc5:<br>
&gt;&gt; <a href="https://github.com/peterjc/biopython/tree/f_loc5" target="_blank">https://github.com/peterjc/biopython/tree/f_loc5</a><br>
&gt;&gt;<br>
&gt;&gt; For GSoC 2012, Lenna developed a VCF parser and genomic coordinate mapper<br>
&gt;&gt; compatible with Peter&#39;s SeqFeature updates (actually the f_loc4 branch, I<br>
&gt;&gt; guess?) and Brad&#39;s GFF parser:<br>
&gt;&gt; <a href="http://biopython.org/wiki/Google_Summer_of_Code#Representation_and_manipulation_of_genomic_variants" target="_blank">http://biopython.org/wiki/Google_Summer_of_Code#Representation_and_manipulation_of_genomic_variants</a><br>
&gt;&gt; <a href="http://arklenna.tumblr.com/post/29808300789/and-the-summer-ends" target="_blank">http://arklenna.tumblr.com/post/29808300789/and-the-summer-ends</a><br>
&gt;&gt; <a href="https://github.com/lennax/biopython/" target="_blank">https://github.com/lennax/biopython/</a><br>
&gt;&gt;<br>
&gt;&gt; What would it take to merge all of this once-recent work into Biopython?<br>
&gt;&gt; Are the SeqFeature CompoundLocation changes satisfactory and ready to merge<br>
&gt;&gt; into the mainline? Are we willing to make this compatibility break? If not,<br>
&gt;&gt; should we instead add another class/module to support the new behavior<br>
&gt;&gt; (BetterSeqFeature)?<br>
&gt;&gt;<br>
&gt;&gt; Happy to help,<br>
&gt;&gt; Eric<br>
&gt;&gt; _______________________________________________<br>
&gt;&gt; Biopython-dev mailing list<br>
&gt;&gt; <a href="mailto:Biopython-dev@mailman.open-bio.org" target="_blank">Biopython-dev@mailman.open-bio.org</a><br>
&gt;&gt; <a href="http://mailman.open-bio.org/mailman/listinfo/biopython-dev" target="_blank">http://mailman.open-bio.org/mailman/listinfo/biopython-dev</a><br>
</div></div></blockquote></div><br></div></div>