[BioRuby] BGZF support, was Re: Biopython 1.60 plans and beyond

Clayton Wheeler cswh at umich.edu
Thu May 24 01:35:46 UTC 2012


On May 22, 2012, at 7:07 AM, Peter Cock wrote:

> Hi all,
> 
> I've CC'd the BioRuby mailing list just to ensure you're aware of the
> potentially useful combination of MAF indexing and BGZF compression.
> We can continue this on the BioRuby list if more appropriate.
> 
> The start of this Biopython-dev thread is here:
> http://lists.open-bio.org/pipermail/biopython-dev/2012-April/009561.html
> 
> This might be a nice opportunity to combine the work of this year's OBF
> Google Summer of Code students - Clayton is doing MAF for BioRuby,
> and part of Artem's project could provide BGZF support for BioRuby.

Indeed, thanks Peter. BGZF sounds like a great approach for MAF compression; I'm just about to start looking into indexing support, and it makes sense to tackle compression in that context.

So far, I think Artem's BGZF implementation is entirely in D; I may just add Ruby support for BGZF separately.
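
As a rough illustration of why BGZF suits indexed access, and why offset arithmetic is the catch mentioned below: the SAM/BAM specification addresses a position in a BGZF file with a 64-bit "virtual offset", which packs the file offset of the compressed block into the upper 48 bits and the position within the decompressed block into the lower 16 bits. The sketch below is in Python only because that is what the Biopython side uses; the function names are hypothetical and are not taken from Biopython, bx-python, or BioRuby.

```python
# Minimal sketch of BGZF virtual offsets (layout per the SAM/BAM spec).
# pack_voffset / unpack_voffset are illustrative names, not a library API.

def pack_voffset(block_start, within_block):
    """Combine a compressed-block file offset and an in-block offset."""
    if not 0 <= block_start < 2**48:
        raise ValueError("block_start must fit in 48 bits")
    if not 0 <= within_block < 2**16:
        raise ValueError("within_block must fit in 16 bits")
    return (block_start << 16) | within_block


def unpack_voffset(voffset):
    """Split a virtual offset back into (block_start, within_block)."""
    return voffset >> 16, voffset & 0xFFFF


# A record starting 65000 bytes into the block stored at file offset 1000.
v = pack_voffset(1000, 65000)
print(unpack_voffset(v))

# Adding a byte count to a virtual offset is NOT a seek: the carry spills
# into the block-start field and points at an unrelated file position.
print(unpack_voffset(v + 1000))
```

An index that stores these values can seek directly to a record in the compressed file, but it has to treat them as opaque handles and never add record lengths to them.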

> On Fri, Apr 27, 2012 at 8:57 PM, Andrew Sczesnak
> <andrew.sczesnak at med.nyu.edu> wrote:
>> Peter,
>> 
>>> It should be easy enough to follow the BGZF changes to Bio/SeqIO/_index.py
>>> and I'm willing to do this myself for MAF (while going over your index
>>> work - something I want to do anyway). The only potential catch is
>>> avoiding offset arithmetic.
>> 
>> I have no problem with you doing this if you're willing. It would be great
>> to have some code review of MafIndex as well.
> 
> I'm not sure if Clayton will be able to comment on the Python code,
> but he should have some thoughts on the MAF indexing itself.

I'll definitely be spending more time with that code; along with the bx-python MAF indexing code, it will be my main reference point for indexed access. It's been a little while since I last did any Python work, but I should be able to follow along okay. I'll send out some comments in a few days.

Clayton Wheeler
cswh at umich.edu





