[Biopython-dev] BGZF support, was Re: Biopython 1.60 plans and beyond

Peter Cock p.j.a.cock at googlemail.com
Tue Apr 24 15:58:10 UTC 2012


On Fri, Apr 20, 2012 at 11:35 PM, Andrew Sczesnak
<andrew.sczesnak at med.nyu.edu> wrote:
> Peter,
>
> My colleague was writing some code using MafIndex and commented how long it
> took her to download, decompress and index the human multiz alignments from
> UCSC. It seems like it'd be great to keep the files compressed... perhaps if
> the code works well enough we can convince UCSC to host bgzip'd copies (or
> maybe make them available on one of our institutions' servers).

That does sound good - it is a perfect example of where BGZF is a more
useful alternative to standard GZIP. Some numbers on how much of a
size penalty it imposes would help though...
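As a rough way to get such numbers, here is a minimal sketch that estimates
the size of the same data as one gzip stream versus a series of independent
BGZF blocks. The function names are hypothetical, the 65280-byte block size
is an assumption (the format only caps a block at 64 KiB), and the result is
an estimate from raw deflate sizes plus fixed header and trailer sizes, not
the output of a real BGZF writer:

```python
import zlib

# Assumption: uncompressed bytes per BGZF block (the format allows up to 64 KiB).
BLOCK_SIZE = 65280


def raw_deflate_size(data, level=6):
    """Return the length of data compressed as a raw deflate stream."""
    compressor = zlib.compressobj(level, zlib.DEFLATED, -15)
    return len(compressor.compress(data) + compressor.flush())


def estimate_sizes(data, level=6):
    """Hypothetical helper: return (gzip_size, bgzf_size) estimates in bytes."""
    # Plain gzip: 10 byte header + deflate payload + 8 byte trailer.
    gzip_size = 10 + raw_deflate_size(data, level) + 8
    # BGZF: each block is compressed on its own, with an 18 byte header
    # (gzip header plus the BC extra field) and an 8 byte trailer.
    bgzf_size = 28  # the empty end-of-file marker block
    for start in range(0, len(data), BLOCK_SIZE):
        block = data[start:start + BLOCK_SIZE]
        bgzf_size += 18 + raw_deflate_size(block, level) + 8
    return gzip_size, bgzf_size
```

Reading a real MAF file into memory and calling estimate_sizes(data) would
give a ballpark ratio; running bgzip itself on the file is the way to get
exact figures.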

> Is I.J. interested in joining the community? I'd like to look into adding
> BGZF to MafIO and wouldn't want to duplicate I.J.'s effort. If not, could
> you put me in touch?

Perhaps he's just busy at the moment (BCC'd again)?

It should be easy enough to follow the BGZF changes to Bio/SeqIO/_index.py
and I'm willing to do this myself for MAF (while going over your index work -
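something I want to do anyway). The only potential catch is avoiding offset
arithmetic: a BGZF virtual offset combines the start of a compressed block
with a position inside the decompressed block, so two offsets can't be added
or subtracted like plain file offsets.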
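To illustrate the catch, here is a minimal standalone sketch of how a BGZF
virtual offset is packed (compressed block start in the upper 48 bits,
position within the decompressed block in the lower 16 bits). The two
functions are written out here for illustration and are named after the
helpers in Bio.bgzf; the example offsets are made up:

```python
def make_virtual_offset(block_start_offset, within_block_offset):
    """Pack a compressed block start and a within-block position into one integer."""
    if not 0 <= within_block_offset < 2 ** 16:
        raise ValueError("within-block offset must fit in 16 bits")
    if not 0 <= block_start_offset < 2 ** 48:
        raise ValueError("block start offset must fit in 48 bits")
    return (block_start_offset << 16) | within_block_offset


def split_virtual_offset(virtual_offset):
    """Unpack a virtual offset into (block_start_offset, within_block_offset)."""
    return virtual_offset >> 16, virtual_offset & 0xFFFF


# Made-up example: one record starts near the end of the first block,
# the next record starts early in a second block at compressed offset 18000.
first = make_virtual_offset(0, 65000)
second = make_virtual_offset(18000, 200)

# The difference is NOT the record length in bytes, because the upper bits
# count compressed bytes and the lower bits count decompressed bytes.
not_a_length = second - first
```

So an index can store and seek to virtual offsets, but any record length has
to be measured from the decompressed data rather than computed as
end minus start.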

Peter


