[Biopython-dev] [biopython] added unstable Bam parser class (e6343eb)

Peter Cock p.j.a.cock at googlemail.com
Wed Apr 3 09:30:53 UTC 2013


On Wed, Apr 3, 2013 at 10:09 AM, Tiago Antão <tiagoantao at gmail.com> wrote:
>> One of the fun things to try would be a multi-threaded BGZF
>> parser which simply reads a few blocks ahead and delegates
>> block decompression to worker threads.
>
> Wouldn't the GIL bite here and deny any kind of advantage?
> (At least in CPython)

The BGZF code does the basic IO in Python, reading in each compressed
block as a string, then passes that to the gzip/zlib library to decompress.
The decompression happens in C, so it could (and should) avoid the GIL. See also:
http://www.dalkescientific.com/writings/diary/archive/2012/01/19/concurrent.futures.html
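To make the read-ahead idea concrete, here is a minimal Python 3 sketch using
concurrent.futures. It is not Biopython's Bio.bgzf code. The function names
(read_raw_block, decompress_block, iter_blocks_threaded, make_bgzf_block) and
the workers/read_ahead parameters are hypothetical. The main thread does the IO,
using the BSIZE value from each block's gzip 'BC' extra subfield to find where
the block ends. Raw-deflate decompression is handed to a thread pool. Any
speed-up assumes the interpreter releases the GIL inside zlib, which is worth
checking on your Python version.

```python
import io
import struct
import zlib
from collections import deque
from concurrent.futures import ThreadPoolExecutor

# Fixed gzip member header: ID1, ID2, CM, FLG, MTIME, XFL, OS, XLEN (12 bytes).
_HEADER = struct.Struct("<BBBBIBBH")


def read_raw_block(handle):
    """Read one BGZF block without inflating it; return None at end of file."""
    header = handle.read(_HEADER.size)
    if not header:
        return None
    if len(header) != _HEADER.size:
        raise ValueError("Truncated BGZF header")
    id1, id2, cm, flg, _mtime, _xfl, _os, xlen = _HEADER.unpack(header)
    if (id1, id2, cm) != (31, 139, 8) or not flg & 4:
        raise ValueError("Not a BGZF block (bad magic or no FEXTRA flag)")
    extra = handle.read(xlen)
    # Scan the extra subfields for 'BC', whose value BSIZE is the block size minus 1.
    bsize, pos = None, 0
    while pos + 4 <= len(extra):
        si1, si2, slen = struct.unpack_from("<BBH", extra, pos)
        if (si1, si2, slen) == (66, 67, 2):
            bsize = struct.unpack_from("<H", extra, pos + 4)[0]
        pos += 4 + slen
    if bsize is None:
        raise ValueError("Missing BGZF 'BC' extra subfield")
    expected = bsize + 1 - _HEADER.size - xlen
    rest = handle.read(expected)
    if len(rest) != expected:
        raise ValueError("Truncated BGZF block")
    # The last 8 bytes are the gzip trailer: CRC32 then ISIZE (uncompressed length).
    crc, isize = struct.unpack("<II", rest[-8:])
    return rest[:-8], crc, isize


def decompress_block(raw):
    """Inflate one block (raw deflate, wbits=-15) and verify CRC32 and length."""
    cdata, crc, isize = raw
    data = zlib.decompress(cdata, -15)
    if len(data) != isize or zlib.crc32(data) != crc:
        raise ValueError("BGZF block failed CRC/length check")
    return data


def iter_blocks_threaded(handle, workers=4, read_ahead=8):
    """Yield decompressed blocks in file order, inflating several in parallel."""
    pending = deque()
    with ThreadPoolExecutor(max_workers=workers) as pool:
        while True:
            # Main thread: do the IO, keeping up to read_ahead blocks in flight.
            while len(pending) < read_ahead:
                raw = read_raw_block(handle)
                if raw is None:
                    break
                pending.append(pool.submit(decompress_block, raw))
            if not pending:
                return
            # Results come back strictly in the order the blocks were read.
            yield pending.popleft().result()


def make_bgzf_block(data):
    """Build one BGZF block from a small chunk of data (used here only for testing)."""
    comp = zlib.compressobj(6, zlib.DEFLATED, -15)
    cdata = comp.compress(data) + comp.flush()
    xlen = 6
    bsize = _HEADER.size + xlen + len(cdata) + 8 - 1
    header = _HEADER.pack(31, 139, 8, 4, 0, 0, 255, xlen)
    extra = struct.pack("<BBHH", 66, 67, 2, bsize)
    trailer = struct.pack("<II", zlib.crc32(data), len(data))
    return header + extra + cdata + trailer


# Usage: 20 small blocks plus an empty block (the form BGZF uses as its EOF marker).
blob = b"".join(make_bgzf_block(b"chunk %d\n" % i) for i in range(20))
blob += make_bgzf_block(b"")
with io.BytesIO(blob) as handle:
    text = b"".join(iter_blocks_threaded(handle))
print(text.splitlines()[:2])  # prints [b'chunk 0', b'chunk 1']
```

The futures are consumed in submission order, so the output is identical to a
sequential reader. read_ahead bounds memory use to a handful of blocks, each at
most 64 KiB in BGZF.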

Note that last time I looked at this, a year ago or so, PyPy was
quite slow calling zlib - passing large byte strings from PyPy
to C and back wasn't optimised. That may have improved. See
this thread (I didn't get deep enough into PyPy to fix this myself):
http://mail.python.org/pipermail/pypy-dev/2012-March/009623.html

Peter
