[Biopython-dev] SeqIO.index improvement suggestions

Renato Alves rjalves at igc.gulbenkian.pt
Sat Dec 19 21:48:10 UTC 2009



> Well, I hadn't been thinking about gzipped files (or any archives).
> How does gzip behave with memory use? I assume it doesn't
> load everything into RAM, but does allow you random access
> (seek and tell).

From what I can tell, in terms of RAM it behaves the same way as a
normal open(): it only decompresses the segments as they are accessed
and doesn't cache them. That is a reasonable trade-off between space
and access time.
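A minimal sketch in modern Python 3 (today's standard library, not the 2009-era code) of the behavior described above. Offsets recorded with tell() on a gzip handle refer to positions in the uncompressed data, and seek() reaches them by decompressing up to that point rather than loading the whole file. The file name and records are made up for illustration.

```python
import gzip
import os
import tempfile

# Write a small gzip-compressed FASTA file (illustrative data only).
data = b">seq1\nACGT\n>seq2\nGGCC\n>seq3\nTTAA\n"
path = os.path.join(tempfile.mkdtemp(), "example.fasta.gz")
with gzip.open(path, "wb") as out:
    out.write(data)

# Scan once, recording the offset of each record header. tell() on a
# gzip handle reports the position in the *uncompressed* stream.
offsets = {}
with gzip.open(path, "rb") as handle:
    while True:
        pos = handle.tell()
        line = handle.readline()
        if not line:
            break
        if line.startswith(b">"):
            offsets[line[1:].strip().decode()] = pos

# Jump to one record. seek() decompresses and discards data up to the
# target (rewinding first if seeking backwards), so memory use stays
# small, but each seek costs some decompression time.
with gzip.open(path, "rb") as handle:
    handle.seek(offsets["seq2"])
    header = handle.readline()
```

Here `offsets` maps each record ID to its uncompressed offset, which is the kind of ID-to-offset lookup an index built on a gzipped file would need.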

> This is a vague idea (which I haven't tried yet), but maybe the
> Bio.SeqIO.index() function could take an optional argument
> (gzip=True, or something more general like archive=...) which
> would cause the file to be opened via the gzip module instead?
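A sketch of what the more general archive=... option might look like underneath. This is hypothetical and not part of Bio.SeqIO: `open_for_indexing` and `_OPENERS` are names made up here. The idea is that each supported format maps to a standard-library opener that returns a binary handle with tell() and seek(), so the indexing code itself would not need to change.

```python
import bz2
import gzip
import os
import tempfile

# Hypothetical mapping from an "archive" keyword to an opener. Each opener
# returns a binary handle supporting tell() and seek(), which is what an
# offset-based index needs.
_OPENERS = {
    None: open,
    "gzip": gzip.open,
    "bz2": bz2.open,
}


def open_for_indexing(filename, archive=None):
    """Open filename in binary mode, decompressing if archive is given."""
    try:
        opener = _OPENERS[archive]
    except KeyError:
        raise ValueError("Unsupported archive type: %r" % (archive,)) from None
    return opener(filename, "rb")


# Usage: the same bytes stored plain, gzipped and bzip2-compressed are
# read back identically through one code path.
data = b">seq1\nACGT\n>seq2\nGGCC\n"
tmpdir = tempfile.mkdtemp()
paths = {
    None: os.path.join(tmpdir, "example.fasta"),
    "gzip": os.path.join(tmpdir, "example.fasta.gz"),
    "bz2": os.path.join(tmpdir, "example.fasta.bz2"),
}
for archive, path in paths.items():
    with _OPENERS[archive](path, "wb") as out:
        out.write(data)

contents = {}
for archive, path in paths.items():
    with open_for_indexing(path, archive=archive) as handle:
        handle.seek(11)  # jump straight to the second record
        contents[archive] = handle.read()
```

Keeping the default as `archive=None` means existing callers that pass plain files would see no change in behavior.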

I thought about something similar, but using a combination of the
file extension and magic-number detection (specifically python-magic[1]).
The first is potentially messy, although it's how things are mostly
done on Windows. I couldn't confirm whether the second is available
for Windows, but it is widely present on Linux (and I suppose on
Mac OS too). In the end, I dislike the idea of having to use one
approach or the other depending on the OS the code runs on; however,
this would fit in without breaking compatibility with current code.
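A sketch of the content-sniffing half of that idea. It swaps python-magic for a direct check of each format's leading signature bytes. Because it uses only the standard library, it should behave the same on Windows, Linux and Mac OS. `guess_archive` is a hypothetical helper, and only gzip and bzip2 are covered.

```python
import bz2
import gzip
import os
import tempfile

# Leading signature ("magic") bytes of the compressed formats handled here.
# gzip streams start with 0x1f 0x8b (RFC 1952); bzip2 streams start with "BZh".
_SIGNATURES = [
    (b"\x1f\x8b", "gzip"),
    (b"BZh", "bz2"),
]


def guess_archive(filename):
    """Return "gzip", "bz2" or None (plain file) based on the file's content.

    Only the first few bytes are read, so the result does not depend on
    the file extension or on an OS-specific magic library.
    """
    with open(filename, "rb") as handle:
        start = handle.read(max(len(sig) for sig, _ in _SIGNATURES))
    for signature, kind in _SIGNATURES:
        if start.startswith(signature):
            return kind
    return None


# Usage: a gzip file with a misleading ".txt" extension is still detected.
data = b">seq1\nACGT\n"
tmpdir = tempfile.mkdtemp()
plain_path = os.path.join(tmpdir, "plain.fasta")
gz_path = os.path.join(tmpdir, "misnamed.txt")
bz2_path = os.path.join(tmpdir, "example.fasta.bz2")
with open(plain_path, "wb") as out:
    out.write(data)
with gzip.open(gz_path, "wb") as out:
    out.write(data)
with bz2.open(bz2_path, "wb") as out:
    out.write(data)

guesses = {
    "plain": guess_archive(plain_path),
    "gz": guess_archive(gz_path),
    "bz2": guess_archive(bz2_path),
}
```

A plain text file that happens to begin with "BZh" would be misdetected. A stricter check could also test the block-size digit ('1' to '9') that follows in a real bzip2 header.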

[1] http://pypi.python.org/pypi/python-magic/0.1

Renato


