[Biopython-dev] Fwd: [biopython] Fix broken downloading of large PDB structures (#146)

Peter Cock p.j.a.cock at googlemail.com
Wed Jan 9 23:55:13 UTC 2013


FYI

---------- Forwarded message ----------
From: David Cain <notifications at github.com>
Date: Wed, Jan 9, 2013 at 10:59 PM
Subject: [biopython] Fix broken downloading of large PDB structures (#146)
To: biopython/biopython <biopython at noreply.github.com>


Summary of changes

   - Fix failure to download large PDB files
   - Use with statements for safer file I/O
   - Remove obsolete parameters
   - PEP 8 changes, update documentation

Failure to download large PDB files

(See: Redmine bug #3403 <https://redmine.open-bio.org/issues/3403>)

The current PDBList module will often fail to download large PDB files.

>>> from Bio.PDB import PDBList
>>> pdbl = PDBList()
>>> pdbl.retrieve_pdb_file("1hgg")
...
IOError: CRC check failed 0x21d7a5f7 != 0x4b5eabb6L
>>>

The problem arises because the entire gzipped file is read into memory
before it is written to local disk.

Instead of this memory-intensive approach, I changed the downloading to
use urllib.urlretrieve, which saves the archive directly to a local file
and is both more readable and far more efficient.
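
A minimal sketch of the approach described above, not the code from the
pull request itself. It assumes Python 3, where urlretrieve lives in
urllib.request (the pull request targets the Python 2 name
urllib.urlretrieve). The helper name fetch_and_gunzip and its parameters
are hypothetical and chosen only for illustration.

```python
import gzip
import shutil
from urllib.request import urlretrieve  # Python 2 spelling: urllib.urlretrieve


def fetch_and_gunzip(url, gz_path, out_path):
    """Save a gzipped archive to disk, then decompress it in chunks.

    Hypothetical helper for illustration; not part of Bio.PDB.PDBList.
    """
    # urlretrieve copies the response to gz_path block by block, so the
    # whole archive is never held in memory at once.
    urlretrieve(url, gz_path)

    # The 'with' statement closes both files even if decompression fails.
    with gzip.open(gz_path, "rb") as compressed, open(out_path, "wb") as out:
        # copyfileobj streams between the two file objects in chunks.
        shutil.copyfileobj(compressed, out)
    return out_path
```

Any URL that urlretrieve accepts should work here, including a file://
URL pointing at a local .gz file, which is convenient for trying the
helper without network access.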

Obsolete parameters

The long-obsolete parameters to retrieve_pdb_file() have been
removed. Formerly, the function allowed the user to specify compression
and/or a system utility to perform decompression. But all archives are
now gzipped, and PDBList uses Python's gzip module to decompress
archives. These parameters have been obsolete for over a year (they were
marked deprecated with commit 7ebf6e9
<https://github.com/biopython/biopython/commit/7ebf6e9ecb>).
------------------------------
You can merge this Pull Request by running

  git pull https://github.com/DavidCain/biopython fix_pdb_dl

Or view, comment on, or merge it at:

  https://github.com/biopython/biopython/pull/146

Commit Summary

   - Use urlretrieve to smartly download PDB archives
   - Use 'with' statement for safer file I/O
   - Collapse unwieldy if-else structure
   - PEP8 fixes within retrieve_pdb_file
   - Remove deprecated parameters
   - Update with clarifying comments
   - PEP8 fixes, updated comments for file
   - Use urlretrieve in other instance of save to disk

File Changes

   - *M* Bio/PDB/PDBList.py (217)

Patch Links:

   - https://github.com/biopython/biopython/pull/146.patch
   - https://github.com/biopython/biopython/pull/146.diff
