[Biopython] Reading large files, Biopython cookbook example
Andrew Dalke
dalke at dalkescientific.com
Tue Aug 6 18:49:35 UTC 2013
On Aug 6, 2013, at 11:35 AM, Peter Cock wrote:
> In the long run this problem should go away as the PDB moves
> to using the PDBx/mmCIF format:
> http://www.wwpdb.org/news/news_2013.html#22-May-2013
Either you are an optimist or an ultra marathon runner! The
move to mmCIF started, of course, 20 years ago, and the
link you gave says the change applies only to very large
structures:
Structures that do not exceed the limitations of the PDB
format will continue to be provided as PDB files in the
archive for the foreseeable future.
Even for large structures, which previously had to be split
over multiple entries, there will be a "best-effort" PDB format
file, available as a web service.
40 years of the PDB format => well-entrenched => not going to
get rid of it any time soon.
For another historical side-note, the PDB format started in
the early 1970s, but contains a kernel which is even older!
Quoting from
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC2143743/pdf/9232661.pdf :
In order to establish the PDB, acceptance by the crystallographic
community was necessary, requiring a pilgrimage in 1970 to the Medical
Research Council (MRC) laboratory and Crystal Data Centre (CDC) in
Cambridge. One result of this exchange was a concession that coordinates
of protein structures would be stored in the same format as the small
molecule CDC database (with a redundant ATOM label at the beginning of
each card), retaining the now-arcane counting number at the end. But the
idea of a PDB was accepted by Professors Perutz, Blow, Kennard, Diamond,
and colleagues in Cambridge.
The "now-arcane" counting number has long since disappeared from the
spec. It was there, I believe, so that if the punch cards were
dropped they could be re-sorted based on the last few columns.
(I imagine you could also write a program to strip out the
C-alpha cards, work with them, then merge the C-alphas back into
the card deck correctly.)
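To make the punch-card idea concrete, here is a minimal Python sketch. It assumes a hypothetical layout in which each 80-column card ends with an eight-column, right-justified counting number; the real historical column layout may have differed, and `resort`, `merge_back`, and `make_card` are illustrative names, not Biopython functions. It shows both uses: re-sorting a dropped deck, and merging separately handled C-alpha cards back in.

```python
import heapq
import random

# Hypothetical layout: 80-column cards whose last eight columns
# (0-based slice 72:80) hold the right-justified counting number.
SEQ_COLUMNS = slice(72, 80)


def seq_number(card: str) -> int:
    """Read the counting number from the final columns of a card."""
    return int(card[SEQ_COLUMNS])


def resort(cards):
    """Restore a dropped (shuffled) deck to its original order."""
    return sorted(cards, key=seq_number)


def merge_back(deck, extracted):
    """Merge separately handled cards back into the deck.

    Both inputs must already be in counting-number order; heapq.merge
    interleaves them in one pass without a full re-sort.
    """
    return list(heapq.merge(deck, extracted, key=seq_number))


def make_card(text: str, n: int) -> str:
    """Build an illustrative card: 72 columns of text plus the number."""
    return f"{text:<72}{n:>8d}"


ordered = [make_card("ATOM N", 1), make_card("ATOM CA", 2), make_card("ATOM C", 3)]

# Drop the deck, then pick it back up in order.
dropped = ordered[:]
random.shuffle(dropped)
assert resort(dropped) == ordered

# Pull out the C-alpha cards (as if to process them separately),
# then merge them back into the rest of the deck.
calphas = [c for c in ordered if "CA" in c[:72]]
others = [c for c in ordered if c not in calphas]
assert merge_back(others, calphas) == ordered
```

Because every card carries its own number, merging back is just an ordered interleave; nothing about the original positions needs to be recorded separately.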
Andrew
dalke at dalkescientific.com