[Biopython] Reading large files, Biopython cookbook example
Sampson, Jared
Jared.Sampson at nyumc.org
Tue Aug 6 20:10:25 UTC 2013
For the curious, there has been a conversation on the CCP4 Bulletin Board over the past few days addressing exactly this topic. The takeaway message is essentially what Andrew has mentioned: PDB format is here for the foreseeable future.
http://www.mail-archive.com/ccp4bb@jiscmail.ac.uk/msg32321.html
Cheers,
Jared
--
Jared Sampson
Xiangpeng Kong Lab
NYU Langone Medical Center
Old Public Health Building, Room 610
341 East 25th Street
New York, NY 10016
212-263-7898
http://kong.med.nyu.edu/
On Aug 6, 2013, at 2:49 PM, Andrew Dalke <dalke at dalkescientific.com> wrote:
On Aug 6, 2013, at 11:35 AM, Peter Cock wrote:
In the long run this problem should go away as the PDB moves
to using the PDBx/mmCIF format:
http://www.wwpdb.org/news/news_2013.html#22-May-2013
Either you are optimistic or an ultramarathon runner! The
move over to mmCIF started of course 20 years ago, and that
link you gave said the change applies only to very large
structures:
Structures that do not exceed the limitations of the PDB
format will continue to be provided as PDB files in the
archive for the foreseeable future.
Even for large files, which previously would split the structure
over multiple records, there will be a "best-effort" PDB format,
available as a web service.
40 years of the PDB format => well-entrenched => not going to
get rid of it any time soon.
For another historical side-note, the PDB format started in
the early 1970s, but contains a kernel which is even older!
Quoting from
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC2143743/pdf/9232661.pdf :
In order to establish the PDB, acceptance by the crystallographic
community was necessary, requiring a pilgrimage in 1970 to the Medical
Research Council (MRC) laboratory and Crystal Data Centre (CDC) in
Cambridge. One result of this exchange was a concession that coordinates
of protein structures would be stored in the same format as the small
molecule CDC database (with a redundant ATOM label at the beginning of
each card), retaining the now-arcane counting number at the end. But the
idea of a PDB was accepted by Professors Perutz, Blow, Kennard, Diamond,
and colleagues in Cambridge.
The "now-arcane" counting number has long disappeared from the
spec. It was there, I believe, so that if the punch cards were
dropped then they could be resorted based on the last few columns.
(I imagine you could also write a program to strip out the
C-alpha cards, work with them, then merge the C-alphas back into
the card deck correctly.)
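
As a rough illustration of that resorting idea, here is a minimal Python
sketch. It is only a toy under stated assumptions: each card is an
80-column line, the counting number sits in the last eight columns
(73-80), and the atom name sits in columns 13-16 of an ATOM card. The
helper names (make_card, card_seq, resort, split_calpha, merge_decks)
are made up for this example and are not part of Biopython or of any
PDB specification.

```python
import random

# Assumption for this sketch: the counting number occupies columns 73-80.
SEQ_COLS = slice(72, 80)
# Assumption: the atom name occupies columns 13-16 of an ATOM card.
NAME_COLS = slice(12, 16)


def make_card(body, seq):
    """Pad or truncate the body to 72 columns, then append the counting number."""
    return f"{body:<72.72}{seq:>8d}"


def card_seq(card):
    """Read the counting number back from the last columns of a card."""
    return int(card[SEQ_COLS])


def resort(deck):
    """Put a dropped (shuffled) deck back in order using the counting number."""
    return sorted(deck, key=card_seq)


def is_calpha(card):
    """True for ATOM cards whose atom name field reads CA."""
    return card.startswith("ATOM") and card[NAME_COLS].strip() == "CA"


def split_calpha(deck):
    """Pull the C-alpha cards out of the deck, keeping both piles."""
    calphas = [card for card in deck if is_calpha(card)]
    rest = [card for card in deck if not is_calpha(card)]
    return calphas, rest


def merge_decks(calphas, rest):
    """Merge the two piles back into one deck, ordered by counting number."""
    return sorted(calphas + rest, key=card_seq)


# A four-card toy deck for one alanine residue (backbone atoms only).
deck = [
    make_card(f"ATOM  {i:>5d}  {name:<3} ALA A   1", i)
    for i, name in enumerate(["N", "CA", "C", "O"], start=1)
]

dropped = deck[:]
random.shuffle(dropped)          # "drop" the cards on the floor
restored = resort(dropped)       # sort on the trailing columns
calphas, rest = split_calpha(restored)
merged = merge_decks(calphas, rest)
```

Because every card carries its own position, neither the shuffle nor
the split loses any ordering information, which is presumably why the
number was worth eight columns of each card.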
Andrew
dalke at dalkescientific.com