[Biopython] Reading large files, Biopython cookbook example

João Rodrigues anaryin at gmail.com
Tue Aug 6 20:46:17 UTC 2013


Really nice discussion Jared, thanks for sharing.


2013/8/6 Sampson, Jared <Jared.Sampson at nyumc.org>

> For the curious, there has been a conversation on the CCP4 Bulletin Board
> over the past few days addressing exactly this topic.  The takeaway message
> is essentially what Andrew has mentioned: PDB format is here for the
> foreseeable future.
>
> http://www.mail-archive.com/ccp4bb@jiscmail.ac.uk/msg32321.html
>
> Cheers,
> Jared
>
> --
> Jared Sampson
> Xiangpeng Kong Lab
> NYU Langone Medical Center
> Old Public Health Building, Room 610
> 341 East 25th Street
> New York, NY 10016
> 212-263-7898
> http://kong.med.nyu.edu/
>
>
>
>
> On Aug 6, 2013, at 2:49 PM, Andrew Dalke <dalke at dalkescientific.com>
> wrote:
>
> On Aug 6, 2013, at 11:35 AM, Peter Cock wrote:
> In the long run this problem should go away as the PDB moves
> to using the PDBx/mmCIF format:
> http://www.wwpdb.org/news/news_2013.html#22-May-2013
>
> Either you are optimistic or an ultramarathon runner! The
> move over to mmCIF started of course 20 years ago, and that
> link you gave said the change applies only to very large
> structures:
>
>    Structures that do not exceed the limitations of the PDB
>    format will continue to be provided as PDB files in the
>    archive for the foreseeable future.
>
> Even for large files, which previously would split the structure
> over multiple records, there will be a "best-effort" PDB format,
> available as a web service.
>
>
> 40 years of the PDB format => well-entrenched => not going to
> get rid of it any time soon.
>
>
>
> For another historical side-note, the PDB format started in
> the early 1970s, but contains a kernel which is even older!
> Quoting from
>
>  http://www.ncbi.nlm.nih.gov/pmc/articles/PMC2143743/pdf/9232661.pdf :
>
>  In order to establish the PDB, acceptance by the crystallographic
>  community was necessary, requiring a pilgrimage in 1970 to the Medical
>  Research Council (MRC) laboratory and Crystal Data Centre (CDC) in
>  Cambridge. One result of this exchange was a concession that coordinates
>  of protein structures would be stored in the same format as the small
>  molecule CDC database (with a redundant ATOM label at the beginning of
>  each card), retaining the now-arcane counting number at the end. But the
>  idea of a PDB was accepted by Professors Perutz, Blow, Kennard, Diamond,
>  and colleagues in Cambridge.
>
> The "now-arcane" counting number has long disappeared from the
> spec. It was there, I believe, so that if the punch cards were
> dropped then they could be re-sorted based on the last few columns.
> (I imagine you could also write a program to strip out the
> C-alpha cards, work with them, then merge the C-alphas back into
> the card deck correctly.)
>
> Andrew
> dalke at dalkescientific.com
>
>
> _______________________________________________
> Biopython mailing list  -  Biopython at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/biopython
>
>


