[Biopython] Overhauling of Bio.PDB module

João Rodrigues j.p.g.l.m.rodrigues at gmail.com
Thu Oct 17 07:14:08 UTC 2019


Hi John,

Thank you for the update, we will keep that info in mind. It would be great
if formats didn't come and go so quickly :) MMTF was released less than 2
years ago and has been widely adopted by the community.

Cheers,

João

John Berrisford <jmb at ebi.ac.uk> wrote on Thursday, 17/10/2019 at
00:05:

> Hi,
>
> It’s great to hear that you are updating Biopython’s PDB module.
>
> Just a reminder: PDB files are considered a legacy format by the wwPDB; the
> primary format is mmCIF. There is an increasing number of PDB entries
> which do not have a PDB format file. So, if you are fetching files from the
> wwPDB FTP you should be getting the mmCIF format file.
>
> Also, MMTF will be replaced by binary CIF in the not too distant future:
>
> https://github.com/dsehnal/BinaryCIF
>
> Binary CIF will be used by RCSB’s and PDBe’s new viewer Mol*
> (https://molstar.org/) and will be served by both RCSB’s and PDBe’s
> coordinate servers:
>
> https://www.ebi.ac.uk/pdbe/coordinates/index.html
>
> Regards,
>
> John
>
> From: Biopython <biopython-bounces+jmb=ebi.ac.uk at mailman.open-bio.org> On
> Behalf Of Joe Greener
> Sent: 16 October 2019 23:23
> To: biopython at biopython.org
> Subject: Re: [Biopython] Overhauling of Bio.PDB module
>
> Hi João,
>
> I hadn't seen your reply when I wrote mine (spam filters, grr) but it
> appears we are broadly in agreement.
>
> I agree that Bio.PDB's USP is its general parsing and structure handling
> functionality. I guess there is a "build it and they will come" argument
> for making the spatial stuff fast too.
>
> Long term, Bio.Structure is probably a better name anyway, as we now parse
> mmCIF and MMTF as well as PDB files. It would also allow us to sort out the
> unholy mess of imports and module/class name clashes that Bio.PDB has
> accumulated over the years.
>
> Best,
> Joe
>
> Joe Greener
> Research Associate, UCL
> http://jgreener64.github.io
>
> On 16/10/2019 18:14, João Rodrigues wrote:
>
> Hi Joe,
>
> IIRC from BOSC, my proposal was to work under a new namespace
> 'Bio.Structure' to avoid compatibility issues and, in the long term,
> deprecate Bio.PDB once all functionality had been rewritten.
>
> It would also be interesting to gauge which features people (users
> and developers) would like to see implemented/changed/fixed/removed.
>
> The old car analogy is perfect :)
>
> Cheers,
>
> Joao
>
> Joe Greener <jgreener at hotmail.co.uk> wrote on Wednesday, 16/10/2019
> at 15:08:
>
> Hi Patrick,
>
> Some of us spoke about this at CoFest too, inspired by the ideas in
> Biotite (I don't think you and I spoke at BOSC though). As I recall it was
> João, Spencer, myself and possibly Peter in the discussions.
>
> We were in favour of the fundamental idea of a large coordinate array that
> is indexed into. As you point out though it would be no small amount of
> work to implement. I personally won't have time to do it, though I am happy
> to discuss and review code.
>
> I view Bio.PDB like a beloved older car that has been patched up over many
> years. It is probably the most widely used and debugged PDB parsing code
> around, and any overhaul would have to make sure to maintain the behaviour
> that many people rely on. That said, it does have its peculiarities and is
> rather slow (https://github.com/jgreener64/pdb-benchmarks). I'm just
> saying that we should make sure to get consensus before merging any
> overhaul PRs. But for sure I am in favour of someone making those PRs.
>
> Best,
> Joe
>
> Joe Greener
> Research Associate, UCL
> http://jgreener64.github.io
>
> On 16/10/2019 12:37, Patrick Kunzmann wrote:
>
> Hello Biopythoneers,
>
> At BOSC this year we talked about overhauling the Bio.PDB module. The
> problem is that the atom coordinates are currently stored in a separate
> NumPy array for each atom. This design prevents efficient computation of
> all kinds of analyses (distances, angles, superimpositions, etc.). One
> solution we discussed was to put the coordinates of the entire structure
> in one NumPy array and let the Atom, Residue, Chain and Structure objects
> point to positions in this array. The benefit of this approach is that
> functions could be applied directly to the entire array, harnessing the
> power of vectorization.
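
A minimal sketch of the shared-array idea described above, not an actual
Bio.PDB or Biotite API. The names Structure, AtomView and distance_matrix are
hypothetical and chosen only for illustration; the sketch assumes NumPy is
installed.

```python
import numpy as np


class Structure:
    """Hypothetical container owning one (n_atoms, 3) coordinate array."""

    def __init__(self, names, coords):
        self.coords = np.asarray(coords, dtype=float)
        self.atoms = [AtomView(self, i, name) for i, name in enumerate(names)]


class AtomView:
    """Hypothetical atom object that stores only an index into the shared array."""

    def __init__(self, structure, index, name):
        self._structure = structure
        self.index = index
        self.name = name

    @property
    def coord(self):
        # Basic indexing returns a view, so the atom always sees the
        # current contents of the shared array.
        return self._structure.coords[self.index]

    @coord.setter
    def coord(self, value):
        self._structure.coords[self.index] = value


def distance_matrix(coords):
    """All pairwise distances in one vectorized step via broadcasting."""
    diff = coords[:, None, :] - coords[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))


structure = Structure(
    ["N", "CA", "C"],
    [[0.0, 0.0, 0.0], [3.0, 4.0, 0.0], [3.0, 4.0, 12.0]],
)
# Translate the whole structure with a single array operation.
structure.coords += np.array([1.0, 0.0, 0.0])
dists = distance_matrix(structure.coords)
```

Because each AtomView holds an index rather than its own array, the
translation is visible through every atom object without any per-atom loop.
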
>
> For the analysis we could adapt the vectorized functions from the Python
> package Biotite, a project I am currently working on (
> https://www.biotite-python.org/apidoc/biotite.structure.html). Usually,
> these functions already accept the coordinates as a NumPy array, so I think
> only a few tweaks would be necessary for each function.
>
> However, we would need one person or a small team to make the effort
> to implement the new structure types and adapt the analysis functions. I
> could offer a pair of helping hands in the adaptation of the analysis
> functions, but I don't have the time for anything more.
>
> So the question is: is there anyone out there who is willing to do this
> work? Alternatively, I would propose writing a 'bridge' package between
> Biopython and Biotite that converts the Biopython structure representation
> into the representation in Biotite and vice versa. I think this solution
> is less elegant but would also require less effort.
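
A rough sketch of the coordinate half of such a bridge, under stated
assumptions: it only copies coordinates between per-atom objects and one
NumPy array, and says nothing about how Biotite actually represents a
structure. The functions atoms_to_array and array_to_atoms and the StubAtom
class are hypothetical. The only things assumed of each atom are a `coord`
attribute and a `set_coord` method, which Bio.PDB Atom objects provide;
StubAtom stands in so the sketch runs without Biopython installed.

```python
import numpy as np


def atoms_to_array(atoms):
    """Copy per-atom coordinates into one (n_atoms, 3) array."""
    return np.array([atom.coord for atom in atoms], dtype=float)


def array_to_atoms(atoms, coords):
    """Write rows of a coordinate array back onto the atom objects."""
    for atom, xyz in zip(atoms, coords):
        atom.set_coord(np.array(xyz, dtype=float))


class StubAtom:
    """Hypothetical stand-in for an object with per-atom coordinate storage."""

    def __init__(self, coord):
        self.coord = np.array(coord, dtype=float)

    def set_coord(self, coord):
        self.coord = coord


atoms = [StubAtom([0.0, 0.0, 0.0]), StubAtom([2.0, 0.0, 0.0])]
coords = atoms_to_array(atoms)
coords -= coords.mean(axis=0)  # vectorized: move the centroid to the origin
array_to_atoms(atoms, coords)
```

With a parsed Bio.PDB structure, `list(structure.get_atoms())` could be passed
in place of the stub list. Every round trip copies the coordinates, which is
the main cost of a bridge compared with a shared array.
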
>
> Best regards
>
> Patrick Kunzmann
>
> _______________________________________________
> Biopython mailing list  -  Biopython at mailman.open-bio.org
> https://mailman.open-bio.org/mailman/listinfo/biopython