[Biopython] Overhauling of Bio.PDB module

João Rodrigues j.p.g.l.m.rodrigues at gmail.com
Tue Oct 22 23:25:35 UTC 2019


Hi all,

Michiel: agreed, I put out Bio.Structure as a placeholder for now. Good
point that it will invariably clash with a Structure class in there.
Bio.structures is a good suggestion, I endorse the lowercase convention too!

FYI: Below is an invite link to the Slack community where I created a
#biopython channel. I don't want it to serve as an 'official' channel of
communication, but since we lost the '-dev' mailing list I figure it's
better to have some space to discuss these kinds of issues a bit more
thoroughly without spamming the whole lot of you. Once we reach some
consensus, we can update this post. Having said that, if you like 3D
structures, static, dynamic, or frozen, feel free to join!

https://join.slack.com/t/3dsig-cosi/shared_invite/enQtODA3MTM5OTE1OTU5LTdhZDMzZTY0MTBjYjk5NWIyZGEwMjZjZTQzNTdmNjhmNjk0OGI5NDQ2M2RiM2VlYTk1ZDY4Nzc5ZmU0MGIzYjM

Cheers,

João


Michiel de Hoon <mjldehoon at yahoo.com> wrote on Monday, 21/10/2019
at 03:41:

> Very much in favor of overhauling Bio.PDB. This is also a good opportunity
> to make the code organization more consistent with the rest of Biopython,
> for example by having "read" and "parse" functions instead of having to
> create parser objects first.
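
To make the "read" function idea concrete, here is a minimal sketch, not actual Biopython code. The class PDBCoordinateParser and the function read() are hypothetical names invented for illustration, and the parsing only pulls atom names and coordinates from the fixed ATOM/HETATM columns of the PDB format. The point is only the calling convention: the parser object becomes an implementation detail behind a module-level function.

```python
import io


class PDBCoordinateParser:
    """Hypothetical stand-in for a parser object (not part of Biopython)."""

    def iter_atoms(self, handle):
        # ATOM/HETATM records use fixed columns: atom name in 13-16,
        # x, y, z in 31-38, 39-46 and 47-54 (1-based).
        for line in handle:
            if line.startswith(("ATOM", "HETATM")):
                name = line[12:16].strip()
                xyz = (float(line[30:38]), float(line[38:46]), float(line[46:54]))
                yield name, xyz


def read(handle):
    """Hypothetical module-level function: callers never touch the parser."""
    return list(PDBCoordinateParser().iter_atoms(handle))


pdb_text = (
    "ATOM      1  N   MET A   1      27.340  24.430   2.614\n"
    "ATOM      2  CA  MET A   1      26.266  25.413   2.842\n"
)
atoms = read(io.StringIO(pdb_text))
print(atoms)
```

Compared with creating a parser and then calling a method on it, the caller needs one import and one call, which is the pattern Bio.SeqIO and Bio.motifs already follow.
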
>
> I agree with João's idea of working under a new namespace. Since the new
> module will probably also contain a class named "Structure", I would
> suggest to give the module a different name, so we don't end up with the
> Structure class and a Structure module.
> To distinguish the two, you could use "structures" for the module, and
> "Structure" for the class. We did something similar when we overhauled the
> old Bio.Motif module, creating a new module Bio.motifs, with a class named
> "Motif" inside.
> Btw note that lowercase letters are recommended for module names in Python.
>
> Best,
> -Michiel
>
> On Thursday, October 17, 2019, 7:23:21 AM GMT+9, João Rodrigues <
> j.p.g.l.m.rodrigues at gmail.com> wrote:
>
>
> Hi Joe,
>
> IIRC from BOSC, my proposal was to work under a new namespace
> 'Bio.Structure' to avoid compatibility issues and, in the long term,
> deprecate Bio.PDB once all functionality had been rewritten.
>
> It would also be interesting to gauge which features people (users and
> developers) would like to see implemented/changed/fixed/removed.
>
> The old car analogy is perfect :)
>
> Cheers,
>
> Joao
>
> Joe Greener <jgreener at hotmail.co.uk> wrote on Wednesday, 16/10/2019
> at 15:08:
>
> Hi Patrick,
>
> Some of us spoke about this at CoFest too, inspired by the ideas in
> Biotite (I don't think you and I spoke at BOSC though). As I recall it was
> João, Spencer, myself and possibly Peter in the discussions.
>
> We were in favour of the fundamental idea of a large coordinate array that
> is indexed into. As you point out though it would be no small amount of
> work to implement. I personally won't have time to do it, though I am happy
> to discuss and review code.
>
> I view Bio.PDB like a beloved older car that has been patched up over many
> years. It is probably the most widely used and debugged PDB parsing code
> around, and any overhaul would have to make sure to maintain the behaviour
> that many people rely on. That said, it does have its peculiarities and is
> rather slow (https://github.com/jgreener64/pdb-benchmarks). I'm just
> saying that we should make sure to get consensus before merging any
> overhaul PRs. But for sure I am in favour of someone making those PRs.
>
> Best,
> Joe
>
> Joe Greener
> Research Associate, UCL
> http://jgreener64.github.io
>
>
> On 16/10/2019 12:37, Patrick Kunzmann wrote:
>
> Hello Biopythoneers,
>
> At BOSC this year we talked about overhauling the Bio.PDB module. The
> problem is that the atom coordinates are currently stored in a separate
> NumPy array for each atom. This design prevents efficient computation of
> all kinds of analyses (distances, angles, superimpositions, etc.). One
> solution we discussed was to put the coordinates of the entire structure
> in one NumPy array, and let the Atom, Residue, Chain and Structure objects
> point to positions in this array. The benefit of this approach is that
> functions could be applied directly to the entire array, harnessing the
> power of vectorization.
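
A rough sketch of the shared-array layout described above, assuming only NumPy. The Atom class here is a hypothetical, stripped-down illustration and not the existing Bio.PDB class; pairwise_distances() is likewise an invented helper name. Each atom stores a row index into one (N, 3) array, so whole-structure operations run on the array while per-atom access still works.

```python
import numpy as np


class Atom:
    """Hypothetical lightweight atom: a name plus a row index into a shared array."""

    def __init__(self, name, coords, index):
        self.name = name
        self._coords = coords  # reference to the structure-wide (N, 3) array
        self._index = index

    @property
    def coord(self):
        # Basic integer indexing returns a view, so no coordinates are copied.
        return self._coords[self._index]

    @coord.setter
    def coord(self, value):
        self._coords[self._index] = value


def pairwise_distances(coords):
    """All-against-all Euclidean distances via broadcasting, no Python loop."""
    diff = coords[:, None, :] - coords[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))


coords = np.array([[0.0, 0.0, 0.0], [3.0, 4.0, 0.0], [0.0, 0.0, 12.0]])
atoms = [Atom(name, coords, i) for i, name in enumerate(["N", "CA", "C"])]

coords -= coords.mean(axis=0)           # centre the whole structure in place
distances = pairwise_distances(coords)  # (3, 3) distance matrix
print(distances[0, 1])                  # distance between N and CA
print(atoms[1].coord)                   # the Atom sees the centred coordinates
```

One caveat with this design: whole-structure operations have to modify the array in place (as with `-=` here). Rebinding the name to a new array would leave the Atom objects pointing at the old one.
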
>
> For the analysis we could adapt the vectorized functions from the Python
> package Biotite, a project I am currently working on (
> https://www.biotite-python.org/apidoc/biotite.structure.html). Usually,
> these functions already accept the coordinates as a NumPy array, so I
> think only a few tweaks would be necessary for each function.
>
> However, we would need one person or a small team to make the effort
> to implement the new structure types and adapt the analysis functions. I
> could offer a pair of helping hands in the adaptation of the analysis
> functions, but I don't have the time for anything more.
>
> So the question is: Is there anyone out there who is willing to do this
> work? Alternatively, I would propose writing a 'bridge' package between
> Biopython and Biotite that converts the Biopython structure representation
> into the representation in Biotite and vice versa. I think this solution
> is less elegant but would also require less effort.
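
For a feel of what one direction of such a bridge might involve, here is a hedged sketch that needs only NumPy. The function name flatten_atoms() is hypothetical. It relies on the accessor methods Bio.PDB objects expose (get_coord, get_name, get_parent, get_resname, get_id) and returns plain arrays under the keys coord, atom_name, res_name, res_id and chain_id, chosen here to mirror Biotite's annotation names. Building an actual Biotite AtomArray from them, and the reverse conversion, are left out.

```python
import numpy as np


def flatten_atoms(atoms):
    """Collect per-atom data from Biopython-style Atom objects into flat arrays.

    Hypothetical helper: it only assumes the accessor methods used below.
    """
    atoms = list(atoms)
    residues = [atom.get_parent() for atom in atoms]
    coord = np.array([atom.get_coord() for atom in atoms], dtype=np.float32)
    return {
        "coord": coord.reshape(-1, 3),  # (N, 3), also valid for N == 0
        "atom_name": np.array([atom.get_name() for atom in atoms]),
        "res_name": np.array([res.get_resname() for res in residues]),
        # A Bio.PDB residue id is (hetero flag, sequence number, insertion code).
        "res_id": np.array([res.get_id()[1] for res in residues], dtype=int),
        "chain_id": np.array([res.get_parent().get_id() for res in residues]),
    }
```

With Biopython installed, something like flatten_atoms(structure.get_atoms()) should produce the arrays, though hetero flags, insertion codes, disordered atoms and multiple models would all need proper handling in a real bridge.
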
>
> Best regards
>
> Patrick Kunzmann
>
> _______________________________________________
> Biopython mailing list  -  Biopython at mailman.open-bio.org
> https://mailman.open-bio.org/mailman/listinfo/biopython