[Biopython] RFC on WIP - internal coordinates, structure / prediction folk please read

Tue Apr 9 18:24:48 UTC 2019

Hi João,

Sorry, no publication and not much documentation :-)  I can add the
documentation; as for a publication, the work always seemed more like
bookkeeping than science.  On the other hand, a publication might have
raised awareness of the Lua version of this code, which has been sitting on
GitHub for 3 years now.

The approach is as follows:

- A hedron is a 3-atom unit defined by 2 bond lengths and the angle between
them.  A dihedron is two hedra with one overlapping bond and a specified
dihedral angle.
- generate lists of all hedra and dihedra making up the protein backbone and
sidechains (e.g. psi is N-CA-C-N+1).  Plan the lists so they overlap in the
right way to extend atom chains for the assembly step.
- read through the pdb structure and form the hedra and dihedra objects for
each residue from the atom coordinates; residue i will include the hedra
and dihedra which extend into residue i+1
- at the beginning of the chain and any chain breaks (determined by peptide
C-N bond exceeding a cut-off), store the X, Y, Z coordinates for the N, CA
and C atoms to restart the chain.  We lose atoms in the case of chain
fragments that don't start with N-CA-C.  (Storing these is what links the
internal coordinates to the coordinate space of the PDB file)
- at this point we have the internal coordinate set.  Re-writing the code
to work with Biopython (this is at least my 3rd iteration since grad
school) got me to better support hydrogens and altlocs, so the .pic
datafile captures Biopython disordered residues/atoms including occupancy,
plus B-factors.   The B-factors are definitely just tacked on at the end
for each residue, but it does make for complete ATOM records when
regenerating the PDB data.
- the assembly process uses a deque (double-ended queue) loaded up with
initial hedra containing atom coordinates.  For each dihedron that starts
with a hedron taken from the queue, work out the coordinates for the 4th
atom and put the dihedron's 2nd hedron at the back of the queue.  This way
we work through the correctly ordered lists built above; it is explained in
more detail in the doc at the start of PIC_Residue.assemble().
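
To make the steps above concrete, here is a minimal, dependency-free sketch
of the round trip: measure length/angle/dihedral from coordinates, then
rebuild the atoms with a deque.  It is not the code in the branch; every
function name (measure, place, assemble, ...) and the toy coordinates are
made up for illustration, and altlocs, hydrogens and chain breaks are
ignored.

```python
import math
from collections import deque

# Tiny 3-vector helpers on plain tuples (keeps the sketch dependency-free).
def sub(a, b): return tuple(x - y for x, y in zip(a, b))
def add(a, b): return tuple(x + y for x, y in zip(a, b))
def scale(a, s): return tuple(x * s for x in a)
def dot(a, b): return sum(x * y for x, y in zip(a, b))
def norm(a): return math.sqrt(dot(a, a))
def unit(a): return scale(a, 1.0 / norm(a))
def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def bond_angle(a, b, c):
    """Angle a-b-c in radians, measured at the middle atom b."""
    cosine = dot(unit(sub(a, b)), unit(sub(c, b)))
    return math.acos(max(-1.0, min(1.0, cosine)))

def dihedral(a, b, c, d):
    """Signed dihedral a-b-c-d in radians (0 = cis, +/-pi = trans)."""
    b1, b2, b3 = sub(b, a), sub(c, b), sub(d, c)
    n1, n2 = cross(b1, b2), cross(b2, b3)
    return math.atan2(norm(b2) * dot(b1, n2), dot(n1, n2))

def place(a, b, c, length, angle, torsion):
    """Position atom d from a, b, c plus |c-d|, angle b-c-d, dihedral a-b-c-d."""
    bc = unit(sub(c, b))
    n = unit(cross(sub(b, a), bc))  # normal to the a-b-c plane (fails if collinear)
    m = cross(n, bc)                # in-plane axis perpendicular to b-c
    local = (-length * math.cos(angle),
             length * math.sin(angle) * math.cos(torsion),
             length * math.sin(angle) * math.sin(torsion))
    offset = add(add(scale(bc, local[0]), scale(m, local[1])), scale(n, local[2]))
    return add(c, offset)

def measure(coords, keys):
    """Cartesian -> internal: per 4-atom key, store what is needed for atom 4."""
    internal = {}
    for key in keys:
        pa, pb, pc, pd = (coords[name] for name in key)
        internal[key] = (norm(sub(pd, pc)), bond_angle(pb, pc, pd),
                         dihedral(pa, pb, pc, pd))
    return internal

def assemble(start, internal):
    """Internal -> Cartesian, seeded with the stored XYZ of the first hedron."""
    coords = dict(start)
    queue = deque([tuple(start)])  # first hedron key, atoms in insertion order
    while queue:
        hedron = queue.popleft()
        for key, (length, angle, torsion) in internal.items():
            if key[:3] == hedron and key[3] not in coords:
                a, b, c = (coords[name] for name in hedron)
                coords[key[3]] = place(a, b, c, length, angle, torsion)
                queue.append(key[1:])  # 2nd hedron goes to the back of the queue
    return coords

# Made-up backbone-like coordinates (angstroms), purely for illustration.
original = {
    "N": (0.0, 0.0, 0.0), "CA": (1.46, 0.0, 0.0), "C": (2.0, 1.4, 0.0),
    "O": (1.3, 2.4, 0.3), "N+1": (3.3, 1.6, -0.4), "CA+1": (4.0, 2.9, -0.3),
}
# Dihedra listed so that each one starts with a hedron that gets placed first.
keys = [("N", "CA", "C", "N+1"),     # psi
        ("N", "CA", "C", "O"),
        ("CA", "C", "N+1", "CA+1")]  # omega
internal = measure(original, keys)
start = {name: original[name] for name in ("N", "CA", "C")}
rebuilt = assemble(start, internal)
```

Only the three seed atoms carry Cartesian coordinates into assemble();
everything else is regenerated from lengths and angles, which is why the
stored N, CA, C positions are what tie the result back to the coordinate
space of the PDB file.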

The -hard- part is handling (or at least not crashing on) the altlocs,
disordered atoms, etc.
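
Going back to the chain-break step in the list above, the cut-off test can
be sketched roughly as follows.  This is an illustration only: the function
name, the dict-per-residue layout and the 1.4 angstrom default are my
assumptions here, not what the branch uses (a typical peptide C-N bond is
about 1.33 angstroms).

```python
import math

PEPTIDE_CUTOFF = 1.4  # angstroms; assumed value for illustration only

def split_at_breaks(residues, cutoff=PEPTIDE_CUTOFF):
    """Split residues (dicts of atom name -> xyz) into unbroken segments.

    A new segment starts whenever the distance from the previous residue's C
    to this residue's N exceeds the cut-off.
    """
    segments, current = [], []
    for res in residues:
        if current and math.dist(current[-1]["C"], res["N"]) > cutoff:
            segments.append(current)
            current = []
        current.append(res)
    if current:
        segments.append(current)
    return segments

# Toy residues holding only the two atoms the test needs (made-up coordinates).
residues = [
    {"N": (0.0, 0.0, 0.0), "C": (2.4, 0.0, 0.0)},
    {"N": (3.73, 0.0, 0.0), "C": (6.1, 0.0, 0.0)},    # about 1.33 from previous C
    {"N": (11.1, 0.0, 0.0), "C": (13.5, 0.0, 0.0)},   # about 5.0 from previous C
]
segments = split_at_breaks(residues)
joined = split_at_breaks(residues, cutoff=6.0)  # gap treated as a long peptide bond
```

Each segment's first residue is where the N, CA and C coordinates would be
stored to restart the chain; raising the cut-off merges the segments, which
is the effect of increasing the cut-off for structures with gaps.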

Rob.

On Tue, Apr 9, 2019 at 5:39 PM João Rodrigues <j.p.g.l.m.rodrigues at gmail.com>
wrote:

> Hi Rob,
>
> This is an extremely interesting contribution. Thanks for the time you
> took to write this. It would be very useful to have a conversion from
> Cartesian to internal coordinates; as far as I know, there isn't a tool
> that does that (they usually do one way or the other). What is the
> convention you use for the conversion? How do you handle, say, chain breaks
> or gaps? Is there a publication or small write-up you can provide about
> this (maybe it's in the documentation and I missed it)?
>
> Cheers,
>
> João
>
> rob miller <rob.miller.gh at gmail.com> escreveu no dia terça, 9/04/2019
> à(s) 06:09:
>
>> Hi,
>>
>> This is a request for feedback on code I would like to contribute to
>> Biopython.  I want to do more cleaning, testing, polishing and documenting
>> before making a pull request, so don't worry yet :-).  This post is to
>> query whether there's sufficient interest to accept the facility into
>> Biopython when I'm ready, hopefully gather some positive feedback and
>> ideas, and to make it publicly available now as it's working for me for
>> most structures.
>>
>> This branch ( https://github.com/rob-miller/biopython/tree/rtm-pic )
>> adds infrastructure for internal coordinates under a .pic attribute on
>> Bio.PDB Chain and Residue objects.  'Internal coordinates' means phi, psi,
>> omega, chi<X> dihedral angles, all bond angles and bond lengths.  Internal
>> coordinates can be read from a PDB structure and used to regenerate
>> identical coordinate PDB chains (HETATMs notwithstanding, although there
>> is some support).
>>
>> While my primary application is to support structure prediction work,
>> there are some useful side effects.  Probably most interesting is the
>> ability to generate OpenSCAD files to 3D print protein structure models, as
>> it uses the same algorithm for assembly of bond length, angle and dihedral
>> angle data.  (Please be aware that the initial OpenSCAD rendering is
>> reasonably quick, but the detailed rendering to generate an .stl file for
>> printing can take hours depending on your hardware.)  Of lesser note,
>> filtering options add support for removing Hydrogens from PDB structures,
>> and obviously one can make Ramachandran plots and database projects looking
>> at different subsets of chi rotamers and other aspects of protein
>> structure.
>>
>> I've made a gist at
>> https://gist.github.com/rob-miller/0be208b73fe2ab36fadeeef60831fc92
>> to access the basic functionality.  Hopefully this is easy to get working
>> in your hands; if you have a local PDB mirror, there is a place to
>> configure access to it near the beginning of the script.
>>
>> If you are playing with OpenSCAD and your protein has chain breaks (or
>> you excised lysozyme from a GPCR), increase the -maxp cutoff in the gist
>> options to treat the gap as an extra long peptide bond.
>>
>> All development so far has been exclusively on Python 3, so yes, there
>> are more versions to support.  I am aware of the related projects
>> FragBuilder and PeptideBuilder.
>>
>> I hope you like it; please be gentle with me.
>>
>> Rob.
>>
>> _______________________________________________
>> Biopython mailing list  -  Biopython at mailman.open-bio.org
>> https://mailman.open-bio.org/mailman/listinfo/biopython
>
>