Bioperl: 3D biomolecular structure handling for Bioperl

Steve Chervitz sac@neomorphic.com (Steve A. Chervitz)
Thu, 17 Dec 1998 17:46:41 -0800 (PST)


Andrew,

Thanks for the highly detailed response (just finished reading it ;). 
A few comments here and there:

Andrew Dalke writes:
 >   Should this structure information include methods relevant to small
 > molecule chemistry?  This includes bond order/type, cycle detection
 > and aromaticity.  The library I know best in this field is Daylight's
 > <http://www.daylight.com/> and I've got an "all but documented" Python
 > wrapper to it I can make available.  It provides an OO interface to
 > Daylight's C/Fortran style API.

We'd like to include small molecules, but the focus is really on
biological macromolecules. Small molecules should be included to the
extent that they interact with these macromolecules. So our focus here
is more on structural biology than cheminformatics.

 > ...
 > 
 >   I am becoming convinced there should be a core molecular
 > representation which is very lightweight (stores atom&bonds), and
 > "views" of the molecule which provide specific interfaces.  For
 > example from small molecules, some people like Kekule form and others
 > don't.  The Kekule view of a molecule would have bonds that are
 > single/double while the non-Kekule view would have "aromatic" bonds.

This is in line with our thinking as well. But how best to represent
the data in the core format? XML maybe?
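
To make the core-plus-views idea concrete, here is a rough sketch in
Python. Every name in it (Bond, Molecule, KekuleView, AromaticView,
benzene_ring) is made up for illustration and is not an existing
Bioperl or Daylight API. It leaves the storage-format question (XML or
otherwise) open and only shows two views reading the same stored bonds.

```python
from dataclasses import dataclass, field

# Hypothetical names throughout; this is an illustration, not a real API.
ORDER_NAMES = {1: "single", 2: "double", 3: "triple"}


@dataclass
class Bond:
    a: int                  # index of the first atom
    b: int                  # index of the second atom
    order: int              # Kekule bond order (1, 2 or 3)
    aromatic: bool = False  # flag set by some earlier perception step


@dataclass
class Molecule:
    """Lightweight core: just atoms and bonds, no interpretation."""
    atoms: list = field(default_factory=list)  # element symbols
    bonds: list = field(default_factory=list)  # Bond objects


class KekuleView:
    """Reports every bond by its stored single/double/triple order."""

    def __init__(self, mol):
        self.mol = mol

    def bond_types(self):
        return [ORDER_NAMES[b.order] for b in self.mol.bonds]


class AromaticView:
    """Reports flagged bonds as 'aromatic', others by their order."""

    def __init__(self, mol):
        self.mol = mol

    def bond_types(self):
        return ["aromatic" if b.aromatic else ORDER_NAMES[b.order]
                for b in self.mol.bonds]


def benzene_ring():
    """Six carbons with alternating double/single bonds, all aromatic."""
    atoms = ["C"] * 6
    bonds = [Bond(i, (i + 1) % 6, 2 if i % 2 == 0 else 1, aromatic=True)
             for i in range(6)]
    return Molecule(atoms, bonds)
```

Both views wrap the same Molecule, so neither one owns or copies the
data; they differ only in how they describe it.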

 > ...
 > 
 > > Provide methods for querying structure for statistics regarding
 > > atoms, hetatoms, connectivity, secondary structural elements,
 > 
 > I'm using this point to bring up VMD
 > <http://www.ks.uiuc.edu/Research/vmd/>, the visualization software
 > I've helped develop.  It has a Tcl-based way to access a lot of this
 > information and might be of some help.  Of course, MMTK supports this
 > as do other programs.

Thanks for the pointer. I notice that the VMD Programmer's Guide
is version 1.1 draft dated 14 May 1996. How "off" is this from the
current release?

 > > Multi-chain and multi-model (NMR) structures should also be
 > > supported.
 > 
 >   Should dynamics/trajectory information be stored?  In the same
 > fashion as multi-model structures?  Should the bond information be
 > determined once and assumed to be invariant over the whole
 > conformation set or should it be recomputed for each conformation?
 > 
 >   I've pretty much decided that coordinates should be stored
 > independently from the atoms themselves.  It may be appropriate for
 > the atom to be able to access the "current" conformation somehow, but
 > I don't know the right way to do that in the face of multithreaded
 > code.
 > 
 > > presence of alternate conformations
 > 
 >   If someone can tell me a sane way to handle that, I'll be very
 > interested.  I don't know the right way to handle bond detection
 > (covalent and hydrogen) when the alt. id is used in a PDB structure.

These conformational and dynamic issues deserve special treatment. 
For some applications, the structure can be considered an immutable,
single-conformation entity. Perhaps there could be DynamicStructure
and StaticStructure objects, which could be interconvertible.
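
As a very rough sketch of what that could look like (in Python, with
hypothetical class and method names; only the StaticStructure and
DynamicStructure names come from the paragraph above), coordinates are
kept apart from the atom list, as Andrew suggests, and a static
structure is simply one frame of a dynamic one:

```python
class StaticStructure:
    """Immutable, single-conformation structure (illustrative only)."""

    def __init__(self, atom_names, coords):
        if len(atom_names) != len(coords):
            raise ValueError("need one coordinate triple per atom")
        self.atom_names = tuple(atom_names)
        # Coordinates are stored apart from the atom names, as tuples
        # so that a static structure cannot be modified in place.
        self.coords = tuple(tuple(xyz) for xyz in coords)

    def to_dynamic(self):
        """Promote to a one-frame DynamicStructure."""
        return DynamicStructure(self.atom_names, [self.coords])


class DynamicStructure:
    """Same atoms, many conformations (NMR models, trajectory frames)."""

    def __init__(self, atom_names, frames):
        self.atom_names = tuple(atom_names)
        self.frames = []
        for coords in frames:
            self.add_frame(coords)

    def add_frame(self, coords):
        if len(coords) != len(self.atom_names):
            raise ValueError("frame does not match the atom list")
        self.frames.append(tuple(tuple(xyz) for xyz in coords))

    def snapshot(self, index):
        """Freeze one frame as a StaticStructure."""
        return StaticStructure(self.atom_names, self.frames[index])
```

This says nothing about whether bonds should be recomputed per frame;
it only shows the two kinds of object converting into one another.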

 > ...
 >
 > In VMD, some of the residue information (like the residue id) is
 > located in the residue rather than the atom.  The atom contains an
 > association back to the residue so residue properties (like the
 > residue id) are available at the atom level.  This is easy in C/C++.
 > It's harder with Perl's memory management since a lot of cycles may
 > occur.  I don't know how to resolve that problem.  I've heard mention
 > of "weak references" in a couple other languages.  Seems relevant, but
 > I don't know what it means :)

I'm also worried about memory management in Perl when dealing with
potentially thousands of atom and residue objects where each one is
directly or indirectly connected to every other one. You can break
cycles, but Perl can be reluctant to free an object's memory. 
I don't want to get to the point where memory efficiency in Perl is
the guiding design principle of the structure object.
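
For reference, the "weak reference" idea Andrew mentions looks roughly
like this in Python, using the standard weakref module. The Residue and
Atom classes are hypothetical and only illustrate the back-pointer
problem: the residue holds its atoms strongly, while each atom points
back to its residue weakly, so no reference cycle is created.

```python
import weakref


class Residue:
    """Owns its atoms through ordinary (strong) references."""

    def __init__(self, res_id, name):
        self.res_id = res_id
        self.name = name
        self.atoms = []

    def add_atom(self, atom_name):
        atom = Atom(atom_name, self)
        self.atoms.append(atom)
        return atom


class Atom:
    """Points back to its residue through a weak reference."""

    def __init__(self, name, residue):
        self.name = name
        # A weak reference does not keep the residue alive,
        # so residue -> atom -> residue is not a strong cycle.
        self._residue = weakref.ref(residue)

    @property
    def residue(self):
        # Calling the weak reference returns the residue,
        # or None if the residue has already been freed.
        return self._residue()

    @property
    def res_id(self):
        res = self.residue
        return None if res is None else res.res_id
```

Residue properties stay available at the atom level (atom.res_id), and
once the residue has been freed, atom.residue returns None instead of
keeping it alive. Perl offers the same idea through weaken() in
Scalar::Util.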


 > > Display/select atoms or residues based on physical-chemical
 > > properties: acidic, basic, polar, non-polar amino acids, specific
 > > types of residues or atoms (e.g., oxygen, glycine), hydrogen bonds,
 > > disulfide bonds, etc.
 > 
 > Display should not be the property of a library like this because you
 > may then restrict certain types of user interaction.  I point out a
 > difference between VMD and most other visualization programs.  Most
 > programs have the concept of an "active" selection.  All operations are
 > performed with respect to that selection.  VMD, on the other hand, lets
 > you have multiple selections, each with its own independent method of
 > display.  I've found the latter very useful, especially when you want
 > to switch between several different selections.
 >
 > > III.Analyzing & Editing Structures
 > 
 >   How tied should the implementation be to the existing Perl math/array
 > modules?  I don't know the status of those modules, but it's a huge
 > advantage to be able to use existing methods to compute vector
 > operations, matrix transforms and eigenvalues/vectors.  Especially if
 > there are implementations of some of those operations in C :)

We don't intend for this project to include the displaying of
structures or to determine how to implement analysis methods. At this
stage we're trying to delineate the ways in which structural data are
used. This will help define requirements/constraints on the
structure object and an interface for accessing structural information. 

Sure, exploiting the existing vector/matrix math packages for Perl,
such as PDL (http://pdl.perl.org/), would be useful.

 > > select an arbitrary set of residues on a structure
 > 
 >   Could this be regarded as a sequence based "view" (with the
 > association I gave before) of an atom selection?

Sure.

 > > Get related structures and domains based on criteria such as
 > > sequence similarity, structural similarity, date deposited,
 > > resolution, organism name, molecule name, etc.
 > 
 > Yeww, nasty!  :) Last I heard about 20% of the PDB records weren't
 > fully parsable.  I do have a syntactic parser for each format card
 > defined in the PDB, but someone else will have to put the next level
 > of semantics on top of that.

Again, this is less a requirement for the structure object than it is
a description of a way that structures are used (or that we would
*like* to use them). We don't necessarily have to stick to what is
obtainable from PDB records.

 > > Get URLs for web-based information about the structure.
 > 
 >   This is mostly for my own curiosity.  How stable are the web
 > services that provide this sort of information?  From past experience
 > only a few are stable (ftp sites are usually stable but http sites,
 > esp. those with fancy CGIs for the user interface, are not).  We would have to
 > update our parsers for some 60-odd sites about once a week to reflect
 > changes on the remote site.

It should be possible to at least cover a handful of the major sites,
which would handle most users' needs. I can imagine a sort of
selection process occurring for sites with the most stable URLs, or at
least the most robust mechanisms for handling changes to their
API. (Perhaps this would be a good research area for developing a
structure-savvy intelligent web agent?).

 >   Here's a challenge to anyone working on the structure implementation.
 > Can you implement these to allow non-trivial multithreading?  (An
 > example of a trivial solution is to serialize all requests through a
 > single thread.)

Gotta walk before you can run...
But true, you've gotta know where you're heading.

 >   Now that *that's* done, let me address this part of the email.
 > 
 > >  - In what sort of program would you use such a module?
 > >  - How would you want to access the data, and what methods should be
 > >     available?
 > 
 >   Well, I'm a whole-hearted Python developer these days, so it won't
 > really be me.  However, here are some generally useful tasks (tasks
 > meaning they aren't quite full fledged programs):
 > ...
 >
 >   Editor -- My first real perl program was "pdblang," which you can
 > still get from
 > ftp://ftp.ks.uiuc.edu/pub/group/dalke/pdblang-1.0.tar.gz (it may need
 > ...

Thanks for the usage descriptions and link to your pdblang
program. Did you manage to embed it inside VMD, as the comments say you
were working on?

SteveC

