[Biopython-dev] GSoC 2011 - Interface analysis module - Week 1

Tue May 31 11:46:50 UTC 2011

Hi there,

As the title says, this email is a summary of my first week of coding for
the Google Summer of Code 2011. I will begin with a reminder of the original
plan proposed to Google, then continue with what I did and the obstacles I
encountered. Please don't hesitate to comment; your remarks are one of the
main motivations for this mail (which should be the first of a series of
weekly reports)!

Week 1 [24th - 31st May]

   1. Add the new Interface module backbone to the current Bio.PDB code base
   2. Evaluate possible code reuse and call it from the new module
   3. Try simple calculations to be sure that there is stability between the
   different modules (parsing for example) and functions
   4. Define a stable benchmark of a few PDB files of complexes to run some
   unit tests for each step of the project

Unfortunately, a large part of my first week went into solving problems
caused by keeping my GitHub working copy directly in my Dropbox folder. I
work on several computers, so I wanted to have everything synchronized, but
the two did not seem to get along well. The fault was most likely in the way
I used them. In the end (but rather late) I decided to keep a single main
working directory and to ssh into it when needed.

We began by looking for an easy way to add the Interface as a new level of
the SMCRA scheme (Structure, Model, Chain, Residue, Atom). The idea was a new
scheme, SM-I-CRA. Unfortunately, the Interface object is not as well defined
as a simple child of Model and parent of Chain. Indeed, an interface
consists mainly of residues, and even residue pairs. We want to keep the
chain information, but we cannot keep the chains as they are currently
defined, since we would get overlaps, duplication and incompatibilities
between the chains of our model and the chains of our interface. Likewise,
our attempt to tie the creation of the interface to existing modules such as
StructureBuilder and Model wasn't successful.
So we decided to simplify the concept a bit by adding the Interface-related
classes independently. Links will obviously exist with the different levels
of SMCRA, but Interface is now considered a parallel entity, not completely
integrated into the SMCRA scheme.
End of the story; time for some actual keyboard use.

Now for the coding part. I added two new classes to Bio.PDB: Interface.py
and InterfaceBuilder.py

For the impatient, here are the links to my two commits:
https://github.com/mtrellet/biopython/commit/4cfa4359d0f927609c076ed7b66f37add5aabdfb
https://github.com/mtrellet/biopython/commit/194efe37ac8f88d688e0cf528f1fb896c8441866

Interface.py defines the Interface object, which inherits from Entity and
has the following methods: *__init__*(self, id), *add*(self, entity) and
*get_chains*(self).

The *add* method overrides the add method of Entity to give an easy way to
sort residues according to their respective chains.
The *get_chains* method returns the chains involved in the interface
defined by the Interface object.
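
To make the description above concrete, here is a minimal standalone sketch
of the idea behind add and get_chains. It deliberately does not use Bio.PDB:
the class name InterfaceSketch and the (chain_id, residue_number) tuples are
hypothetical stand-ins chosen for illustration, whereas the real class
inherits from Entity and stores residue objects. Skipping duplicates is also
an assumption of the sketch, not something taken from the commits.

```python
from collections import OrderedDict


class InterfaceSketch:
    """Toy stand-in for the Interface entity (hypothetical, not Bio.PDB code)."""

    def __init__(self, id):
        self.id = id
        # Maps a chain id to the list of residue numbers stored for that chain.
        self._residues_by_chain = OrderedDict()

    def add(self, residue):
        """File a (chain_id, residue_number) pair under its chain.

        Duplicates are skipped (an assumption made for this sketch).
        """
        chain_id, resnum = residue
        members = self._residues_by_chain.setdefault(chain_id, [])
        if resnum not in members:
            members.append(resnum)

    def get_chains(self):
        """Return the ids of the chains contributing residues, in insertion order."""
        return list(self._residues_by_chain)


interface = InterfaceSketch("AB")
for res in [("A", 12), ("B", 45), ("A", 13), ("A", 12)]:
    interface.add(res)
chains = interface.get_chains()
```

Grouping by chain at insertion time keeps get_chains trivial, which is the
convenience the overridden add is meant to provide.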

The second class lives in InterfaceBuilder.py and deals directly with
building the interface (hard to guess..!)
It has these methods: *__init__*(self, model, id=None, threshold=5.0,
include_waters=False, *chains), *_unpack_chains*(self, list_of_tuples),
*get_interface*(self), *_add_residue*(self, residue) and
*_build_interface*(self, model, id, threshold, include_waters=False, *chains)

*__init__*: To initialize an interface you need to provide the model for
which you want to calculate the interface; that is the only mandatory
argument.
*_unpack_chains*: Method used by __init__ to create self.chain_list, a
variable read in many parts of the class. It transforms a list of tuples
(given by the user) into a list of characters representing the chains that
will be involved in the definition of the interface.
*get_interface*: Simply returns the interface.
*_add_residue*: Allows the user to add specific residues to the interface.
*_build_interface*: The machinery that builds the interface. It uses
NeighborSearch and Selection to define the interface depending on the
arguments given by the user.
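
As a rough illustration of the two main steps above (unpacking the chains,
then a distance-based search), here is a standalone sketch. It is not the
committed code: the function names, the plain-tuple atom records and the
brute-force all-pairs loop are assumptions made for the example, and the loop
stands in for NeighborSearch. It also assumes that the tuples are pairs of
chain ids, that waters can be recognised by the residue name "HOH", and that
a residue belongs to the interface when one of its atoms lies within the
threshold of an atom from another chain.

```python
from itertools import combinations
from math import dist  # available from Python 3.8


def unpack_chains(list_of_tuples):
    """Flatten chain tuples, e.g. [("A", "B"), ("A", "C")] -> ["A", "B", "C"]."""
    chain_list = []
    for chain_tuple in list_of_tuples:
        for chain_id in chain_tuple:
            if chain_id not in chain_list:
                chain_list.append(chain_id)
    return chain_list


def build_interface(atoms, threshold=5.0, include_waters=False, chains=()):
    """Return the set of (chain_id, resnum) residues found at an interface.

    Each atom is a hypothetical record (chain_id, resnum, resname, (x, y, z)).
    An empty `chains` means that every chain is considered.
    """
    chain_list = unpack_chains(chains)
    selected = [
        atom for atom in atoms
        if (include_waters or atom[2] != "HOH")
        and (not chain_list or atom[0] in chain_list)
    ]
    interface = set()
    # Brute-force all-pairs comparison, standing in for NeighborSearch.
    for a, b in combinations(selected, 2):
        if a[0] != b[0] and dist(a[3], b[3]) <= threshold:
            interface.add((a[0], a[1]))
            interface.add((b[0], b[1]))
    return interface


atoms = [
    ("A", 10, "ALA", (0.0, 0.0, 0.0)),
    ("A", 11, "GLY", (20.0, 0.0, 0.0)),
    ("B", 50, "LYS", (3.0, 0.0, 0.0)),
    ("B", 51, "HOH", (1.0, 0.0, 0.0)),
    ("C", 70, "SER", (0.0, 4.0, 0.0)),
]
result = build_interface(atoms, threshold=5.0, chains=[("A", "B")])
```

The all-pairs loop is quadratic in the number of atoms, which is fine for a
toy example but is the reason a spatial index such as NeighborSearch is the
sensible choice on real structures.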

This was maybe a bit long and too detailed (or perhaps not detailed enough).
As I already said, don't hesitate to make suggestions about both my work and
my report! You should receive about a dozen of these, so any comment is
welcome!

Cheers,

-- 
Mikael TRELLET,
Computational structural biology group, Utrecht University
Bijvoet Center,
The Netherlands