[BioPython] Spatial clustering
Shu-Hsien Sheu
ssheu at post.harvard.edu
Tue Oct 14 11:16:00 EDT 2003
Dear all,
Thanks for all the input!
I am new to this field and come from a biology background, so I am not
that familiar with computer science. The project, however, has been
running for a year and has shown great results for some enzymes we tested:
http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&db=PubMed&list_uids=14499612&dopt=Abstract
The basic idea is to use organic solvents as "probes" and use an energy
function to find the favorable minima. We first use a simplex method
with Van der Waals cancellation and then do further minimization
using CHARMm. Through some testing we've found that 6660 probe positions
give the best results. By clustering those molecules and calculating the
average free energy of each cluster, we come up with the top 5
energetically favorable clusters. It was shown that the "consensus" site
of the clusters of different probes is the binding site of the protein.
Actually the clustering code already exists and is written in C. The
person who wrote both the mapping program and the clustering program has
already left this lab. Originally I was working on the consensus-site
finding part, which was done by manual inspection in RasMol or PyMOL in
the past, but I later thought that it might be more efficient to wrap
these two parts together. To me, creating a valid RMSD matrix seems to be
as important as the clustering algorithm. For instance, the small
molecules we use range from methanol to t-butanol, and for the latter,
two reference points might be needed. Finding the consensus site might
be more problematic, since you are then dealing with different kinds of
molecules. Any comments here?
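As a rough illustration of building the pairwise RMSD matrix for poses of a
single probe type, here is a minimal Python sketch. It assumes every pose
lists its atoms in the same order and that the poses already share the
protein's coordinate frame, so no superposition is done. The names pose_rmsd
and rmsd_matrix are made up for this example, and equivalent atoms in
symmetric probes are not handled.

```python
import math


def pose_rmsd(coords_a, coords_b):
    """RMSD between two poses of the same probe, without superposition.

    Both arguments are lists of (x, y, z) tuples, atoms in the same order.
    """
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("poses must have the same, non-zero number of atoms")
    total = 0.0
    for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b):
        total += (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
    return math.sqrt(total / len(coords_a))


def rmsd_matrix(poses):
    """Full symmetric matrix (list of lists) of pairwise RMSD values."""
    n = len(poses)
    matrix = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            # Compute each pair once and mirror it across the diagonal.
            d = pose_rmsd(poses[i], poses[j])
            matrix[i][j] = d
            matrix[j][i] = d
    return matrix
```

Comparing poses of different probe molecules would need some other distance,
for example between one or two chosen reference points per molecule.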
Clustering seems to be an important issue in molecular modelling. The
people working on protein-protein docking in this lab have all put some
effort into it, though no collaboration or uniform method has been
developed yet.
I have a naive question about arrays/matrices. Pairwise RMSD doesn't
have a direction, i.e. RMSD(1,2) == RMSD(2,1).
Therefore, the distance matrix would look like this:
        1     2     3     4     5
  1     X    .2    .1   1.2   3.4
  2    .2     X    .5    .2    .4
  3    .1    .5     X    .6    .7
  4   1.2    .2    .6     X    .2
  5   3.4    .4    .7    .2     X
I've read the Numarray tutorial and there seem to be no special functions
for matrices that are symmetric about the diagonal. Are there any more
efficient approaches?
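One common way to avoid storing each value twice is to keep only the
entries with i < j in a flat list and map (i, j) to a position in it. The
sketch below uses plain Python and 0-based indices. The class name
CondensedMatrix is made up for this example and is not part of Numarray or
Biopython.

```python
class CondensedMatrix:
    """Symmetric distance matrix storing only the n*(n-1)/2 entries with i < j."""

    def __init__(self, n):
        self.n = n
        self.values = [0.0] * (n * (n - 1) // 2)

    def _index(self, i, j):
        if i == j:
            raise IndexError("diagonal is not stored")
        if i > j:
            i, j = j, i  # (i, j) and (j, i) share one slot
        # Rows 0..i-1 hold (n-1) + (n-2) + ... + (n-i) entries in total.
        return i * self.n - i * (i + 1) // 2 + (j - i - 1)

    def get(self, i, j):
        return 0.0 if i == j else self.values[self._index(i, j)]

    def set(self, i, j, value):
        self.values[self._index(i, j)] = value


# The 5 x 5 example from this message needs only 10 stored values.
dm = CondensedMatrix(5)
upper = {(0, 1): .2, (0, 2): .1, (0, 3): 1.2, (0, 4): 3.4, (1, 2): .5,
         (1, 3): .2, (1, 4): .4, (2, 3): .6, (2, 4): .7, (3, 4): .2}
for (i, j), value in upper.items():
    dm.set(i, j, value)
```

This halves the memory but adds an index calculation on every lookup, so
for a few thousand points a full square array may still be the simpler
choice.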
An algorithm I have in mind is: starting with the RMSD matrix, first
find the point with the most neighbors, make it the hub of a cluster and
take it out along with its members, then do the same thing recursively.
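The idea above could be sketched in Python as follows. The message does not
say what counts as a "neighbor", so the sketch assumes a distance cutoff
passed in by the caller. The name hub_clusters, the cutoff value of 0.3 and
the tie-breaking rule (lowest index wins) are likewise assumptions. A loop
replaces the recursion.

```python
def hub_clusters(dist, cutoff):
    """Greedy clustering on a square, symmetric distance matrix (list of lists).

    Returns a list of (hub, members) tuples; members starts with the hub.
    """
    remaining = set(range(len(dist)))
    clusters = []
    while remaining:
        def neighbours(i):
            # Remaining points within the cutoff of point i, excluding i itself.
            return [j for j in remaining if j != i and dist[i][j] <= cutoff]

        # The point with the most neighbours becomes the hub; scanning in
        # sorted order makes ties go to the lowest index.
        hub = max(sorted(remaining), key=lambda i: len(neighbours(i)))
        members = [hub] + sorted(neighbours(hub))
        clusters.append((hub, members))
        remaining -= set(members)
    return clusters


# The example matrix from this message, 0-based, with 0.0 on the diagonal.
example = [
    [0.0, 0.2, 0.1, 1.2, 3.4],
    [0.2, 0.0, 0.5, 0.2, 0.4],
    [0.1, 0.5, 0.0, 0.6, 0.7],
    [1.2, 0.2, 0.6, 0.0, 0.2],
    [3.4, 0.4, 0.7, 0.2, 0.0],
]
print(hub_clusters(example, 0.3))  # [(0, [0, 1, 2]), (3, [3, 4])]
```

Note that the result depends on the cutoff: with a cutoff of 0.5 the same
matrix collapses into a single cluster around point 1.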
Dear Iddo,
I just had a look at CLUTO and will try to find out whether it suits my
purpose. Thanks!
Dear Andrew,
I am not familiar with fingerprints or shape fitting. Can you give me a
place to start? I will search Google as well. I am not familiar with
pharmacophores either and will look into them too.
Dear Michiel,
I've read the PyCluster documentation and it seems I had missed the
point that treecluster lets me specify the distance matrix myself. That
might be the easiest solution. Thanks!
-shuhsien