Michiel Jan Laurens de Hoon
mdehoon at ims.u-tokyo.ac.jp
Fri Jul 18 03:46:40 EDT 2003
I have added an option to do hierarchical clustering based on the
distance matrix directly. The new version is in Biopython's CVS. To
apply hierarchical clustering to the gene expression data, use
To do hierarchical clustering on the distance matrix directly, use
where my_distance_matrix is a 2D Numpy array which is symmetric and has
zeros on the diagonal (though the code does not check for it). This
works for pairwise single-, maximum-, and average-linkage, but not for
pairwise centroid-linkage, for which you would need the original gene
expression data.
I had to make some modifications in the Python <-> C interface for this,
which tends to be error prone. If you find any bugs, please let me know.
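The reason single-, maximum-, and average-linkage can work from the distance matrix alone is that, after two clusters merge, the distance from the new cluster to any other cluster follows from the old pairwise distances. Below is a minimal pure-Python sketch of that idea. It is not the Bio.Cluster API or its C implementation; the function names `check_distance_matrix` and `linkage_cluster`, the method strings, and the merge-list output format are all hypothetical. It also checks symmetry and the zero diagonal, which the message notes the real code does not do. Centroid linkage is left out, in line with the note above that it needs the original data.

```python
# Hypothetical sketch: agglomerative clustering driven only by a
# precomputed distance matrix. This is NOT the Bio.Cluster API.


def check_distance_matrix(d, tol=1e-12):
    """Raise ValueError unless d is square, symmetric, and zero on the diagonal."""
    n = len(d)
    for i in range(n):
        if len(d[i]) != n:
            raise ValueError("distance matrix must be square")
        if abs(d[i][i]) > tol:
            raise ValueError("diagonal element %d is not zero" % i)
        for j in range(i + 1, n):
            if abs(d[i][j] - d[j][i]) > tol:
                raise ValueError("matrix is not symmetric at (%d, %d)" % (i, j))


def _key(x, y):
    # Store each cluster pair once, smaller id first.
    return (x, y) if x < y else (y, x)


def linkage_cluster(d, method="single"):
    """Cluster items 0..n-1 using only the distance matrix d.

    Returns a list of merges (a, b, distance). Items keep ids 0..n-1;
    the k-th merge creates a new cluster with id n + k.
    """
    if method not in ("single", "maximum", "average"):
        raise ValueError("unsupported method: %r" % method)
    check_distance_matrix(d)
    n = len(d)
    dist = {(i, j): float(d[i][j]) for i in range(n) for j in range(i + 1, n)}
    size = {i: 1 for i in range(n)}
    active = set(range(n))
    merges = []
    next_id = n
    while len(active) > 1:
        # Merge the closest pair of active clusters.
        (a, b), dab = min(dist.items(), key=lambda kv: kv[1])
        del dist[(a, b)]
        active.discard(a)
        active.discard(b)
        new = next_id
        next_id += 1
        # Distances from the merged cluster come from the old pairwise
        # distances alone; no expression vectors are needed.
        for c in active:
            dac = dist.pop(_key(a, c))
            dbc = dist.pop(_key(b, c))
            if method == "single":
                dnc = min(dac, dbc)
            elif method == "maximum":
                dnc = max(dac, dbc)
            else:  # average of all pairwise distances, weighted by cluster size
                dnc = (size[a] * dac + size[b] * dbc) / (size[a] + size[b])
            dist[_key(new, c)] = dnc
        size[new] = size[a] + size[b]
        active.add(new)
        merges.append((a, b, dab))
    return merges
```

For example, with `d = [[0, 2, 6, 10], [2, 0, 5, 9], [6, 5, 0, 4], [10, 9, 4, 0]]`, `linkage_cluster(d, "single")` returns `[(0, 1, 2.0), (2, 3, 4.0), (4, 5, 5.0)]`. The final merge distance becomes 10.0 for `"maximum"` and 7.5 (the mean of 6, 10, 5, and 9) for `"average"`.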
Iddo Friedberg wrote:
> Dear Michiel,
> I just looked at the manual for Bio.Cluster (very well written, BTW). Is
> there a way to do a k-means clustering (or other) based on a distance
> matrix, rather than on the gene expression vector data? The data I am
> trying to cluster is the structural similarity of protein structure
> fragments, and as such it is already in matrix form.
> Michiel Jan Laurens de Hoon wrote:
>> Dear biopython developers,
>> I have added Bio.Cluster to the Biopython CVS. Bio.Cluster contains
>> clustering techniques for gene expression data (hierarchical, k-means,
>> and SOMs); most routines are written in C with a Python wrapper. This
>> package also exists separately as Pycluster.
>> The Python and C source code is in Bio/Cluster; I have also added
>> Bio.Cluster to setup.py.
>> In case you want to try this package, there is a manual at
>> (replace "from Pycluster import *" by "from Bio.Cluster import *") and
>> a sample data set at
>> Please let me know if you find any problems with this package.
Michiel de Hoon, Assistant Professor
University of Tokyo, Institute of Medical Science
Human Genome Center
4-6-1 Shirokane-dai, Minato-ku