[Biopython-dev] Bio.Cluster

Michiel Jan Laurens de Hoon mdehoon at ims.u-tokyo.ac.jp
Fri Jul 18 03:46:40 EDT 2003


I have added an option to do hierarchical clustering directly on a
distance matrix. The new version is in Biopython's CVS. To apply
hierarchical clustering to gene expression data, use

treecluster(my_matrix, ...)

or

treecluster(data=my_matrix, ...)

To do hierarchical clustering on the distance matrix directly, use

treecluster(distancematrix=my_distance_matrix, ...)

where my_distance_matrix is a 2D NumPy array that is symmetric and has
zeros on the diagonal. The code does not check this, so make sure your
matrix satisfies both conditions. This works for pairwise single-,
maximum-, and average-linkage clustering, but not for pairwise
centroid-linkage, which needs the original gene expression data to
compute the cluster centroids.
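
For illustration, here is a minimal pure-Python sketch of what pairwise
linkage on a distance matrix involves. It is not Bio.Cluster's C
implementation, and the function names euclidean_distance_matrix,
check_distance_matrix and pairwise_linkage are hypothetical. The sketch
builds a distance matrix from toy expression rows, performs the symmetry
and zero-diagonal check that treecluster itself skips, and then merges
clusters using only entries of the matrix. Centroid linkage is left out on
purpose: after each merge it needs the new cluster's mean expression
vector, which cannot be recovered from the distances alone.

```python
import math


def euclidean_distance_matrix(rows):
    """Pairwise Euclidean distances between the rows of a 2D list (hypothetical helper)."""
    n = len(rows)
    d = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i):
            dist = math.sqrt(sum((a - b) ** 2 for a, b in zip(rows[i], rows[j])))
            d[i][j] = d[j][i] = dist  # fill both halves: the matrix is symmetric
    return d


def check_distance_matrix(d, tol=1e-12):
    """Raise ValueError unless d is square, symmetric, and zero on the diagonal."""
    n = len(d)
    for i in range(n):
        if len(d[i]) != n:
            raise ValueError("distance matrix must be square")
        if abs(d[i][i]) > tol:
            raise ValueError("diagonal must be zero")
        for j in range(i):
            if abs(d[i][j] - d[j][i]) > tol:
                raise ValueError("distance matrix must be symmetric")


def pairwise_linkage(d, method="single"):
    """Naive agglomerative clustering that only reads the distance matrix d.

    method is "single" (minimum), "maximum" (complete) or "average".
    Returns the merges in order as (cluster_a, cluster_b, distance), where
    clusters are frozensets of original item indices.
    """
    check_distance_matrix(d)
    clusters = [frozenset([i]) for i in range(len(d))]
    merges = []

    def linkage(a, b):
        # All pairwise distances between members of cluster a and cluster b.
        pair = [d[i][j] for i in a for j in b]
        if method == "single":
            return min(pair)
        if method == "maximum":
            return max(pair)
        if method == "average":
            return sum(pair) / len(pair)
        raise ValueError("unknown method: %r" % method)

    while len(clusters) > 1:
        # Find the closest pair of current clusters (naive, for illustration only).
        best = None
        for x in range(len(clusters)):
            for y in range(x):
                dist = linkage(clusters[x], clusters[y])
                if best is None or dist < best[0]:
                    best = (dist, x, y)
        dist, x, y = best
        a, b = clusters[x], clusters[y]
        merges.append((a, b, dist))
        # x > y, so deleting index x first leaves index y valid.
        del clusters[x]
        del clusters[y]
        clusters.append(a | b)
    return merges


if __name__ == "__main__":
    data = [[0.0, 0.0], [0.0, 1.0], [5.0, 0.0], [5.0, 2.0]]
    d = euclidean_distance_matrix(data)
    for method in ("single", "maximum", "average"):
        merges = pairwise_linkage(d, method)
        print(method, [round(m[2], 3) for m in merges])
```

With these four toy rows, single linkage merges at distances 1.0, 2.0 and
5.0; maximum and average linkage differ only in the final merge, where
they take the largest and the mean of the cross-cluster distances.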

I had to modify the Python <-> C interface for this, which tends to be
error-prone. If you find any bugs, please let me know.

--Michiel.

Iddo Friedberg wrote:

> Dear Michiel,
> 
> I just looked at the manual for Bio.Cluster (very well written, BTW). Is
> there a way to do k-means clustering (or another method) based on a
> distance matrix rather than on the gene expression vector data? The data
> I am trying to cluster is the structural similarity of protein structure
> fragments, and as such is already in matrix form.
> 
> Thanks,
> 
> ./I
> 
> 
> 
> Michiel Jan Laurens de Hoon wrote:
> 
>> Dear biopython developers,
>>
>> I have added Bio.Cluster to the Biopython CVS. Bio.Cluster contains 
>> clustering techniques for gene expression data (hierarchical, k-means, 
>> and SOMs); most routines are written in C with a Python wrapper. This 
>> package also exists separately as Pycluster.
>>
>> The Python and C source code is in Bio/Cluster; I have also added 
>> Bio.Cluster to setup.py.
>>
>> In case you want to try this package, there is a manual at
>> http://bonsai.ims.u-tokyo.ac.jp/~mdehoon/software/cluster/cluster.pdf
>> (replace "from Pycluster import *" with "from Bio.Cluster import *") and 
>> a sample data set at
>> http://bonsai.ims.u-tokyo.ac.jp/~mdehoon/software/cluster/demo.txt.
>> Please let me know if you find any problems with this package.
>>
>> --Michiel.
>>
> 

-- 
Michiel de Hoon, Assistant Professor
University of Tokyo, Institute of Medical Science
Human Genome Center
4-6-1 Shirokane-dai, Minato-ku
Tokyo 108-8639
Japan
http://bonsai.ims.u-tokyo.ac.jp/~mdehoon