Protein Clustering tool

Tue Jul 3 09:38:29 UTC 2001

Hello,

Firstly, Frank, thank you for your reply.
I am sorry my first email was not precise enough.

In fact,
I was wondering whether EMBOSS plans to provide a free clustering tool
that takes a protein FASTA sequence file as input
and produces a list of protein families.

For instance, thanks to A. Enright & C. Ouzounis,
the GeneRage software is free for academic research
(http://www.ebi.ac.uk/research/cgg/services/rage/),
but the sources are not yet available.

best regards
thank you for your help
F

PS: please reply to my email, fchetou at infobiogen.fr

> 
>   If you wish to construct phylogenetic trees (specifically gene trees)
> from protein sequences so as to infer duplication and
> paralogous/orthologous relationships, then you can use the PHYLIP
> package (available as an EMBASSY application).  Genetic distances can be
> calculated using EPROTDIST and the distance matrix created can be input
> into either EFITCH (slower, more accurate tree) or ENEIGHBOR (faster,
> more approximate clustering method, allowing the use of the
> Neighbor-Joining algorithm, or the UPGMA algorithm - use the latter only
> if you have previously tested that the "molecular clock" assumption is
> valid for your dataset).
> 
>   ePROTDIST, eFITCH and eNEIGHBOR come from version 3.5 of the PHYLIP
> package (http://evolution.genetics.washington.edu).  PHYLIP 3.6 has
> recently been released (alpha version).  However, PROTDIST 3.6 has
> improved distances (copes with among-site rate heterogeneity to give
> more accurate genetic distances) and there are also improvements to
> NEIGHBOR 3.6 (faster) and to FITCH 3.6.  I presume that PHYLIP 3.6 will
> be available as an EMBASSY application once there is confidence that
> there are no serious bugs :-)
> 
> I hope that helps,
> Best Wishes,
> Frank 
> -- 
> Frank Wright
> Biomathematics and Statistics Scotland, 
> SCRI, DUNDEE DD2 5DA, Scotland
> frank at bioss.sari.ac.uk
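
To illustrate the distance-matrix clustering step Frank describes above,
here is a minimal sketch of UPGMA in Python. It is not the PHYLIP or
ENEIGHBOR implementation; the function name upgma, the input layout and
the four-sequence distance matrix are all made up for illustration. In
practice the distances would come from a program such as EPROTDIST.

```python
from itertools import combinations


def upgma(names, dist):
    """Cluster labels by UPGMA and return the tree as nested tuples.

    names: list of sequence labels (hypothetical input format).
    dist:  dict mapping frozenset({a, b}) to the distance between a and b.
    """
    # Active clusters, each mapped to the number of leaves it contains.
    sizes = {name: 1 for name in names}
    d = dict(dist)  # work on a copy so the caller's matrix is untouched
    while len(sizes) > 1:
        # Find the closest pair of active clusters.
        a, b = min(combinations(sizes, 2), key=lambda p: d[frozenset(p)])
        merged = (a, b)
        na, nb = sizes.pop(a), sizes.pop(b)
        # Distance from the merged cluster to every other cluster is the
        # average of the old distances, weighted by cluster size.
        for c in sizes:
            d[frozenset((merged, c))] = (
                na * d[frozenset((a, c))] + nb * d[frozenset((b, c))]
            ) / (na + nb)
        sizes[merged] = na + nb
    return next(iter(sizes))


# Toy distances, invented for this example only.
toy = {
    frozenset(("A", "B")): 2.0,
    frozenset(("A", "C")): 6.0,
    frozenset(("B", "C")): 6.0,
    frozenset(("A", "D")): 10.0,
    frozenset(("B", "D")): 10.0,
    frozenset(("C", "D")): 10.0,
}
tree = upgma(["A", "B", "C", "D"], toy)
print(tree)
```

The nested tuples group the closest sequences first (here A with B, then
C, then D). As Frank notes, UPGMA is only appropriate when the
"molecular clock" assumption holds; Neighbor-Joining does not need it.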