Protein Clustering tool

Frank Wright frank at
Tue Jul 3 09:18:20 UTC 2001

Hi All,

  If you wish to construct phylogenetic trees (specifically gene trees)
from protein sequences so as to infer duplication and
paralogous/orthologous relationships, then you can use the PHYLIP
package (available as an EMBASSY application).  Genetic distances can be
calculated using EPROTDIST, and the resulting distance matrix can be
input into either EFITCH (slower, more accurate tree) or ENEIGHBOR
(faster, more approximate clustering method).  ENEIGHBOR offers either
the Neighbor-Joining algorithm or the UPGMA algorithm; use the latter
only if you have previously tested that the "molecular clock" assumption
is valid for your dataset.
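
  To illustrate why the UPGMA caveat matters, here is a minimal Python
sketch (not PHYLIP or EMBASSY code).  It only compares which pair of
sequences each method would join first on a small, made-up distance
matrix.  The matrix, the taxon labels and the function names are all
hypothetical, and the three-point check is just an informal look at
whether the distances are clock-like, not a statistical test.

```python
from itertools import combinations

# Hypothetical distances for four sequences A, B, C, D, generated from the
# tree ((A,B),(C,D)) with unequal rates: the branches leading to B and D
# are long, so the "molecular clock" does not hold.
LABELS = ["A", "B", "C", "D"]
DIST = [
    [0, 5, 3, 6],
    [5, 0, 6, 9],
    [3, 6, 0, 5],
    [6, 9, 5, 0],
]


def closest_pair(dist):
    """First join as UPGMA would choose it: the pair with the smallest distance."""
    n = len(dist)
    return min(combinations(range(n), 2), key=lambda p: dist[p[0]][p[1]])


def nj_pair(dist):
    """First join as Neighbor-Joining would choose it.

    Minimises Q(i, j) = (n - 2) * d(i, j) - r_i - r_j, where r_i is the
    sum of row i of the distance matrix.
    """
    n = len(dist)
    row_sums = [sum(row) for row in dist]

    def q(pair):
        i, j = pair
        return (n - 2) * dist[i][j] - row_sums[i] - row_sums[j]

    return min(combinations(range(n), 2), key=q)


def is_ultrametric(dist, tol=1e-9):
    """Informal clock check: in every triple the two largest distances are equal."""
    n = len(dist)
    for i, j, k in combinations(range(n), 3):
        _, middle, largest = sorted([dist[i][j], dist[i][k], dist[j][k]])
        if abs(largest - middle) > tol:
            return False
    return True


if __name__ == "__main__":
    u = closest_pair(DIST)
    v = nj_pair(DIST)
    print("UPGMA joins first:", LABELS[u[0]], LABELS[u[1]])
    print("NJ joins first:   ", LABELS[v[0]], LABELS[v[1]])
    print("Clock-like distances:", is_ultrametric(DIST))
```

  On this toy matrix the smallest-distance rule groups A with C, which
are not neighbours on the generating tree, whereas the Neighbor-Joining
criterion picks A with B.  That is the kind of error UPGMA can make when
rates differ between lineages.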

  EPROTDIST, EFITCH and ENEIGHBOR come from version 3.5 of the PHYLIP
package.  PHYLIP 3.6 has recently been released (alpha version), and
PROTDIST 3.6 has improved distances (it copes with among-site rate
heterogeneity to give more accurate genetic distances); there are also
improvements to NEIGHBOR 3.6 (faster) and to FITCH 3.6.  I presume that
PHYLIP 3.6 will be available as an EMBASSY application once the
developers are confident that there are no serious bugs :-)

I hope that helps,
Best Wishes,
Frank Wright
Biomathematics and Statistics Scotland, 
SCRI, DUNDEE DD2 5DA, Scotland
frank at
