[EMBOSS] clustalw vs. emma

Derek Gatherer d.gatherer at vir.gla.ac.uk
Wed Mar 8 09:30:13 UTC 2006


Morning all

Is emma passing some unusual default to clustalw?  For instance, 
here's emma with a vanilla set of parameters on a fairly well 
conserved set of proteins (bdlf4.fa):

yoda:cluscheck 157 > emma bdlf4.fa -osformat2 msf -out2 bdlf4.emma -auto

  CLUSTAL W (1.83) Multiple Sequence Alignments

Sequence type explicitly set to Protein
Sequence format is Pearson
Sequence 1: AG876-BDLF4      225 aa
Sequence 2: B95-BDLF4        225 aa
Sequence 3: GD1-BDLF4        225 aa
Sequence 4: RLV-BDLF4        238 aa
Start of Pairwise alignments
Aligning...
Sequences (1:2) Aligned. Score:  100
Sequences (1:3) Aligned. Score:  98
Sequences (1:4) Aligned. Score:  85
Sequences (2:3) Aligned. Score:  98
Sequences (2:4) Aligned. Score:  85
Sequences (3:4) Aligned. Score:  86
Guide tree        file created:   [00029986C]
Start of Multiple Alignment
There are 3 groups
Aligning...
Group 1: Sequences:   2      Score:3770
Group 2: Sequences:   3      Score:3741
Group 3: Sequences:   4      Score:3462
Alignment Score 8058
GCG-Alignment file created      [00029986B]

and now clustalw run directly, not wrapped in emma, on the same input file:

yoda:cluscheck 158 > clustalw bdlf4.fa

  CLUSTAL W (1.83) Multiple Sequence Alignments

Sequence format is Pearson
Sequence 1: AG876-BDLF4      225 aa
Sequence 2: B95-BDLF4        225 aa
Sequence 3: GD1-BDLF4        225 aa
Sequence 4: RLV-BDLF4        238 aa
Start of Pairwise alignments
Aligning...
Sequences (1:2) Aligned. Score:  100
Sequences (1:3) Aligned. Score:  98
Sequences (1:4) Aligned. Score:  88
Sequences (2:3) Aligned. Score:  98
Sequences (2:4) Aligned. Score:  88
Sequences (3:4) Aligned. Score:  88
Guide tree        file created:   [bdlf4.dnd]
Start of Multiple Alignment
There are 3 groups
Aligning...
Group 1: Sequences:   2      Score:4959
Group 2: Sequences:   3      Score:4928
Group 3: Sequences:   4      Score:4677
Alignment Score 8187
CLUSTAL-Alignment file created  [bdlf4.aln]
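
For anyone wanting to compare the two runs mechanically, here is a minimal Python sketch that pulls the pairwise scores out of two captured logs and lists the pairs that disagree. It assumes the log text has been saved or captured as a string and that the "Sequences (i:j) Aligned. Score: N" line format is exactly as shown above; the function names are hypothetical, not part of EMBOSS or ClustalW.

```python
import re

# Matches lines such as "Sequences (1:4) Aligned. Score:  85"
PAIR_RE = re.compile(r"Sequences \((\d+):(\d+)\) Aligned\. Score:\s*(\d+)")


def pairwise_scores(log_text):
    """Return {(i, j): score} for every pairwise line found in a log."""
    return {(int(i), int(j)): int(s) for i, j, s in PAIR_RE.findall(log_text)}


def score_differences(log_a, log_b):
    """Return {(i, j): (score_a, score_b)} for pairs whose scores differ.

    Only pairs reported in both logs are compared.
    """
    a = pairwise_scores(log_a)
    b = pairwise_scores(log_b)
    shared = sorted(a.keys() & b.keys())
    return {pair: (a[pair], b[pair]) for pair in shared if a[pair] != b[pair]}
```

Fed the two logs above, this reports pairs (1:4), (2:4) and (3:4), i.e. every pair that differs involves sequence 4 (RLV-BDLF4).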

Why is the scoring subtly different?  Look at what it does to the 
N-terminal end of the alignment.

First with emma:

            1                                               50
AG876-BDLF4 ~~~~~~~~~~~~~MSDQGRLSLPRGEGGTDEPNPRHLCSYSKLEFHLPLP
B95-BDLF4   ~~~~~~~~~~~~~MSDQGRLSLPRGEGGTDEPNPRHLCSYSKLEFHLPLP
GD1-BDLF4   ~~~~~~~~~~~~~MSDQGRLSLPRGEGGTDEPNPRHLCSYSKLEFHLPLP
RLV-BDLF4   MSDHGRVSGRPRGAVRGRGASSPDGEGAPTGPNSRHLCSYSKLESHFPLP

now with clustalw:

AG876-BDLF4      MSDQGRLS-------------LPRGEGGTDEPNPRHLCSYSKLEFHLPLPESMASVFACW
B95-BDLF4        MSDQGRLS-------------LPRGEGGTDEPNPRHLCSYSKLEFHLPLPESMASVFACW
GD1-BDLF4        MSDQGRLS-------------LPRGEGGTDEPNPRHLCSYSKLEFHLPLPESMASVFACW
RLV-BDLF4        MSDHGRVSGRPRGAVRGRGASSPDGEGAPTGPNSRHLCSYSKLESHFPLPESMASVFACW
                  ***:**:*              * ***..  **.********** *:*************

Clustalw alone clearly gives the correct alignment whereas emma is 
wrong.  I thought that emma simply wrapped clustalw for automation, 
but it appears to be doing something else.  Out of a set of 80 
proteins I am trying to pipeline through alignment, emma gives a 
result that differs from standalone clustalw for 7 of them.
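
One way to flag such discrepancies automatically is sketched below in Python. It is only an illustration under stated assumptions: each alignment has already been parsed into a dict mapping sequence name to its full gapped row, and the characters "-", "." and "~" are all treated as gaps so that MSF and CLUSTAL output can be compared. The function names are hypothetical. Each alignment is reduced to its columns, written as residue indices per sequence, and two alignments count as the same only if those column lists match.

```python
GAP_CHARS = set("-.~")  # assumed gap symbols across MSF and CLUSTAL output


def alignment_columns(rows):
    """Describe an alignment as a list of columns.

    rows maps sequence name -> gapped row; all rows must be equally long.
    Each column is a tuple holding, per sequence (in sorted name order),
    the 0-based index of the residue in that column, or None for a gap.
    Columns consisting only of gaps are dropped.
    """
    names = sorted(rows)
    lengths = {len(rows[name]) for name in names}
    if len(lengths) != 1:
        raise ValueError("all rows must have the same length")
    next_index = {name: 0 for name in names}
    columns = []
    for col in range(lengths.pop()):
        column = []
        for name in names:
            if rows[name][col] in GAP_CHARS:
                column.append(None)
            else:
                column.append(next_index[name])
                next_index[name] += 1
        if any(idx is not None for idx in column):
            columns.append(tuple(column))
    return columns


def same_alignment(rows_a, rows_b):
    """True if both alignments pair up exactly the same residues."""
    return alignment_columns(rows_a) == alignment_columns(rows_b)
```

Running same_alignment on the emma and clustalw output for each protein in the set would pick out the cases where the two really disagree, independent of which gap character each format uses.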

Any thoughts, as always, much appreciated

cheers
Derek


