[EMBOSS] clustalw vs. emma
Derek Gatherer
d.gatherer at vir.gla.ac.uk
Wed Mar 8 09:30:13 UTC 2006
Morning all
Is emma passing some unusual default to clustalw? For instance,
here's emma with a vanilla set of parameters on a fairly well-
conserved set of proteins (bdlf4.fa):
yoda:cluscheck 157 > emma bdlf4.fa -osformat2 msf -out2 bdlf4.emma -auto
CLUSTAL W (1.83) Multiple Sequence Alignments
Sequence type explicitly set to Protein
Sequence format is Pearson
Sequence 1: AG876-BDLF4 225 aa
Sequence 2: B95-BDLF4 225 aa
Sequence 3: GD1-BDLF4 225 aa
Sequence 4: RLV-BDLF4 238 aa
Start of Pairwise alignments
Aligning...
Sequences (1:2) Aligned. Score: 100
Sequences (1:3) Aligned. Score: 98
Sequences (1:4) Aligned. Score: 85
Sequences (2:3) Aligned. Score: 98
Sequences (2:4) Aligned. Score: 85
Sequences (3:4) Aligned. Score: 86
Guide tree file created: [00029986C]
Start of Multiple Alignment
There are 3 groups
Aligning...
Group 1: Sequences: 2 Score:3770
Group 2: Sequences: 3 Score:3741
Group 3: Sequences: 4 Score:3462
Alignment Score 8058
GCG-Alignment file created [00029986B]
And now clustalw run on its own (not wrapped by emma), with the same input file:
yoda:cluscheck 158 > clustalw bdlf4.fa
CLUSTAL W (1.83) Multiple Sequence Alignments
Sequence format is Pearson
Sequence 1: AG876-BDLF4 225 aa
Sequence 2: B95-BDLF4 225 aa
Sequence 3: GD1-BDLF4 225 aa
Sequence 4: RLV-BDLF4 238 aa
Start of Pairwise alignments
Aligning...
Sequences (1:2) Aligned. Score: 100
Sequences (1:3) Aligned. Score: 98
Sequences (1:4) Aligned. Score: 88
Sequences (2:3) Aligned. Score: 98
Sequences (2:4) Aligned. Score: 88
Sequences (3:4) Aligned. Score: 88
Guide tree file created: [bdlf4.dnd]
Start of Multiple Alignment
There are 3 groups
Aligning...
Group 1: Sequences: 2 Score:4959
Group 2: Sequences: 3 Score:4928
Group 3: Sequences: 4 Score:4677
Alignment Score 8187
CLUSTAL-Alignment file created [bdlf4.aln]
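To see exactly which pairwise scores differ, one option is to pull the `Sequences (i:j) Aligned. Score: N` lines out of both screen logs and compare them pair by pair. This is a minimal Python sketch, assuming each run's output has been saved as text; the function names are made up for illustration and are not part of EMBOSS or clustalw.

```python
import re

# Matches clustalw progress lines such as "Sequences (1:4) Aligned. Score: 85"
PAIR_RE = re.compile(r"Sequences \((\d+):(\d+)\) Aligned\. Score:\s*(\d+)")


def pairwise_scores(log_text):
    """Return {(i, j): score} for every pairwise-alignment line in a clustalw log."""
    return {
        (int(i), int(j)): int(score)
        for i, j, score in PAIR_RE.findall(log_text)
    }


def score_differences(log_a, log_b):
    """Return (pair, score_a, score_b) for each sequence pair scored differently."""
    a, b = pairwise_scores(log_a), pairwise_scores(log_b)
    return [
        (pair, a.get(pair), b.get(pair))
        for pair in sorted(set(a) | set(b))
        if a.get(pair) != b.get(pair)
    ]


# Pairwise lines copied from the two runs above
emma_log = """
Sequences (1:2) Aligned. Score: 100
Sequences (1:3) Aligned. Score: 98
Sequences (1:4) Aligned. Score: 85
Sequences (2:3) Aligned. Score: 98
Sequences (2:4) Aligned. Score: 85
Sequences (3:4) Aligned. Score: 86
"""

clustalw_log = """
Sequences (1:2) Aligned. Score: 100
Sequences (1:3) Aligned. Score: 98
Sequences (1:4) Aligned. Score: 88
Sequences (2:3) Aligned. Score: 98
Sequences (2:4) Aligned. Score: 88
Sequences (3:4) Aligned. Score: 88
"""

print(score_differences(emma_log, clustalw_log))
# [((1, 4), 85, 88), ((2, 4), 85, 88), ((3, 4), 86, 88)]
```

On these logs, every pair that scores differently involves sequence 4 (RLV-BDLF4), the one with the longer N-terminus.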
Why is the scoring subtly different? And see what it does to the
N-terminus of the alignment.
First with emma:
1 50
AG876-BDLF4 ~~~~~~~~~~~~~MSDQGRLSLPRGEGGTDEPNPRHLCSYSKLEFHLPLP
B95-BDLF4 ~~~~~~~~~~~~~MSDQGRLSLPRGEGGTDEPNPRHLCSYSKLEFHLPLP
GD1-BDLF4 ~~~~~~~~~~~~~MSDQGRLSLPRGEGGTDEPNPRHLCSYSKLEFHLPLP
RLV-BDLF4 MSDHGRVSGRPRGAVRGRGASSPDGEGAPTGPNSRHLCSYSKLESHFPLP
Now with clustalw:
AG876-BDLF4 MSDQGRLS-------------LPRGEGGTDEPNPRHLCSYSKLEFHLPLPESMASVFACW
B95-BDLF4 MSDQGRLS-------------LPRGEGGTDEPNPRHLCSYSKLEFHLPLPESMASVFACW
GD1-BDLF4 MSDQGRLS-------------LPRGEGGTDEPNPRHLCSYSKLEFHLPLPESMASVFACW
RLV-BDLF4 MSDHGRVSGRPRGAVRGRGASSPDGEGAPTGPNSRHLCSYSKLESHFPLPESMASVFACW
***:**:* * ***.. **.********** *:*************
Clustalw alone clearly gives the correct alignment, whereas emma's is
wrong. I thought that emma simply wrapped clustalw for automation,
but it appears to be doing something else. Out of a set of 80
proteins I am trying to pipeline through alignment, emma gives a
different result for 7 of them.
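To flag which of the 80 alignments disagree between the two routes, one approach is to compare the sets of residue pairs each alignment places in the same column, rather than comparing the files as text. That way the different gap symbols ('~' in emma's MSF output, '-' in the CLUSTAL output) don't register as differences. This is a minimal Python sketch, assuming each alignment has already been read into a dict of sequence name to gapped string (file parsing is left out); the function names are hypothetical.

```python
from itertools import combinations

# Gap symbols treated as equivalent: '-' (CLUSTAL), '~' and '.' (seen in MSF files)
GAP_CHARS = set("-~.")


def aligned_pairs(alignment):
    """Return the set of residue pairs that an alignment puts in the same column.

    `alignment` maps sequence name -> gapped string; all strings must be the
    same length. Each pair is ((name_a, pos_a), (name_b, pos_b)), where pos is
    the 0-based residue index in the ungapped sequence.
    """
    names = sorted(alignment)
    length = len(alignment[names[0]])
    if any(len(seq) != length for seq in alignment.values()):
        raise ValueError("all aligned sequences must have the same length")

    counters = {name: 0 for name in names}  # residues seen so far per sequence
    pairs = set()
    for col in range(length):
        present = []  # (name, residue index) for every non-gap in this column
        for name in names:
            if alignment[name][col] not in GAP_CHARS:
                present.append((name, counters[name]))
                counters[name] += 1
        pairs.update(combinations(present, 2))
    return pairs


def same_alignment(aln_a, aln_b):
    """True if both alignments align exactly the same residues with each other."""
    return aligned_pairs(aln_a) == aligned_pairs(aln_b)


# Toy example: only the gap symbols differ, so the alignments are equivalent
emma_like = {"s1": "~~AB-C", "s2": "DEABFC"}
clustal_like = {"s1": "--AB-C", "s2": "DEABFC"}
# Here s1's residues are shifted to different columns, so the alignments differ
shifted = {"s1": "AB---C", "s2": "DEABFC"}

print(same_alignment(emma_like, clustal_like))  # True
print(same_alignment(emma_like, shifted))       # False
```

Looping `same_alignment` over the emma and clustalw results for each protein would list the disagreeing cases directly.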
Any thoughts, as always, much appreciated
cheers
Derek