[EMBOSS] clustalw vs. emma

Thu Mar 9 16:18:55 UTC 2006

Thanks Alan

That indeed is the cause of the problem:

Executing 'clustalw -infile=00052348A -outfile=00052348B -align
-type=protein -output=gcg -pwmatrix=blosum -pwgapopen=10.000
-pwgapext=0.100 -newtree=00052348C -matrix=blosum -gapopen=10.000
-gapext=5.000 -gapdist=8 -hgapresidues=GPSNDQEKR -maxdiv=30'
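
For anyone who wants to pull these values out of emma.dbg automatically rather than by eye, here is a minimal Python sketch. It assumes the debug file contains the "Executing 'clustalw ...'" entry in the form shown above; the function name parse_clustalw_command is just an illustrative choice, not anything provided by EMBOSS.

```python
import re

def parse_clustalw_command(debug_text):
    """Return the options from the first "Executing 'clustalw ...'" entry as a dict."""
    match = re.search(r"Executing '(clustalw\b[^']*)'", debug_text)
    if match is None:
        return {}
    # split() tolerates any whitespace (spaces or newlines) between options.
    tokens = match.group(1).split()
    options = {}
    for token in tokens[1:]:
        name, _, value = token.lstrip("-").partition("=")
        options[name] = value  # switches without a value (e.g. -align) map to ""
    return options

# The command line reported in this message, joined into a single string.
sample = (
    "Executing 'clustalw -infile=00052348A -outfile=00052348B -align "
    "-type=protein -output=gcg -pwmatrix=blosum -pwgapopen=10.000 "
    "-pwgapext=0.100 -newtree=00052348C -matrix=blosum -gapopen=10.000 "
    "-gapext=5.000 -gapdist=8 -hgapresidues=GPSNDQEKR -maxdiv=30'"
)
options = parse_clustalw_command(sample)
print(options["gapopen"], options["gapext"], options["gapdist"])
```

In practice the text would come from reading emma.dbg after a run with -debug, instead of from the hard-coded sample string.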

However, when I try to specify it manually, I run into another problem:

[gath01d at gamma cluscheck]$ emma bdlf4.fa -osformat2 msf -out2 bdlf4.emma -auto -debug -pwgapextend 5
Died: Unknown qualifier -pwgapextend

The docs at http://emboss.sourceforge.net/apps/cvs/emma.html list quite
a few optional qualifiers of this sort; some of them work and others
don't, e.g.:

[gath01d at gamma cluscheck]$ emma bdlf4.fa -osformat2 msf -out2 bdlf4.emma -auto -debug -gapextend 5
Died: Unknown qualifier -gapextend
[gath01d at gamma cluscheck]$ emma bdlf4.fa -osformat2 msf -out2 bdlf4.emma -auto -debug -pwgapextend 5
Died: Unknown qualifier -pwgapextend
[gath01d at gamma cluscheck]$ emma bdlf4.fa -osformat2 msf -out2 bdlf4.emma -auto -debug -gapopen 5
Died: Unknown qualifier -gapopen
[gath01d at gamma cluscheck]$ emma bdlf4.fa -osformat2 msf -out2 bdlf4.emma -auto -debug -gapdist 5

CLUSTAL W (1.83) Multiple Sequence Alignments

So -gapdist works, at least.
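
One possible workaround while the emma qualifiers are unclear is to call clustalw directly with the same style of options that emma passes, but with chosen gap penalties. A rough Python sketch follows. The option names are taken from the "Executing" line above, the 0.2 gap extension is the value Alan suggested, and the output file name bdlf4.msf and both function names are purely illustrative assumptions.

```python
import subprocess

def build_clustalw_command(infile, outfile, gapopen=10.0, gapext=0.2):
    """Build a clustalw argument list using options seen in emma's debug output."""
    return [
        "clustalw",
        f"-infile={infile}",
        f"-outfile={outfile}",
        "-align",
        "-type=protein",
        "-output=gcg",
        f"-gapopen={gapopen}",
        f"-gapext={gapext}",
    ]

def run_clustalw(infile, outfile, **penalties):
    """Run clustalw; raises CalledProcessError if it exits with an error."""
    cmd = build_clustalw_command(infile, outfile, **penalties)
    subprocess.run(cmd, check=True)
    return cmd

# Only build and display the command here; run_clustalw() needs clustalw on the PATH.
command = build_clustalw_command("bdlf4.fa", "bdlf4.msf")
print(" ".join(command))
```

Building the argument list separately from running it makes it easy to check exactly what will be passed to clustalw before pushing a whole batch of proteins through.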

Cheers
Derek

At 15:58 09/03/2006, ajb at ebi.ac.uk wrote:
>Hi Derek,
>
>emma is indeed just a wrapper for clustalw. You can see what default
>parameters it is using by specifying -debug on the command line
>and then looking at the emma.dbg file. Search for a line
>saying "Executing 'clustalw"
>
>I suspect that the default gap extension penalty is rather high
>in your case. If you use (e.g.) -gapext 0.2   then you'll get
>something approaching the default clustalw behaviour. The defaults
>for your sequences seem to be:
>
>   -gapopen=10.000 -gapext=5.000 -gapdist=8
>
>
>HTH
>
>Alan
>
> > Morning all
> >
> > Is there some unusual default being passed to emma?  For instance,
> > here's emma with a vanilla set of parameters on a fairly well
> > conserved set of proteins (bdlf4.fa):
> >
> > yoda:cluscheck 157 > emma bdlf4.fa -osformat2 msf -out2 bdlf4.emma -auto
> >
> >   CLUSTAL W (1.83) Multiple Sequence Alignments
> >
> > Sequence type explicitly set to Protein
> > Sequence format is Pearson
> > Sequence 1: AG876-BDLF4      225 aa
> > Sequence 2: B95-BDLF4        225 aa
> > Sequence 3: GD1-BDLF4        225 aa
> > Sequence 4: RLV-BDLF4        238 aa
> > Start of Pairwise alignments
> > Aligning...
> > Sequences (1:2) Aligned. Score:  100
> > Sequences (1:3) Aligned. Score:  98
> > Sequences (1:4) Aligned. Score:  85
> > Sequences (2:3) Aligned. Score:  98
> > Sequences (2:4) Aligned. Score:  85
> > Sequences (3:4) Aligned. Score:  86
> > Guide tree        file created:   [00029986C]
> > Start of Multiple Alignment
> > There are 3 groups
> > Aligning...
> > Group 1: Sequences:   2      Score:3770
> > Group 2: Sequences:   3      Score:3741
> > Group 3: Sequences:   4      Score:3462
> > Alignment Score 8058
> > GCG-Alignment file created      [00029986B]
> >
> > and now clustalw on its own, not wrapped by emma, with the same input file:
> >
> > yoda:cluscheck 158 > clustalw bdlf4.fa
> >
> >   CLUSTAL W (1.83) Multiple Sequence Alignments
> >
> > Sequence format is Pearson
> > Sequence 1: AG876-BDLF4      225 aa
> > Sequence 2: B95-BDLF4        225 aa
> > Sequence 3: GD1-BDLF4        225 aa
> > Sequence 4: RLV-BDLF4        238 aa
> > Start of Pairwise alignments
> > Aligning...
> > Sequences (1:2) Aligned. Score:  100
> > Sequences (1:3) Aligned. Score:  98
> > Sequences (1:4) Aligned. Score:  88
> > Sequences (2:3) Aligned. Score:  98
> > Sequences (2:4) Aligned. Score:  88
> > Sequences (3:4) Aligned. Score:  88
> > Guide tree        file created:   [bdlf4.dnd]
> > Start of Multiple Alignment
> > There are 3 groups
> > Aligning...
> > Group 1: Sequences:   2      Score:4959
> > Group 2: Sequences:   3      Score:4928
> > Group 3: Sequences:   4      Score:4677
> > Alignment Score 8187
> > CLUSTAL-Alignment file created  [bdlf4.aln]
> >
> > Why is the scoring subtly different? And see what it does to the
> > N-terminus of the alignment...
> >
> > First with emma:
> >
> >             1                                               50
> > AG876-BDLF4 ~~~~~~~~~~~~~MSDQGRLSLPRGEGGTDEPNPRHLCSYSKLEFHLPLP
> > B95-BDLF4   ~~~~~~~~~~~~~MSDQGRLSLPRGEGGTDEPNPRHLCSYSKLEFHLPLP
> > GD1-BDLF4   ~~~~~~~~~~~~~MSDQGRLSLPRGEGGTDEPNPRHLCSYSKLEFHLPLP
> > RLV-BDLF4   MSDHGRVSGRPRGAVRGRGASSPDGEGAPTGPNSRHLCSYSKLESHFPLP
> >
> > now with clustalw:
> >
> > AG876-BDLF4     MSDQGRLS-------------LPRGEGGTDEPNPRHLCSYSKLEFHLPLPESMASVFACW
> > B95-BDLF4       MSDQGRLS-------------LPRGEGGTDEPNPRHLCSYSKLEFHLPLPESMASVFACW
> > GD1-BDLF4       MSDQGRLS-------------LPRGEGGTDEPNPRHLCSYSKLEFHLPLPESMASVFACW
> > RLV-BDLF4       MSDHGRVSGRPRGAVRGRGASSPDGEGAPTGPNSRHLCSYSKLESHFPLPESMASVFACW
> >                 ***:**:*              * ***..  **.********** *:*************
> >
> > Clustalw alone clearly gives the correct alignment whereas emma is
> > wrong.  I thought that emma simply wrapped clustalw for automation,
> > but it appears it is doing something else.  Out of a set of 80
> > proteins I am trying to pipeline through alignment, emma gives a
> > variant result for 7 of them.....
> >
> > Any thoughts, as always, much appreciated
> >
> > cheers
> > Derek