[EMBOSS] clustalw vs. emma

ajb at ebi.ac.uk
Thu Mar 9 15:58:33 UTC 2006


Hi Derek,

emma is indeed just a wrapper for clustalw. You can see which default
parameters it passes by adding -debug to the command line and then
looking at the emma.dbg file. Search for a line containing
"Executing 'clustalw".
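
A minimal sketch of that check, assuming emma.dbg ends up in the
directory you ran emma from. The function name show_clustalw_call is
made up for illustration and is not part of EMBOSS:

```sh
# Print the clustalw command line that emma recorded in its debug file.
# Assumption: emma was run with -debug and wrote emma.dbg (the default
# argument below); pass another path if the file lives elsewhere.
show_clustalw_call() {
    dbg_file="${1:-emma.dbg}"
    # -F treats the pattern as a fixed string, so the quote is literal.
    grep -F "Executing 'clustalw" "$dbg_file"
}

# Usage, with the command from the original post plus -debug:
#   emma bdlf4.fa -osformat2 msf -out2 bdlf4.emma -auto -debug
#   show_clustalw_call emma.dbg
```

The matching line should show the gap penalties emma handed to clustalw.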

I suspect that the default gap extension penalty is rather high
in your case. If you use (e.g.) -gapext 0.2 then you'll get
something approaching the default clustalw behaviour. The defaults
emma uses for your sequences seem to be:

  -gapopen=10.000 -gapext=5.000 -gapdist=8
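
To try the lower penalty across a batch of files, something like the
following may help. It is only a sketch: run_emma_gapext is a
hypothetical helper name, the other options are copied from your
command line, and 0.2 is the example value above rather than a tuned
setting:

```sh
# Re-run emma with an explicit gap extension penalty.
# Arguments: input FASTA, output alignment file, optional penalty
# (defaults to 0.2, the example value suggested in this reply).
run_emma_gapext() {
    infile="$1"
    outfile="$2"
    gapext="${3:-0.2}"
    emma "$infile" -osformat2 msf -out2 "$outfile" -auto -gapext "$gapext"
}

# Usage:
#   run_emma_gapext bdlf4.fa bdlf4.emma
#   run_emma_gapext bdlf4.fa bdlf4.emma 0.5
```

Comparing the resulting alignment with the plain clustalw one should
show whether the gap extension penalty accounts for the difference.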


HTH

Alan

> Morning all
>
> Is there some unusual default being passed to emma?  For instance,
> here's emma with a vanilla set of parameters on a fairly well
> conserved set of proteins (bdlf4.fa):
>
> yoda:cluscheck 157 > emma bdlf4.fa -osformat2 msf -out2 bdlf4.emma -auto
>
>   CLUSTAL W (1.83) Multiple Sequence Alignments
>
> Sequence type explicitly set to Protein
> Sequence format is Pearson
> Sequence 1: AG876-BDLF4      225 aa
> Sequence 2: B95-BDLF4        225 aa
> Sequence 3: GD1-BDLF4        225 aa
> Sequence 4: RLV-BDLF4        238 aa
> Start of Pairwise alignments
> Aligning...
> Sequences (1:2) Aligned. Score:  100
> Sequences (1:3) Aligned. Score:  98
> Sequences (1:4) Aligned. Score:  85
> Sequences (2:3) Aligned. Score:  98
> Sequences (2:4) Aligned. Score:  85
> Sequences (3:4) Aligned. Score:  86
> Guide tree        file created:   [00029986C]
> Start of Multiple Alignment
> There are 3 groups
> Aligning...
> Group 1: Sequences:   2      Score:3770
> Group 2: Sequences:   3      Score:3741
> Group 3: Sequences:   4      Score:3462
> Alignment Score 8058
> GCG-Alignment file created      [00029986B]
>
> And now clustalw run directly (not wrapped by emma), with the same input file:
>
> yoda:cluscheck 158 > clustalw bdlf4.fa
>
>   CLUSTAL W (1.83) Multiple Sequence Alignments
>
> Sequence format is Pearson
> Sequence 1: AG876-BDLF4      225 aa
> Sequence 2: B95-BDLF4        225 aa
> Sequence 3: GD1-BDLF4        225 aa
> Sequence 4: RLV-BDLF4        238 aa
> Start of Pairwise alignments
> Aligning...
> Sequences (1:2) Aligned. Score:  100
> Sequences (1:3) Aligned. Score:  98
> Sequences (1:4) Aligned. Score:  88
> Sequences (2:3) Aligned. Score:  98
> Sequences (2:4) Aligned. Score:  88
> Sequences (3:4) Aligned. Score:  88
> Guide tree        file created:   [bdlf4.dnd]
> Start of Multiple Alignment
> There are 3 groups
> Aligning...
> Group 1: Sequences:   2      Score:4959
> Group 2: Sequences:   3      Score:4928
> Group 3: Sequences:   4      Score:4677
> Alignment Score 8187
> CLUSTAL-Alignment file created  [bdlf4.aln]
>
> Why is the scoring subtly different? And see what it does to the
> N-terminus of the alignment...
>
> First with emma:
>
>             1                                               50
> AG876-BDLF4 ~~~~~~~~~~~~~MSDQGRLSLPRGEGGTDEPNPRHLCSYSKLEFHLPLP
> B95-BDLF4   ~~~~~~~~~~~~~MSDQGRLSLPRGEGGTDEPNPRHLCSYSKLEFHLPLP
> GD1-BDLF4   ~~~~~~~~~~~~~MSDQGRLSLPRGEGGTDEPNPRHLCSYSKLEFHLPLP
> RLV-BDLF4   MSDHGRVSGRPRGAVRGRGASSPDGEGAPTGPNSRHLCSYSKLESHFPLP
>
> now with clustalw:
>
> AG876-BDLF4       MSDQGRLS-------------LPRGEGGTDEPNPRHLCSYSKLEFHLPLPESMASVFACW
> B95-BDLF4         MSDQGRLS-------------LPRGEGGTDEPNPRHLCSYSKLEFHLPLPESMASVFACW
> GD1-BDLF4         MSDQGRLS-------------LPRGEGGTDEPNPRHLCSYSKLEFHLPLPESMASVFACW
> RLV-BDLF4         MSDHGRVSGRPRGAVRGRGASSPDGEGAPTGPNSRHLCSYSKLESHFPLPESMASVFACW
>                   ***:**:*              * ***..  **.********** *:*************
>
> Clustalw alone clearly gives the correct alignment whereas emma is
> wrong.  I thought that emma simply wrapped clustalw for automation,
> but it appears it is doing something else.  Out of a set of 80
> proteins I am trying to pipeline through alignment, emma gives a
> variant result for 7 of them.....
>
> Any thoughts, as always, much appreciated
>
> cheers
> Derek