[EMBOSS] clustalw vs. emma
ajb at ebi.ac.uk
Thu Mar 9 15:58:33 UTC 2006
Hi Derek,
emma is indeed just a wrapper for clustalw. You can see which default
parameters it passes by specifying -debug on the command line
and then looking at the emma.dbg file. Search for a line
containing "Executing 'clustalw".
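For a pipeline, that lookup can be scripted. Here is a minimal sketch in Python that assumes only what is stated above: the debug file contains a line with the text Executing 'clustalw. The function name is made up for illustration, and the real emma.dbg layout may differ from what this expects.

```python
from typing import Iterable, List

MARKER = "Executing 'clustalw"  # text to look for, as described above


def find_clustalw_commands(lines: Iterable[str]) -> List[str]:
    """Return every line that contains the clustalw execution marker."""
    return [line.rstrip("\n") for line in lines if MARKER in line]
```

Passing an open file handle for emma.dbg (a file object iterates over its lines) should return the command line or lines that emma actually ran.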
I suspect that the default gap extension penalty is rather high
in your case. If you use (e.g.) -gapext 0.2 then you'll get
something approaching the default clustalw behaviour. The defaults
that emma passes to clustalw for your sequences seem to be:
-gapopen=10.000 -gapext=5.000 -gapdist=8
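To check such values automatically, the option string can be split into a dictionary. This is a sketch in Python: parse_clustalw_options is a hypothetical helper that only handles the -name=value form shown above, and the 0.2 it compares against is the value suggested above, not something read from clustalw itself.

```python
from typing import Dict


def parse_clustalw_options(text: str) -> Dict[str, float]:
    """Turn '-gapopen=10.000 -gapext=5.000' into {'gapopen': 10.0, ...}."""
    options: Dict[str, float] = {}
    for token in text.split():
        if not token.startswith("-") or "=" not in token:
            continue  # skip anything that is not -name=value
        name, _, value = token[1:].partition("=")
        try:
            options[name] = float(value)
        except ValueError:
            continue  # non-numeric values are ignored in this sketch
    return options


emma_defaults = parse_clustalw_options("-gapopen=10.000 -gapext=5.000 -gapdist=8")
suggested_gapext = 0.2  # value suggested above, used here as the reference
gapext_ratio = emma_defaults["gapext"] / suggested_gapext
print(emma_defaults)
print(f"gap extension penalty is {gapext_ratio:.0f}x the suggested value")
```

With the values above, the gap extension penalty is 25 times the suggested one (5.0 against 0.2), while the gap opening penalty of 10 is unchanged.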
HTH
Alan
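The pairwise scores in the two logs quoted below can also be compared mechanically, which may help when screening a larger set of alignments. This is a sketch in Python that assumes log lines of the form Sequences (1:4) Aligned. Score: 85, as in the quoted output. The function names are illustrative, and the sample strings contain only a few lines copied from the quoted logs.

```python
import re
from typing import Dict, Tuple

# Matches lines such as "Sequences (1:4) Aligned. Score: 85"
PAIR_SCORE = re.compile(r"Sequences \((\d+):(\d+)\) Aligned\. Score:\s*(\d+)")


def pairwise_scores(log: str) -> Dict[Tuple[int, int], int]:
    """Map (i, j) sequence pairs to the pairwise score reported in a log."""
    return {
        (int(i), int(j)): int(score)
        for i, j, score in PAIR_SCORE.findall(log)
    }


def score_differences(
    log_a: str, log_b: str
) -> Dict[Tuple[int, int], Tuple[int, int]]:
    """Return the pairs whose score differs between the two logs."""
    a, b = pairwise_scores(log_a), pairwise_scores(log_b)
    return {
        pair: (a[pair], b[pair])
        for pair in sorted(a.keys() & b.keys())
        if a[pair] != b[pair]
    }


emma_log = """Sequences (1:2) Aligned. Score: 100
Sequences (1:4) Aligned. Score: 85
Sequences (3:4) Aligned. Score: 86"""
clustalw_log = """Sequences (1:2) Aligned. Score: 100
Sequences (1:4) Aligned. Score: 88
Sequences (3:4) Aligned. Score: 88"""

print(score_differences(emma_log, clustalw_log))
# prints {(1, 4): (85, 88), (3, 4): (86, 88)}
```

In the quoted logs, only the pairs involving sequence 4 (RLV-BDLF4) differ between the two runs.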
> Morning all
>
> Is there some unusual default being passed to emma? For instance,
> here's emma with a vanilla set of parameters on a fairly well
> conserved set of proteins (bdlf4.fa):
>
> yoda:cluscheck 157 > emma bdlf4.fa -osformat2 msf -out2 bdlf4.emma -auto
>
> CLUSTAL W (1.83) Multiple Sequence Alignments
>
> Sequence type explicitly set to Protein
> Sequence format is Pearson
> Sequence 1: AG876-BDLF4 225 aa
> Sequence 2: B95-BDLF4 225 aa
> Sequence 3: GD1-BDLF4 225 aa
> Sequence 4: RLV-BDLF4 238 aa
> Start of Pairwise alignments
> Aligning...
> Sequences (1:2) Aligned. Score: 100
> Sequences (1:3) Aligned. Score: 98
> Sequences (1:4) Aligned. Score: 85
> Sequences (2:3) Aligned. Score: 98
> Sequences (2:4) Aligned. Score: 85
> Sequences (3:4) Aligned. Score: 86
> Guide tree file created: [00029986C]
> Start of Multiple Alignment
> There are 3 groups
> Aligning...
> Group 1: Sequences: 2 Score:3770
> Group 2: Sequences: 3 Score:3741
> Group 3: Sequences: 4 Score:3462
> Alignment Score 8058
> GCG-Alignment file created [00029986B]
>
> and now clustalw itself, not wrapped by emma, with the same input file:
>
> yoda:cluscheck 158 > clustalw bdlf4.fa
>
> CLUSTAL W (1.83) Multiple Sequence Alignments
>
> Sequence format is Pearson
> Sequence 1: AG876-BDLF4 225 aa
> Sequence 2: B95-BDLF4 225 aa
> Sequence 3: GD1-BDLF4 225 aa
> Sequence 4: RLV-BDLF4 238 aa
> Start of Pairwise alignments
> Aligning...
> Sequences (1:2) Aligned. Score: 100
> Sequences (1:3) Aligned. Score: 98
> Sequences (1:4) Aligned. Score: 88
> Sequences (2:3) Aligned. Score: 98
> Sequences (2:4) Aligned. Score: 88
> Sequences (3:4) Aligned. Score: 88
> Guide tree file created: [bdlf4.dnd]
> Start of Multiple Alignment
> There are 3 groups
> Aligning...
> Group 1: Sequences: 2 Score:4959
> Group 2: Sequences: 3 Score:4928
> Group 3: Sequences: 4 Score:4677
> Alignment Score 8187
> CLUSTAL-Alignment file created [bdlf4.aln]
>
> Why is the scoring subtly different? And see what it does to the
> N-terminus of the alignment...
>
> First with emma:
>
>                 1                                               50
> AG876-BDLF4     ~~~~~~~~~~~~~MSDQGRLSLPRGEGGTDEPNPRHLCSYSKLEFHLPLP
> B95-BDLF4       ~~~~~~~~~~~~~MSDQGRLSLPRGEGGTDEPNPRHLCSYSKLEFHLPLP
> GD1-BDLF4       ~~~~~~~~~~~~~MSDQGRLSLPRGEGGTDEPNPRHLCSYSKLEFHLPLP
> RLV-BDLF4       MSDHGRVSGRPRGAVRGRGASSPDGEGAPTGPNSRHLCSYSKLESHFPLP
>
> now with clustalw:
>
> AG876-BDLF4     MSDQGRLS-------------LPRGEGGTDEPNPRHLCSYSKLEFHLPLPESMASVFACW
> B95-BDLF4       MSDQGRLS-------------LPRGEGGTDEPNPRHLCSYSKLEFHLPLPESMASVFACW
> GD1-BDLF4       MSDQGRLS-------------LPRGEGGTDEPNPRHLCSYSKLEFHLPLPESMASVFACW
> RLV-BDLF4       MSDHGRVSGRPRGAVRGRGASSPDGEGAPTGPNSRHLCSYSKLESHFPLPESMASVFACW
>                 ***:**:*              * ***..  **.********** *:*************
>
> Clustalw alone clearly gives the correct alignment whereas emma is
> wrong. I thought that emma simply wrapped clustalw for automation,
> but it appears it is doing something else. Out of a set of 80
> proteins I am trying to pipeline through alignment, emma gives a
> variant result for 7 of them.....
>
> Any thoughts, as always, much appreciated
>
> cheers
> Derek