[EMBOSS] clustalw vs. emma
Derek Gatherer
d.gatherer at vir.gla.ac.uk
Thu Mar 9 16:18:55 UTC 2006
Thanks Alan
That indeed is the cause of the problem:
Executing 'clustalw -infile=00052348A -outfile=00052348B -align
-type=protein -output=gcg -pwmatrix=blosum -pwgapopen=10.000
-pwgapext=0.100 -newtree=00052348C -matrix=blosum -gapopen=10.000
-gapext=5.000 -gapdist=8 -hgapresidues=GPSNDQEKR -maxdiv=30'
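Once emma has logged the command, it can be checked mechanically against the values you expect clustalw to use. Below is a minimal Python sketch that pulls the "Executing 'clustalw" line out of the debug text, splits it into options and reports numeric mismatches. The reference values are assumptions: gapext=0.2 follows Alan's suggestion, and gapopen=10.0 is an assumed clustalw default that you should confirm against your own build. The function names are hypothetical.

```python
import re
import shlex

# ASSUMPTION: reference values for clustalw's own multiple-alignment gap
# penalties. gapext=0.2 follows Alan's note; gapopen=10.0 is an assumed
# stock default. Check both against your clustalw build.
ASSUMED_CLUSTALW_DEFAULTS = {"gapopen": 10.0, "gapext": 0.2}


def find_clustalw_command(debug_text):
    """Return the clustalw command recorded in emma.dbg (without the quotes)."""
    # Collapse runs of whitespace, including newlines, so a command that
    # was wrapped over several lines still matches as one string.
    flat = " ".join(debug_text.split())
    match = re.search(r"Executing '(clustalw [^']*)'", flat)
    if match is None:
        raise ValueError("no \"Executing 'clustalw\" line found")
    return match.group(1)


def parse_options(command):
    """Map each '-key=value' token to {key: value}; bare flags map to True."""
    options = {}
    for token in shlex.split(command)[1:]:  # skip the program name itself
        key, sep, value = token.lstrip("-").partition("=")
        options[key] = value if sep else True
    return options


def differences(options, reference):
    """Return {key: (logged_value, reference_value)} for numeric mismatches."""
    diffs = {}
    for key, expected in reference.items():
        if key in options and float(options[key]) != expected:
            diffs[key] = (float(options[key]), expected)
    return diffs
```

Applied to the command quoted above, `differences(parse_options(cmd), ASSUMED_CLUSTALW_DEFAULTS)` reports only `gapext`, as `(5.0, 0.2)`, which is the mismatch Alan pointed to.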
However, when I try to set the gap extension penalty manually, I run into another problem:
[gath01d@gamma cluscheck]$ emma bdlf4.fa -osformat2 msf -out2 bdlf4.emma -auto -debug -pwgapextend 5
Died: Unknown qualifier -pwgapextend
The docs at http://emboss.sourceforge.net/apps/cvs/emma.html list quite a
few optional parameters of this sort, some of which work and others
don't, e.g.:
[gath01d@gamma cluscheck]$ emma bdlf4.fa -osformat2 msf -out2 bdlf4.emma -auto -debug -gapextend 5
Died: Unknown qualifier -gapextend
[gath01d@gamma cluscheck]$ emma bdlf4.fa -osformat2 msf -out2 bdlf4.emma -auto -debug -pwgapextend 5
Died: Unknown qualifier -pwgapextend
[gath01d@gamma cluscheck]$ emma bdlf4.fa -osformat2 msf -out2 bdlf4.emma -auto -debug -gapopen 5
Died: Unknown qualifier -gapopen
[gath01d@gamma cluscheck]$ emma bdlf4.fa -osformat2 msf -out2 bdlf4.emma -auto -debug -gapdist 5
CLUSTAL W (1.83) Multiple Sequence Alignments
So -gapdist works, at least.
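Trying qualifiers one at a time like this can be scripted. Below is a minimal Python sketch, assuming emma is on the PATH and reports rejections with the same "Died: Unknown qualifier" message shown above. The helper names and the output file name probe.emma are hypothetical. Acceptance here only means emma did not reject the name; whether the value actually reached clustalw still has to be checked in emma.dbg.

```python
import re
import subprocess

# Matches the error emma printed in the runs above.
UNKNOWN_QUALIFIER = re.compile(r"Died: Unknown qualifier (-\S+)")


def rejected_qualifier(output):
    """Return the qualifier emma rejected, or None if no rejection was reported."""
    match = UNKNOWN_QUALIFIER.search(output)
    return match.group(1) if match else None


def probe(qualifier, value, seqfile="bdlf4.fa"):
    """Run emma once with one extra qualifier; True if emma did not reject it.

    Hypothetical helper: assumes emma is installed and on PATH. A True
    result does not prove the value was passed through to clustalw.
    """
    cmd = ["emma", seqfile, "-osformat2", "msf", "-out2", "probe.emma",
           "-auto", qualifier, value]
    result = subprocess.run(cmd, capture_output=True, text=True)
    # emma may report errors on either stream, so search both.
    return rejected_qualifier(result.stdout + result.stderr) is None
```

For example, looping `probe(q, "5")` over `-gapextend`, `-pwgapextend`, `-gapopen` and `-gapdist` would, judging by the runs above, accept only `-gapdist` on this installation.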
Cheers
Derek
At 15:58 09/03/2006, ajb at ebi.ac.uk wrote:
>Hi Derek,
>
>emma is indeed just a wrapper for clustalw. You can see what default
>parameters it is using by specifying -debug on the command line
>and then looking at the emma.dbg file. Search for a line
>saying "Executing 'clustalw"
>
>I suspect that the default gap extension penalty is rather high
>in your case. If you use (e.g.) -gapext 0.2 then you'll get
>something approaching the default clustalw behaviour. The defaults
>for your sequences seem to be:
>
> -gapopen=10.000 -gapext=5.000 -gapdist=8
>
>
>HTH
>
>Alan
>
> > Morning all
> >
> > Is there some unusual default being passed to emma? For instance,
> > here's emma with a vanilla set of parameters on a fairly well
> > conserved set of proteins (bdlf4.fa):
> >
> > yoda:cluscheck 157 > emma bdlf4.fa -osformat2 msf -out2 bdlf4.emma -auto
> >
> > CLUSTAL W (1.83) Multiple Sequence Alignments
> >
> > Sequence type explicitly set to Protein
> > Sequence format is Pearson
> > Sequence 1: AG876-BDLF4 225 aa
> > Sequence 2: B95-BDLF4 225 aa
> > Sequence 3: GD1-BDLF4 225 aa
> > Sequence 4: RLV-BDLF4 238 aa
> > Start of Pairwise alignments
> > Aligning...
> > Sequences (1:2) Aligned. Score: 100
> > Sequences (1:3) Aligned. Score: 98
> > Sequences (1:4) Aligned. Score: 85
> > Sequences (2:3) Aligned. Score: 98
> > Sequences (2:4) Aligned. Score: 85
> > Sequences (3:4) Aligned. Score: 86
> > Guide tree file created: [00029986C]
> > Start of Multiple Alignment
> > There are 3 groups
> > Aligning...
> > Group 1: Sequences: 2 Score:3770
> > Group 2: Sequences: 3 Score:3741
> > Group 3: Sequences: 4 Score:3462
> > Alignment Score 8058
> > GCG-Alignment file created [00029986B]
> >
> > and now clustalw on its own (not wrapped in emma), with the same input file:
> >
> > yoda:cluscheck 158 > clustalw bdlf4.fa
> >
> > CLUSTAL W (1.83) Multiple Sequence Alignments
> >
> > Sequence format is Pearson
> > Sequence 1: AG876-BDLF4 225 aa
> > Sequence 2: B95-BDLF4 225 aa
> > Sequence 3: GD1-BDLF4 225 aa
> > Sequence 4: RLV-BDLF4 238 aa
> > Start of Pairwise alignments
> > Aligning...
> > Sequences (1:2) Aligned. Score: 100
> > Sequences (1:3) Aligned. Score: 98
> > Sequences (1:4) Aligned. Score: 88
> > Sequences (2:3) Aligned. Score: 98
> > Sequences (2:4) Aligned. Score: 88
> > Sequences (3:4) Aligned. Score: 88
> > Guide tree file created: [bdlf4.dnd]
> > Start of Multiple Alignment
> > There are 3 groups
> > Aligning...
> > Group 1: Sequences: 2 Score:4959
> > Group 2: Sequences: 3 Score:4928
> > Group 3: Sequences: 4 Score:4677
> > Alignment Score 8187
> > CLUSTAL-Alignment file created [bdlf4.aln]
> >
> > Why is the scoring subtly different? And look at what it does to the
> > N-terminus of the alignment:
> >
> > First with emma:
> >
> > 1 50
> > AG876-BDLF4 ~~~~~~~~~~~~~MSDQGRLSLPRGEGGTDEPNPRHLCSYSKLEFHLPLP
> > B95-BDLF4 ~~~~~~~~~~~~~MSDQGRLSLPRGEGGTDEPNPRHLCSYSKLEFHLPLP
> > GD1-BDLF4 ~~~~~~~~~~~~~MSDQGRLSLPRGEGGTDEPNPRHLCSYSKLEFHLPLP
> > RLV-BDLF4 MSDHGRVSGRPRGAVRGRGASSPDGEGAPTGPNSRHLCSYSKLESHFPLP
> >
> > now with clustalw:
> >
> > AG876-BDLF4  MSDQGRLS-------------LPRGEGGTDEPNPRHLCSYSKLEFHLPLPESMASVFACW
> > B95-BDLF4    MSDQGRLS-------------LPRGEGGTDEPNPRHLCSYSKLEFHLPLPESMASVFACW
> > GD1-BDLF4    MSDQGRLS-------------LPRGEGGTDEPNPRHLCSYSKLEFHLPLPESMASVFACW
> > RLV-BDLF4    MSDHGRVSGRPRGAVRGRGASSPDGEGAPTGPNSRHLCSYSKLESHFPLPESMASVFACW
> > ***:**:* * ***.. **.**********
> > *:*************
> >
> > Clustalw alone clearly gives the correct alignment, whereas emma's is
> > wrong. I thought that emma simply wrapped clustalw for automation,
> > but it appears to be doing something else. Of the 80 proteins I am
> > trying to pipeline through alignment, emma gives a different result
> > for 7 of them.
> >
> > Any thoughts, as always, much appreciated
> >
> > cheers
> > Derek
> > _______________________________________________
> > EMBOSS mailing list
> > EMBOSS at emboss.open-bio.org
> > http://newportal.open-bio.org/mailman/listinfo/emboss
> >
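In a pipeline like Derek's, where emma and clustalw disagreed on 7 of 80 sets, the discrepant cases can be flagged automatically by comparing the two alignments row by row. The two excerpts above use different gap symbols (emma's MSF output pads with '~', and MSF files may also use '.', while CLUSTAL output uses '-'), so the sketch below normalises them before comparing. It assumes each alignment has already been parsed into a dict of sequence name to full gapped row; the parsing step and all function names are hypothetical.

```python
# Gap symbols treated as equivalent: '-' (CLUSTAL), '.' and '~' (MSF).
GAP_CHARS = set("-.~")


def normalise(row):
    """Map every gap symbol to '-' and upper-case the residues."""
    return "".join("-" if ch in GAP_CHARS else ch.upper() for ch in row)


def differing_rows(aln_a, aln_b):
    """Return sorted names of sequences whose gapped rows differ.

    Each alignment maps sequence name -> full gapped row, with all
    interleaved blocks already concatenated (parsing is not shown).
    """
    if aln_a.keys() != aln_b.keys():
        raise ValueError("alignments contain different sequence names")
    return sorted(name for name in aln_a
                  if normalise(aln_a[name]) != normalise(aln_b[name]))


def alignments_agree(aln_a, aln_b):
    """True when every row matches once gap symbols are normalised."""
    return not differing_rows(aln_a, aln_b)
```

Comparing whole rows is deliberately strict: any difference in where gaps fall, such as the N-terminal gap placement shown above, is enough to flag a sequence.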