Antwort: matcher score calculation

Thu Apr 24 19:10:52 UTC 2003

What is even more surprising is that the mismatch U->C is treated
differently from C->U. The former receives no 'dot' in the alignment,
the latter does. Interesting...

Jack A.M. Leunissen, Ph.D.
Dept. Genome Informatics
Wageningen University
6703 HA Wageningen, NL

> -----Original Message-----
> From: owner-emboss at hgmp.mrc.ac.uk 
> [mailto:owner-emboss at hgmp.mrc.ac.uk] On Behalf Of 
> David.Bauer at SCHERING.DE
> Sent: Thursday, 24 April, 2003 16:07
> To: jan.wuyts at gengenp.rug.ac.be
> Cc: emboss at embnet.org; jan.wuyts at gengenp.rug.ac.be; 
> owner-emboss at hgmp.mrc.ac.uk
> Subject: Antwort: matcher score calculation
> 
> 
> 
> 
> I had this problem a long time ago (and assumed it had been fixed 
> in the meantime). Matcher doesn't like the "U". If you change 
> your RNA to DNA it will calculate the correct Similarity.
> 
> David.
> 
> 
> Dear all,
> 
> I am trying to use 'matcher' to do a local alignment of a 
> small RNA sequence against a larger one. However, the output 
> confuses me a bit. For example:
> 
>   matcher seq1 seq2 -alternatives 9 -stdout -auto > output
> 
> The best (first) match in the output is this: 
> ########################################
> # Program:  matcher
> # Rundate:  Thu Apr 24 15:21:41 2003
> # Align_format: markx0
> # Report_file: stdout
> ########################################
> #=======================================
> #
> # Aligned_sequences: 2
> # 1: 21
> # 2: 21-1
> # Matrix: EDNAFULL
> # Gap_penalty: 16
> # Extend_penalty: 4
> #
> # Length: 18
> # Identity:      16/18 (88.9%)
> # Similarity:    13/18 (72.2%)
> # Gaps:           0/18 ( 0.0%)
> # Score: 61
> #
> #
> #=======================================
> 
> 
>              10        20
>     21 GCAGCAUCAUCAAGAUUC
>        :::::: :::.:::::::
>   21-1 GCAGCACCAUUAAGAUUC
>           440       450
> #=======================================
> 
> Apparently 16 positions are identical (seems right, there are 
> 16 ':') but only 13 are counted as similar. First of all, I 
> don't understand why CU would be counted as similar (this 
> score is, after all, negative in EDNAFULL) and second, how can 
> it be that #similar is smaller than #identical? The manual
> (http://www.hgmp.mrc.ac.uk/Software/EMBOSS/Themes/AlignFormats.html)
> states that "Any two residues or bases are defined as similar
> when they have positive comparisons (as defined by the
> comparison matrix being used in the alignment algorithm)." and
> a bit further "Note that the sum of identical and similar
> positions is greater than 100%. This is because the count of
> similar positions includes the count of identical positions; if
> residues are identical, they must also be similar." Therefore I
> would think #similar must always be >= #identical. Lastly, when
> I calculate the score manually, I get 16x5-2x4=72 (in EDNAFULL,
> 5 is used for all non-ambiguous matches, -4 for all
> non-ambiguous mismatches) while matcher calculates the score to
> be 61.

> Any help on this would be greatly appreciated.
> Greetings,
> 
> Jan.
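
For reference, the manual calculation in the quoted message can be
reproduced with a short script. This is only a sketch: it assumes the
simplified scoring Jan describes (5 for every unambiguous match, -4 for
every unambiguous mismatch) rather than reading the real EDNAFULL
matrix file, and it applies the U->T conversion David suggests. The
names rna_to_dna and manual_score are made up for illustration and are
not part of EMBOSS. It does not explain why matcher reports 61.

```python
# Aligned segments copied from the matcher output quoted above.
SEQ1 = "GCAGCAUCAUCAAGAUUC"
SEQ2 = "GCAGCACCAUUAAGAUUC"

# Assumed simplified scoring, as described in the quoted message:
MATCH = 5      # identical unambiguous bases
MISMATCH = -4  # differing unambiguous bases


def rna_to_dna(seq):
    """Replace U with T (the workaround suggested in the thread)."""
    return seq.upper().replace("U", "T")


def manual_score(a, b):
    """Return (identities, mismatches, score) for an ungapped alignment."""
    if len(a) != len(b):
        raise ValueError("aligned segments must have equal length")
    identities = sum(x == y for x, y in zip(a, b))
    mismatches = len(a) - identities
    score = identities * MATCH + mismatches * MISMATCH
    return identities, mismatches, score


identities, mismatches, score = manual_score(rna_to_dna(SEQ1), rna_to_dna(SEQ2))
print(identities, mismatches, score)  # 16 2 72
```

With these assumptions the script gives 16 identities, 2 mismatches and
a score of 72, which matches the hand calculation and not the 61 that
matcher prints.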