matcher score calculation

Jan Wuyts jan.wuyts at gengenp.rug.ac.be
Thu Apr 24 13:48:16 UTC 2003


Dear all,

I am trying to use 'matcher' to do a local alignment of a small RNA
sequence against a larger one. However, the output confuses me a bit.
For example:
matcher seq1 seq2 -alternatives 9 -stdout -auto > output

The best (first) match in the output is this:
########################################
# Program:  matcher
# Rundate:  Thu Apr 24 15:21:41 2003
# Align_format: markx0
# Report_file: stdout
########################################
#=======================================
#
# Aligned_sequences: 2
# 1: 21
# 2: 21-1
# Matrix: EDNAFULL
# Gap_penalty: 16
# Extend_penalty: 4
#
# Length: 18
# Identity:      16/18 (88.9%)
# Similarity:    13/18 (72.2%)
# Gaps:           0/18 ( 0.0%)
# Score: 61
# 
#
#=======================================


             10        20
    21 GCAGCAUCAUCAAGAUUC
       :::::: :::.:::::::
  21-1 GCAGCACCAUUAAGAUUC
          440       450  
#======================================= 

Apparently 16 positions are identical (seems right, there are 16 ':') but
only 13 are counted as similar. First of all, I don't understand why the C/U
pair marked '.' would be counted as similar (this score is, after all,
negative in EDNAFULL), and second, how can #similar be smaller than
#identical? The manual
(http://www.hgmp.mrc.ac.uk/Software/EMBOSS/Themes/AlignFormats.html) states
that "Any two residues or bases are defined as similar when they have
positive comparisons (as defined by the comparison matrix being used in
the alignment algorithm)." and a bit further "Note that the sum of
identical and similar positions is greater than 100%. This is because the
count of similar positions includes the count of identical positions; if
residues are identical, they must also be similar." Therefore I would think
#similar must always be >= #identical.
Lastly, when I calculate the score manually, I get
16 x 5 - 2 x 4 = 72 (in EDNAFULL, 5 is used for all non-ambiguous matches, -4
for all non-ambiguous mismatches), while matcher calculates the score to be
61.
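
For reference, my manual count can be reproduced with a short Python sketch.
It assumes a flat scheme of +5 for identical bases and -4 for any mismatch,
which is how I read the non-ambiguous part of EDNAFULL; it does not load the
real matrix file, so any special handling of U is not reflected. The function
name manual_stats is made up for illustration.

```python
# Assumed flat scoring scheme (not read from the real EDNAFULL file):
# +5 for identical bases, -4 for any mismatch.
MATCH = 5
MISMATCH = -4


def manual_stats(a, b):
    """Return (identical, similar, score) for two gap-free aligned strings."""
    if len(a) != len(b):
        raise ValueError("aligned sequences must have equal length")
    identical = similar = score = 0
    for x, y in zip(a, b):
        s = MATCH if x == y else MISMATCH
        score += s
        if x == y:
            identical += 1
        if s > 0:
            # "Similar" is taken to mean a positive comparison score,
            # following the definition quoted from the manual.
            similar += 1
    return identical, similar, score


# The two aligned segments from the matcher report above.
seq1 = "GCAGCAUCAUCAAGAUUC"
seq2 = "GCAGCACCAUUAAGAUUC"

identical, similar, score = manual_stats(seq1, seq2)
print(identical, similar, score)  # 16 16 72
```

Under these assumptions the similar count can never drop below the identical
count, and the score comes out at 72, so both figures disagree with what
matcher reports.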

Any help on this would be greatly appreciated.
Greetings,

Jan.



