Reply: matcher score calculation

David.Bauer at SCHERING.DE
Thu Apr 24 14:07:07 UTC 2003

I had this problem a long time ago (and assumed it had been fixed in the
meantime).
Matcher doesn't like the "U". If you convert your RNA to DNA (U to T), it will
calculate the correct similarity.
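A minimal sketch of that workaround, assuming the sequences are plain FASTA text: replace every U/u with T/t on the sequence lines before running matcher, leaving header lines alone. The function names `rna_to_dna` and `rna_to_dna_fasta` are hypothetical helpers for illustration, not part of EMBOSS.

```python
def rna_to_dna(seq: str) -> str:
    """Replace uracil (U/u) with thymine (T/t); all other characters are unchanged."""
    return seq.translate(str.maketrans("Uu", "Tt"))


def rna_to_dna_fasta(text: str) -> str:
    """Convert the sequence lines of FASTA text from RNA to DNA.

    Header lines (starting with '>') are copied unchanged, so sequence
    names that happen to contain a 'U' are not altered.
    """
    out = []
    for line in text.splitlines():
        out.append(line if line.startswith(">") else rna_to_dna(line))
    return "\n".join(out) + "\n"


if __name__ == "__main__":
    fasta = ">21\nGCAGCAUCAUCAAGAUUC\n"
    print(rna_to_dna_fasta(fasta), end="")  # >21 / GCAGCATCATCAAGATTC
```

The converted files can then be passed to matcher exactly as before.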

David.


Dear all,

I am trying to use 'matcher' to do a local alignment of a small RNA
sequence against a larger one. However, the output confuses me a bit.
For example:
matcher seq1 seq2 -alternatives 9 -stdout -auto > output

The best (first) match in the output is this:
########################################
# Program:  matcher
# Rundate:  Thu Apr 24 15:21:41 2003
# Align_format: markx0
# Report_file: stdout
########################################
#=======================================
#
# Aligned_sequences: 2
# 1: 21
# 2: 21-1
# Matrix: EDNAFULL
# Gap_penalty: 16
# Extend_penalty: 4
#
# Length: 18
# Identity:      16/18 (88.9%)
# Similarity:    13/18 (72.2%)
# Gaps:           0/18 ( 0.0%)
# Score: 61
#
#
#=======================================


             10        20
    21 GCAGCAUCAUCAAGAUUC
       :::::: :::.:::::::
  21-1 GCAGCACCAUUAAGAUUC
          440       450
#=======================================

Apparently 16 positions are identical (which seems right; there are 16 ':'
characters), but only 13 are counted as similar. First, I don't understand
why the C/U pair is marked as similar ('.'), since that score is negative in
EDNAFULL. Second, how can #similar be smaller than #identical? The manual
(http://www.hgmp.mrc.ac.uk/Software/EMBOSS/Themes/AlignFormats.html) states
that "Any two residues or bases are defined as similar when they have
positive comparisons (as defined by the comparison matrix being used in
the alignment algorithm)." and, a bit further on, "Note that the sum of
identical and similar positions is greater than 100%. This is because the
count of similar positions includes the count of identical positions; if
residues are identical, they must also be similar." Therefore I would think
#similar must always be >= #identical.
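The manual's definition can be checked against the two aligned strings with a short script. This is a sketch under stated assumptions: it uses a simplified stand-in for EDNAFULL (+5 for identical unambiguous bases, -4 otherwise, U scored as T) and ignores ambiguity codes; `pair_score` and `identity_similarity` are hypothetical helpers, not matcher's code.

```python
# Count identical and similar positions in a gapless alignment.
# "Similar" follows the manual's definition: the pair's substitution score is positive.
# Assumption: simplified EDNAFULL (+5 match, -4 mismatch, U treated as T);
# IUPAC ambiguity codes are not handled.

def pair_score(a: str, b: str) -> int:
    a, b = a.upper().replace("U", "T"), b.upper().replace("U", "T")
    return 5 if a == b else -4


def identity_similarity(s1: str, s2: str) -> tuple[int, int]:
    if len(s1) != len(s2):
        raise ValueError("aligned sequences must have equal length")
    identical = sum(a == b for a, b in zip(s1, s2))
    similar = sum(pair_score(a, b) > 0 for a, b in zip(s1, s2))
    return identical, similar


if __name__ == "__main__":
    top = "GCAGCAUCAUCAAGAUUC"
    bottom = "GCAGCACCAUUAAGAUUC"
    print(identity_similarity(top, bottom))  # (16, 16)
```

Because every identical pair scores +5, each identical position is also counted as similar, so under this definition the similar count can never drop below the identical count, and neither U/C pair is counted as similar.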
Lastly, when I calculate the score manually, I get
16x5 - 2x4 = 72 (in EDNAFULL, 5 is used for all non-ambiguous matches and -4
for all non-ambiguous mismatches), while matcher calculates the score to be
61.
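The manual calculation can be reproduced as follows. This is a sketch assuming a simplified EDNAFULL (+5 for identical unambiguous bases, -4 otherwise, U scored as T); since the alignment contains no gaps, the gap penalties (16 and 4) do not enter the sum. `alignment_score` is a hypothetical helper, not matcher's implementation.

```python
# Recompute the score of a gapless alignment by summing substitution scores.
# Assumption: simplified EDNAFULL (+5 match, -4 mismatch, U treated as T);
# ambiguity codes and gap penalties are not modelled.

def pair_score(a: str, b: str) -> int:
    a, b = a.upper().replace("U", "T"), b.upper().replace("U", "T")
    return 5 if a == b else -4


def alignment_score(s1: str, s2: str) -> int:
    if len(s1) != len(s2):
        raise ValueError("aligned sequences must have equal length")
    return sum(pair_score(a, b) for a, b in zip(s1, s2))


if __name__ == "__main__":
    top = "GCAGCAUCAUCAAGAUUC"
    bottom = "GCAGCACCAUUAAGAUUC"
    print(alignment_score(top, bottom))  # 72 = 16*5 + 2*(-4)
```

Under these assumptions the result agrees with the hand calculation (72), not with the 61 that matcher reports.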

Any help on this would be greatly appreciated.
Greetings,

Jan.


More information about the EMBOSS mailing list