matcher score calculation
Jan Wuyts
jan.wuyts at gengenp.rug.ac.be
Thu Apr 24 13:48:16 UTC 2003
Dear all,
I am trying to use 'matcher' to do a local alignment of a small RNA
sequence against a larger one. However, the output confuses me a bit.
For example:
matcher seq1 seq2 -alternatives 9 -stdout -auto > output
The best (first) match in the output is this:
########################################
# Program: matcher
# Rundate: Thu Apr 24 15:21:41 2003
# Align_format: markx0
# Report_file: stdout
########################################
#=======================================
#
# Aligned_sequences: 2
# 1: 21
# 2: 21-1
# Matrix: EDNAFULL
# Gap_penalty: 16
# Extend_penalty: 4
#
# Length: 18
# Identity: 16/18 (88.9%)
# Similarity: 13/18 (72.2%)
# Gaps: 0/18 ( 0.0%)
# Score: 61
#
#
#=======================================
10 20
21     GCAGCAUCAUCAAGAUUC
       :::::: :::.:::::::
21-1   GCAGCACCAUUAAGAUUC
440 450
#=======================================
Apparently 16 positions are identical (seems right, there are 16 ':') but
only 13 are counted as similar. First of all, I don't understand why CU
would be counted as similar (this score is after all negative in
EDNAFULL), and second, how can it be that #similar is smaller than
#identical? The manual
(http://www.hgmp.mrc.ac.uk/Software/EMBOSS/Themes/AlignFormats.html) states
that "Any two residues or bases are defined as similar when they have
positive comparisons (as defined by the comparison matrix being used in
the alignment algorithm)." and a bit further "Note that the sum of
identical and similar positions is greater than 100%. This is because the
count of similar positions includes the count of identical positions; if
residues are identical, they must also be similar." Therefore I would think
#similar must always be >= #identical.
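
The counting described above can be checked with a minimal sketch. It assumes a simplified stand-in for EDNAFULL that scores unambiguous bases only (+5 for a match, -4 for a mismatch), and it applies the manual's definition of "similar" (a positive comparison score). The function name pair_score is made up for illustration; this is not how matcher itself counts.

```python
# Aligned rows copied from the matcher output above.
SEQ1 = "GCAGCAUCAUCAAGAUUC"
SEQ2 = "GCAGCACCAUUAAGAUUC"


def pair_score(a, b, match=5, mismatch=-4):
    """Assumed stand-in for EDNAFULL, unambiguous bases only."""
    return match if a == b else mismatch


pairs = list(zip(SEQ1, SEQ2))

# Identical: both rows carry the same base in that column.
identical = sum(a == b for a, b in pairs)

# Similar (manual's definition): the comparison score is positive.
similar = sum(pair_score(a, b) > 0 for a, b in pairs)

print(f"Identity:   {identical}/{len(pairs)}")
print(f"Similarity: {similar}/{len(pairs)}")
```

Under these assumptions every identical column is also similar, so the similarity count cannot fall below the identity count.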
Lastly, when I calculate the score manually, I get
16x5-2x4=72 (in EDNAFULL, 5 is used for all non-ambiguous matches, -4 for
all non-ambiguous mismatches), while matcher calculates the score to be
61.
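
The manual calculation can be reproduced column by column. This is only a sketch of the arithmetic, assuming +5 for every match and -4 for every mismatch as stated above, with no gap penalties because the alignment has no gaps. The variable names are illustrative.

```python
# Aligned rows and the score copied from the matcher output above.
SEQ1 = "GCAGCAUCAUCAAGAUUC"
SEQ2 = "GCAGCACCAUUAAGAUUC"
REPORTED_SCORE = 61

# Assumed per-column scores for unambiguous bases.
MATCH, MISMATCH = 5, -4

column_scores = [MATCH if a == b else MISMATCH for a, b in zip(SEQ1, SEQ2)]
manual_score = sum(column_scores)

print("manual score:  ", manual_score)
print("reported score:", REPORTED_SCORE)
print("difference:    ", manual_score - REPORTED_SCORE)
```

With these values the manual total is 72, which is 11 more than the score matcher reports.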
Any help on this would be greatly appreciated.
Greetings,
Jan.