[EMBOSS] needle computes alignment with apparently suboptimal score -- bug?
Jan T Kim
jttkim at googlemail.com
Thu Dec 6 18:13:42 UTC 2018
Hi All,
I've coded up a pairwise semiglobal aligner and then checked it by
comparing its output to that of needle. My expectation was to find
problems in my code, but now, to my surprise, I found a case where
the pairwise alignment computed by my code has a higher score than
that returned by needle. The sequences are:
>x1a
CGTGATACATTACTTTTTA
>x1b
GTGGACTTGACGCGTCATGGAAAGTACAAGATACTTCGACCTGGCAGTGCAAG
Storing these in separate files x1a.fasta and x1b.fasta and running
needle -auto x1a.fasta x1b.fasta x1.txt
gives this alignment with score 21.5:
>x1a
------------CGTGAT--ACATTACTTTTTA--------------------
>x1b
GTGGACTTGACGCGTCATGGAAAGTACAAGATACTTCGACCTGGCAGTGCAAG
However, my aligner finds this with score 22.0:
>x1a x1a, aligned semiglobally, score 22.000000
------------CGTGA-------TACA--TTACTTTTTA-----------------
>x1b x1b, aligned semiglobally, score 22.000000
GTGGACTTGACGCGTCATGGAAAGTACAAGATACTT----CGACCTGGCAGTGCAAG
The Needleman-Wunsch algorithm computes an optimal alignment, i.e.
one with maximal score, given the input sequences. Therefore, if my
score computation is correct, the alignment reported by needle is
not optimal, which would mean that there's a bug in needle.
Of course it's also possible that I'm doing something wrong / stupid,
so I append position by position breakdowns of my score calculations.
I also attach the full needle output and my fasta files, so you can
run the above command to check whether you get the same result. I've
found this with Ubuntu xenial / emboss6.6.0+dfsg-3build1, and confirmed
that I get the same with Ubuntu bionic / emboss 6.6.0+dfsg-6 (both amd64).
The needle output file also confirms that needle computes the score of
its alignment as 21.5, consistently with my scoring.
I notice that it's a somewhat peculiar feature of my alignment that
the internal gap of length 4 in x1b ends where the terminal (unpenalised)
gap in x1a starts. I'm not aware of any criterion / rule violated by that,
though.
Best regards & thanks in advance for any scrutiny and comments, Jan
----- 8< --- breakdown of needle alignment scoring -------
+0.0 # -~G
+0.0 # -~T
+0.0 # -~G
+0.0 # -~G
+0.0 # -~A
+0.0 # -~C
+0.0 # -~T
+0.0 # -~T
+0.0 # -~G
+0.0 # -~A
+0.0 # -~C
+0.0 # -~G
+5.0 # C-C
+5.0 # G-G
+5.0 # T-T
-4.0 # G.C
+5.0 # A-A
+5.0 # T-T
-10.0 # - G
-0.5 # - G
+5.0 # A-A
-4.0 # C.A
+5.0 # A-A
-4.0 # T.G
+5.0 # T-T
+5.0 # A-A
+5.0 # C-C
-4.0 # T.A
-4.0 # T.A
-4.0 # T.G
-4.0 # T.A
+5.0 # T-T
+5.0 # A-A
+0.0 # -~C
+0.0 # -~T
+0.0 # -~T
+0.0 # -~C
+0.0 # -~G
+0.0 # -~A
+0.0 # -~C
+0.0 # -~C
+0.0 # -~T
+0.0 # -~G
+0.0 # -~G
+0.0 # -~C
+0.0 # -~A
+0.0 # -~G
+0.0 # -~T
+0.0 # -~G
+0.0 # -~C
+0.0 # -~A
+0.0 # -~A
+0.0 # -~G
+21.5
----- 8< --- breakdown of scoring of "my" alignment -------
+0.0 # -~G
+0.0 # -~T
+0.0 # -~G
+0.0 # -~G
+0.0 # -~A
+0.0 # -~C
+0.0 # -~T
+0.0 # -~T
+0.0 # -~G
+0.0 # -~A
+0.0 # -~C
+0.0 # -~G
+5.0 # C-C
+5.0 # G-G
+5.0 # T-T
-4.0 # G.C
+5.0 # A-A
-10.0 # - T
-0.5 # - G
-0.5 # - G
-0.5 # - A
-0.5 # - A
-0.5 # - A
-0.5 # - G
+5.0 # T-T
+5.0 # A-A
+5.0 # C-C
+5.0 # A-A
-10.0 # - A
-0.5 # - G
-4.0 # T.A
+5.0 # T-T
+5.0 # A-A
+5.0 # C-C
+5.0 # T-T
+5.0 # T-T
-10.0 # T -
-0.5 # T -
-0.5 # T -
-0.5 # A -
+0.0 # -~C
+0.0 # -~G
+0.0 # -~A
+0.0 # -~C
+0.0 # -~C
+0.0 # -~T
+0.0 # -~G
+0.0 # -~G
+0.0 # -~C
+0.0 # -~A
+0.0 # -~G
+0.0 # -~T
+0.0 # -~G
+0.0 # -~C
+0.0 # -~A
+0.0 # -~A
+0.0 # -~G
+22.0
-------------- next part --------------
>x1a
CGTGATACATTACTTTTTA
-------------- next part --------------
>x1b
GTGGACTTGACGCGTCATGGAAAGTACAAGATACTTCGACCTGGCAGTGCAAG
-------------- next part --------------
########################################
# Program: needle
# Rundate: Thu 6 Dec 2018 17:24:28
# Commandline: needle
# -auto
# [-asequence] x1a.fasta
# [-bsequence] x1b.fasta
# [-outfile] x1.txt
# Align_format: srspair
# Report_file: x1.txt
########################################
#=======================================
#
# Aligned_sequences: 2
# 1: x1a
# 2: x1b
# Matrix: EDNAFULL
# Gap_penalty: 10.0
# Extend_penalty: 0.5
#
# Length: 53
# Identity: 12/53 (22.6%)
# Similarity: 12/53 (22.6%)
# Gaps: 34/53 (64.2%)
# Score: 21.5
#
#
#=======================================
x1a 1 ------------CGTGAT--ACATTACTTTTTA----------------- 19
|||.|| |.|.|||....||
x1b 1 GTGGACTTGACGCGTCATGGAAAGTACAAGATACTTCGACCTGGCAGTGC 50
x1a 19 --- 19
x1b 51 AAG 53
#---------------------------------------
#---------------------------------------
More information about the EMBOSS
mailing list