[EMBOSS] needle computes alignment with apparently suboptimal score -- bug?

Linlin Yan yanlinlin82 at gmail.com
Mon Dec 10 10:28:22 UTC 2018


I guess it is because that needle is optimized for global alignment
but not semi-global alignment.



On Fri, Dec 7, 2018 at 2:25 AM Jan T Kim <jttkim at googlemail.com> wrote:
>
> Hi All,
>
> I've coded up a pairwise semiglobal aligner and then checked it by
> comparing its output to that of needle. My expectation was to find
> problems in my code, but now, to my surprise, I found a case where
> the pairwise alignment computed by my code has a higher score than
> that returned by needle. The sequences are:
>
> >x1a
> CGTGATACATTACTTTTTA
> >x1b
> GTGGACTTGACGCGTCATGGAAAGTACAAGATACTTCGACCTGGCAGTGCAAG
>
> Storing these in separate files x1a.fasta and x1b.fasta and running
>
>     needle -auto x1a.fasta x1b.fasta x1.txt
>
> gives this alignment with score 21.5:
>
> >x1a
> ------------CGTGAT--ACATTACTTTTTA--------------------
> >x1b
> GTGGACTTGACGCGTCATGGAAAGTACAAGATACTTCGACCTGGCAGTGCAAG
>
> However, my aligner finds this with score 22.0:
>
> >x1a x1a, aligned semiglobally, score 22.000000
> ------------CGTGA-------TACA--TTACTTTTTA-----------------
> >x1b x1b, aligned semiglobally, score 22.000000
> GTGGACTTGACGCGTCATGGAAAGTACAAGATACTT----CGACCTGGCAGTGCAAG
>
> The Needleman-Wunsch algorithm computes an optimal alignment, i.e.
> one with maximal score, given the input sequences. Therefore, if my
> score computation is correct, the alignment reported by needle is
> not optimal, which would mean that there's a bug in needle.
>
> Of course it's also possible that I'm doing something wrong / stupid,
> so I append position by position breakdowns of my score calculations.
>
> I also attach the full needle output and my fasta files, so you can
> run the above command to check whether you get the same result. I've
> found this with Ubuntu xenial / emboss6.6.0+dfsg-3build1, and confirmed
> that I get the same with Ubuntu bionic / emboss 6.6.0+dfsg-6 (both amd64).
> The needle output file also confirms that needle computes the score of
> its alignment as 21.5, consistently with my scoring.
>
> I notice that it's a somewhat peculiar feature of my alignment that
> the internal gap of length 4 in x1b ends where the terminal (unpenalised)
> gap in x1a starts. I'm not aware of any criterion / rule violated by that,
> though.
>
> Best regards & thanks in advance for any scrutiny and comments, Jan
>
>
> ----- 8< --- breakdown of needle alignment scoring -------
>
>  +0.0  # -~G
>  +0.0  # -~T
>  +0.0  # -~G
>  +0.0  # -~G
>  +0.0  # -~A
>  +0.0  # -~C
>  +0.0  # -~T
>  +0.0  # -~T
>  +0.0  # -~G
>  +0.0  # -~A
>  +0.0  # -~C
>  +0.0  # -~G
>  +5.0  # C-C
>  +5.0  # G-G
>  +5.0  # T-T
>  -4.0  # G.C
>  +5.0  # A-A
>  +5.0  # T-T
> -10.0  # - G
>  -0.5  # - G
>  +5.0  # A-A
>  -4.0  # C.A
>  +5.0  # A-A
>  -4.0  # T.G
>  +5.0  # T-T
>  +5.0  # A-A
>  +5.0  # C-C
>  -4.0  # T.A
>  -4.0  # T.A
>  -4.0  # T.G
>  -4.0  # T.A
>  +5.0  # T-T
>  +5.0  # A-A
>  +0.0  # -~C
>  +0.0  # -~T
>  +0.0  # -~T
>  +0.0  # -~C
>  +0.0  # -~G
>  +0.0  # -~A
>  +0.0  # -~C
>  +0.0  # -~C
>  +0.0  # -~T
>  +0.0  # -~G
>  +0.0  # -~G
>  +0.0  # -~C
>  +0.0  # -~A
>  +0.0  # -~G
>  +0.0  # -~T
>  +0.0  # -~G
>  +0.0  # -~C
>  +0.0  # -~A
>  +0.0  # -~A
>  +0.0  # -~G
> +21.5
>
> ----- 8< --- breakdown of scoring of "my" alignment -------
>
>  +0.0  # -~G
>  +0.0  # -~T
>  +0.0  # -~G
>  +0.0  # -~G
>  +0.0  # -~A
>  +0.0  # -~C
>  +0.0  # -~T
>  +0.0  # -~T
>  +0.0  # -~G
>  +0.0  # -~A
>  +0.0  # -~C
>  +0.0  # -~G
>  +5.0  # C-C
>  +5.0  # G-G
>  +5.0  # T-T
>  -4.0  # G.C
>  +5.0  # A-A
> -10.0  # - T
>  -0.5  # - G
>  -0.5  # - G
>  -0.5  # - A
>  -0.5  # - A
>  -0.5  # - A
>  -0.5  # - G
>  +5.0  # T-T
>  +5.0  # A-A
>  +5.0  # C-C
>  +5.0  # A-A
> -10.0  # - A
>  -0.5  # - G
>  -4.0  # T.A
>  +5.0  # T-T
>  +5.0  # A-A
>  +5.0  # C-C
>  +5.0  # T-T
>  +5.0  # T-T
> -10.0  # T -
>  -0.5  # T -
>  -0.5  # T -
>  -0.5  # A -
>  +0.0  # -~C
>  +0.0  # -~G
>  +0.0  # -~A
>  +0.0  # -~C
>  +0.0  # -~C
>  +0.0  # -~T
>  +0.0  # -~G
>  +0.0  # -~G
>  +0.0  # -~C
>  +0.0  # -~A
>  +0.0  # -~G
>  +0.0  # -~T
>  +0.0  # -~G
>  +0.0  # -~C
>  +0.0  # -~A
>  +0.0  # -~A
>  +0.0  # -~G
> +22.0
>
> _______________________________________________
> EMBOSS mailing list
> EMBOSS at mailman.open-bio.org
> http://mailman.open-bio.org/mailman/listinfo/emboss


More information about the EMBOSS mailing list