[EMBOSS] needle computes alignment with apparently suboptimal score -- bug?
Peter Rice
ricepeterm at yahoo.co.uk
Mon Dec 10 15:17:17 UTC 2018
Hi Jan,
That is a bug in needle.
What happens is that just before the end of your alignment there are two
ways to get a high score. It picks a match, and fails to notice that
extending an earlier gap is cheaper than starting a new gap.
That means one of our optimisations is causing a problem and will need
to allow a wider search for scores when considering starting a new gap.
Many thanks for finding it.
Peter Rice
EMBOSS Team
On 06/12/2018 18:13, Jan T Kim wrote:
> Hi All,
>
> I've coded up a pairwise semiglobal aligner and then checked it by
> comparing its output to that of needle. My expectation was to find
> problems in my code, but now, to my surprise, I found a case where
> the pairwise alignment computed by my code has a higher score than
> that returned by needle. The sequences are:
>
>> x1a
> CGTGATACATTACTTTTTA
>> x1b
> GTGGACTTGACGCGTCATGGAAAGTACAAGATACTTCGACCTGGCAGTGCAAG
>
> Storing these in separate files x1a.fasta and x1b.fasta and running
>
> needle -auto x1a.fasta x1b.fasta x1.txt
>
> gives this alignment with score 21.5:
>
>> x1a
> ------------CGTGAT--ACATTACTTTTTA--------------------
>> x1b
> GTGGACTTGACGCGTCATGGAAAGTACAAGATACTTCGACCTGGCAGTGCAAG
>
> However, my aligner finds this with score 22.0:
>
>> x1a x1a, aligned semiglobally, score 22.000000
> ------------CGTGA-------TACA--TTACTTTTTA-----------------
>> x1b x1b, aligned semiglobally, score 22.000000
> GTGGACTTGACGCGTCATGGAAAGTACAAGATACTT----CGACCTGGCAGTGCAAG
>
> The Needleman-Wunsch algorithm computes an optimal alignment, i.e.
> one with maximal score, given the input sequences. Therefore, if my
> score computation is correct, the alignment reported by needle is
> not optimal, which would mean that there's a bug in needle.
>
> Of course it's also possible that I'm doing something wrong / stupid,
> so I append position by position breakdowns of my score calculations.
>
> I also attach the full needle output and my fasta files, so you can
> run the above command to check whether you get the same result. I've
> found this with Ubuntu xenial / emboss6.6.0+dfsg-3build1, and confirmed
> that I get the same with Ubuntu bionic / emboss 6.6.0+dfsg-6 (both amd64).
> The needle output file also confirms that needle computes the score of
> its alignment as 21.5, consistently with my scoring.
>
> I notice that it's a somewhat peculiar feature of my alignment that
> the internal gap of length 4 in x1b ends where the terminal (unpenalised)
> gap in x1a starts. I'm not aware of any criterion / rule violated by that,
> though.
>
> Best regards & thanks in advance for any scrutiny and comments, Jan
>
>
> ----- 8< --- breakdown of needle alignment scoring -------
>
> +0.0 # -~G
> +0.0 # -~T
> +0.0 # -~G
> +0.0 # -~G
> +0.0 # -~A
> +0.0 # -~C
> +0.0 # -~T
> +0.0 # -~T
> +0.0 # -~G
> +0.0 # -~A
> +0.0 # -~C
> +0.0 # -~G
> +5.0 # C-C
> +5.0 # G-G
> +5.0 # T-T
> -4.0 # G.C
> +5.0 # A-A
> +5.0 # T-T
> -10.0 # - G
> -0.5 # - G
> +5.0 # A-A
> -4.0 # C.A
> +5.0 # A-A
> -4.0 # T.G
> +5.0 # T-T
> +5.0 # A-A
> +5.0 # C-C
> -4.0 # T.A
> -4.0 # T.A
> -4.0 # T.G
> -4.0 # T.A
> +5.0 # T-T
> +5.0 # A-A
> +0.0 # -~C
> +0.0 # -~T
> +0.0 # -~T
> +0.0 # -~C
> +0.0 # -~G
> +0.0 # -~A
> +0.0 # -~C
> +0.0 # -~C
> +0.0 # -~T
> +0.0 # -~G
> +0.0 # -~G
> +0.0 # -~C
> +0.0 # -~A
> +0.0 # -~G
> +0.0 # -~T
> +0.0 # -~G
> +0.0 # -~C
> +0.0 # -~A
> +0.0 # -~A
> +0.0 # -~G
> +21.5
>
> ----- 8< --- breakdown of scoring of "my" alignment -------
>
> +0.0 # -~G
> +0.0 # -~T
> +0.0 # -~G
> +0.0 # -~G
> +0.0 # -~A
> +0.0 # -~C
> +0.0 # -~T
> +0.0 # -~T
> +0.0 # -~G
> +0.0 # -~A
> +0.0 # -~C
> +0.0 # -~G
> +5.0 # C-C
> +5.0 # G-G
> +5.0 # T-T
> -4.0 # G.C
> +5.0 # A-A
> -10.0 # - T
> -0.5 # - G
> -0.5 # - G
> -0.5 # - A
> -0.5 # - A
> -0.5 # - A
> -0.5 # - G
> +5.0 # T-T
> +5.0 # A-A
> +5.0 # C-C
> +5.0 # A-A
> -10.0 # - A
> -0.5 # - G
> -4.0 # T.A
> +5.0 # T-T
> +5.0 # A-A
> +5.0 # C-C
> +5.0 # T-T
> +5.0 # T-T
> -10.0 # T -
> -0.5 # T -
> -0.5 # T -
> -0.5 # A -
> +0.0 # -~C
> +0.0 # -~G
> +0.0 # -~A
> +0.0 # -~C
> +0.0 # -~C
> +0.0 # -~T
> +0.0 # -~G
> +0.0 # -~G
> +0.0 # -~C
> +0.0 # -~A
> +0.0 # -~G
> +0.0 # -~T
> +0.0 # -~G
> +0.0 # -~C
> +0.0 # -~A
> +0.0 # -~A
> +0.0 # -~G
> +22.0
>
>
> _______________________________________________
> EMBOSS mailing list
> EMBOSS at mailman.open-bio.org
> http://mailman.open-bio.org/mailman/listinfo/emboss
>
---
This email has been checked for viruses by AVG.
https://www.avg.com
More information about the EMBOSS
mailing list