[EMBOSS] needle computes alignment with apparently suboptimal score -- bug?

Peter Rice ricepeterm at yahoo.co.uk
Mon Dec 10 15:17:17 UTC 2018


Hi Jan,

That is a bug in needle.

What happens is that just before the end of your alignment there are two 
ways to get a high score. It picks a match, and fails to notice that 
extending an earlier gap is cheaper than starting a new gap.

That means one of our optimisations is causing a problem and will need 
to allow a wider search for scores when considering starting a new gap.

Many thanks for finding it.

Peter Rice
EMBOSS Team



On 06/12/2018 18:13, Jan T Kim wrote:
> Hi All,
> 
> I've coded up a pairwise semiglobal aligner and then checked it by
> comparing its output to that of needle. My expectation was to find
> problems in my code, but now, to my surprise, I found a case where
> the pairwise alignment computed by my code has a higher score than
> that returned by needle. The sequences are:
> 
>> x1a
> CGTGATACATTACTTTTTA
>> x1b
> GTGGACTTGACGCGTCATGGAAAGTACAAGATACTTCGACCTGGCAGTGCAAG
> 
> Storing these in separate files x1a.fasta and x1b.fasta and running
> 
>      needle -auto x1a.fasta x1b.fasta x1.txt
> 
> gives this alignment with score 21.5:
> 
>> x1a
> ------------CGTGAT--ACATTACTTTTTA--------------------
>> x1b
> GTGGACTTGACGCGTCATGGAAAGTACAAGATACTTCGACCTGGCAGTGCAAG
> 
> However, my aligner finds this with score 22.0:
> 
>> x1a x1a, aligned semiglobally, score 22.000000
> ------------CGTGA-------TACA--TTACTTTTTA-----------------
>> x1b x1b, aligned semiglobally, score 22.000000
> GTGGACTTGACGCGTCATGGAAAGTACAAGATACTT----CGACCTGGCAGTGCAAG
> 
> The Needleman-Wunsch algorithm computes an optimal alignment, i.e.
> one with maximal score, given the input sequences. Therefore, if my
> score computation is correct, the alignment reported by needle is
> not optimal, which would mean that there's a bug in needle.
> 
> Of course it's also possible that I'm doing something wrong / stupid,
> so I append position by position breakdowns of my score calculations.
> 
> I also attach the full needle output and my fasta files, so you can
> run the above command to check whether you get the same result. I've
> found this with Ubuntu xenial / emboss6.6.0+dfsg-3build1, and confirmed
> that I get the same with Ubuntu bionic / emboss 6.6.0+dfsg-6 (both amd64).
> The needle output file also confirms that needle computes the score of
> its alignment as 21.5, consistently with my scoring.
> 
> I notice that it's a somewhat peculiar feature of my alignment that
> the internal gap of length 4 in x1b ends where the terminal (unpenalised)
> gap in x1a starts. I'm not aware of any criterion / rule violated by that,
> though.
> 
> Best regards & thanks in advance for any scrutiny and comments, Jan
> 
> 
> ----- 8< --- breakdown of needle alignment scoring -------
> 
>   +0.0  # -~G
>   +0.0  # -~T
>   +0.0  # -~G
>   +0.0  # -~G
>   +0.0  # -~A
>   +0.0  # -~C
>   +0.0  # -~T
>   +0.0  # -~T
>   +0.0  # -~G
>   +0.0  # -~A
>   +0.0  # -~C
>   +0.0  # -~G
>   +5.0  # C-C
>   +5.0  # G-G
>   +5.0  # T-T
>   -4.0  # G.C
>   +5.0  # A-A
>   +5.0  # T-T
> -10.0  # - G
>   -0.5  # - G
>   +5.0  # A-A
>   -4.0  # C.A
>   +5.0  # A-A
>   -4.0  # T.G
>   +5.0  # T-T
>   +5.0  # A-A
>   +5.0  # C-C
>   -4.0  # T.A
>   -4.0  # T.A
>   -4.0  # T.G
>   -4.0  # T.A
>   +5.0  # T-T
>   +5.0  # A-A
>   +0.0  # -~C
>   +0.0  # -~T
>   +0.0  # -~T
>   +0.0  # -~C
>   +0.0  # -~G
>   +0.0  # -~A
>   +0.0  # -~C
>   +0.0  # -~C
>   +0.0  # -~T
>   +0.0  # -~G
>   +0.0  # -~G
>   +0.0  # -~C
>   +0.0  # -~A
>   +0.0  # -~G
>   +0.0  # -~T
>   +0.0  # -~G
>   +0.0  # -~C
>   +0.0  # -~A
>   +0.0  # -~A
>   +0.0  # -~G
> +21.5
> 
> ----- 8< --- breakdown of scoring of "my" alignment -------
> 
>   +0.0  # -~G
>   +0.0  # -~T
>   +0.0  # -~G
>   +0.0  # -~G
>   +0.0  # -~A
>   +0.0  # -~C
>   +0.0  # -~T
>   +0.0  # -~T
>   +0.0  # -~G
>   +0.0  # -~A
>   +0.0  # -~C
>   +0.0  # -~G
>   +5.0  # C-C
>   +5.0  # G-G
>   +5.0  # T-T
>   -4.0  # G.C
>   +5.0  # A-A
> -10.0  # - T
>   -0.5  # - G
>   -0.5  # - G
>   -0.5  # - A
>   -0.5  # - A
>   -0.5  # - A
>   -0.5  # - G
>   +5.0  # T-T
>   +5.0  # A-A
>   +5.0  # C-C
>   +5.0  # A-A
> -10.0  # - A
>   -0.5  # - G
>   -4.0  # T.A
>   +5.0  # T-T
>   +5.0  # A-A
>   +5.0  # C-C
>   +5.0  # T-T
>   +5.0  # T-T
> -10.0  # T -
>   -0.5  # T -
>   -0.5  # T -
>   -0.5  # A -
>   +0.0  # -~C
>   +0.0  # -~G
>   +0.0  # -~A
>   +0.0  # -~C
>   +0.0  # -~C
>   +0.0  # -~T
>   +0.0  # -~G
>   +0.0  # -~G
>   +0.0  # -~C
>   +0.0  # -~A
>   +0.0  # -~G
>   +0.0  # -~T
>   +0.0  # -~G
>   +0.0  # -~C
>   +0.0  # -~A
>   +0.0  # -~A
>   +0.0  # -~G
> +22.0
> 
> 
> _______________________________________________
> EMBOSS mailing list
> EMBOSS at mailman.open-bio.org
> http://mailman.open-bio.org/mailman/listinfo/emboss
> 

---
This email has been checked for viruses by AVG.
https://www.avg.com



More information about the EMBOSS mailing list