[EMBOSS] fuzznuc repetition issue

Bernd W bernd.web at gmail.com
Tue Apr 9 15:12:45 UTC 2013


Hi,

I tried using a repetition range in a fuzznuc pattern. It seems that when
the range starts at 0, e.g. n(0,1), a match is not found if it is located at
the very end of the sequence, where the repeated character can only occur
0 times. This only happens when other matches are possible as well.

The following example shows this. The sequence contains the pattern twice:
once with 4 mismatches and once exactly, at the very end.

>test
ACTACTACATACATACACATATACACATGAGGTTTTAGGGGATGACGTAAGGGGGNNNNNGAGGAAGGAGGGGATGACGT

fuzznuc -pmismatch 4 -sequence test.fa -outfile test.fuzznuc \
  -pattern 'GAGGAAGGAGGGGATGACGT'


results in the expected output:
  Start     End  Strand Pattern                      Mismatch Sequence
     29      48       + pattern:GAGGAAGGAGGGGATGACGT        4 GAGGTTTTAGGGGATGACGT
     61      80       + pattern:GAGGAAGGAGGGGATGACGT        . GAGGAAGGAGGGGATGACGT

However,
fuzznuc -pmismatch 4 -sequence test.fa -outfile test.fuzznuc \
  -pattern 'GAGGAAGGAGGGGATGACGTn(0,3)'
only finds the hit starting at position 29. It is reported four times, with
0, 1, 2 and 3 trailing "any nucleotide" matches, but the hit at 61-80 is
missing.

Now, if I request 0 mismatches (-pmismatch 0), the last hit (61 to 80) is
reported. When requesting e.g. 3 mismatches, no hit is found at all: the
first hit is rightly excluded because it has 4 mismatches, but the last
hit, which has 0 mismatches, is not reported either. It only seems to be
reported when I ask for 0 mismatches.

However, when allowing 4 mismatches I'd expect 5 hits in total: 4 starting
at 29 (each with 4 mismatches) and one starting at 61.
This occurred in EMBOSS 6.3.1 and 6.5.7.
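
To make the expectation concrete, here is a minimal brute-force sketch in
Python. It is not how fuzznuc works internally; it only encodes my reading
of the pattern: the 20-base core may have up to -pmismatch mismatches, the
trailing n(0,3) matches any 0 to 3 bases at no cost, and an N in the
sequence counts as a mismatch against the core (an assumption). The
function name expected_hits is made up for this illustration.

```python
SEQ = ("ACTACTACATACATACACATATACACATGAGGTTTTAGGGGATGACGT"
       "AAGGGGGNNNNNGAGGAAGGAGGGGATGACGT")
CORE = "GAGGAAGGAGGGGATGACGT"


def expected_hits(seq, core, max_mismatch, min_n=0, max_n=3):
    """Return (start, end, mismatches) tuples, 1-based inclusive.

    Assumed semantics: the core is compared base by base, and every
    allowed length of the trailing n(min_n,max_n) that still fits in
    the sequence gives a separate hit.
    """
    hits = []
    for start in range(len(seq) - len(core) + 1):
        window = seq[start:start + len(core)]
        mismatches = sum(a != b for a, b in zip(window, core))
        if mismatches > max_mismatch:
            continue
        for extra in range(min_n, max_n + 1):
            end = start + len(core) + extra
            if end <= len(seq):  # trailing bases must exist in the sequence
                hits.append((start + 1, end, mismatches))
    return hits


for limit in (4, 3, 0):
    print(limit, expected_hits(SEQ, CORE, limit))
```

Under these assumptions the sketch gives five hits for a limit of 4
(29-48, 29-49, 29-50 and 29-51 with 4 mismatches, plus 61-80 with none),
and the single hit 61-80 for limits of 3 and 0.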


Is this a wrong expectation, or is something not going entirely right?


Kind regards,
Bernd


