[EMBOSS] non-overlapping matches in fuzznuc?

Peter Rice pmr at ebi.ac.uk
Thu Oct 13 00:02:08 UTC 2011


On 12/10/2011 16:50, Aengus Stewart wrote:
> Hi Folks,
>
> I couldn't see a command line option to do what I wanted, i.e. return
> non-overlapping hits.
>
> This is best explained with some sample output.
>
> #=======================================
> #
> # Sequence: chr1_174353258_174354335 from: 1 to: 200
> # HitCount: 9
> #
> # Pattern_name Mismatch Pattern
> # pattern1 3 CC[AT](6)GG
>
> As you can see this is actually only 4 hits rather than the 9 reported.

Hmmm ... with that kind of pattern and 3 mismatches allowed, there are
almost certain to be overlapping matches.

Trouble is, which matches would you want to keep? Your second match, for
example, has 2 hits with 1 mismatch each, at 104..115 and 105..116.

It should be possible to come up with patterns where the choice of 'best 
hit' complicates which hits are considered to overlap.

Probably writing a script is your best bet as you can then control which 
hits are picked.
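
As a rough sketch of what such a script might look like (Python; the names
Hit, overlaps and pick_non_overlapping are made up for illustration, and
the hits are assumed to have already been parsed out of the fuzznuc report
as start/end/mismatch values). It ranks hits by a priority of your choosing,
by default fewest mismatches and then leftmost, and keeps a hit only if it
does not overlap one already kept. This is one possible selection rule,
not the only sensible one.

```python
from typing import Callable, List, NamedTuple, Tuple


class Hit(NamedTuple):
    """One reported match (hypothetical container, not an EMBOSS type)."""
    start: int       # 1-based, inclusive
    end: int         # 1-based, inclusive
    mismatches: int


def overlaps(a: Hit, b: Hit) -> bool:
    """True if the two inclusive ranges share at least one position."""
    return a.start <= b.end and b.start <= a.end


def pick_non_overlapping(
    hits: List[Hit],
    priority: Callable[[Hit], Tuple] = lambda h: (h.mismatches, h.start),
) -> List[Hit]:
    """Greedy selection: visit hits in priority order, keep a hit only
    if it overlaps none of the hits kept so far."""
    chosen: List[Hit] = []
    for hit in sorted(hits, key=priority):
        if not any(overlaps(hit, kept) for kept in chosen):
            chosen.append(hit)
    return sorted(chosen, key=lambda h: h.start)


# The two 1-mismatch hits mentioned above: the tie goes to the leftmost.
pair = [Hit(104, 115, 1), Hit(105, 116, 1)]
print(pick_non_overlapping(pair))

# Made-up chain: the middle hit is the best one but overlaps both
# neighbours, which do not overlap each other.
chain = [Hit(1, 12, 2), Hit(8, 19, 1), Hit(15, 26, 2)]
print(pick_non_overlapping(chain))                              # best first
print(pick_non_overlapping(chain, priority=lambda h: h.start))  # leftmost first
```

With the made-up chain, "best first" keeps only 8..19 while "leftmost first"
keeps 1..12 and 15..26, so the selection rule changes both which hits
survive and how many there are.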

We could try to write an application to remove overlapping features ...
if someone can define how to select them. In this case, the number of
mismatches will be stored as a tag (feature qualifier) in the feature
table and could be included in the selection criteria.

Hope this helps ... and maybe sparks some ideas.

Peter Rice
EMBOSS Team


