[EMBOSS] non-overlapping matches in fuzznuc?
Peter Rice
pmr at ebi.ac.uk
Thu Oct 13 00:02:08 UTC 2011
On 12/10/2011 16:50, Aengus Stewart wrote:
> Hi Folks,
>
> I couldn't see a command line option to do what I wanted, i.e. return
> non-overlapping hits.
>
> This is best explained with some sample output.
>
> #=======================================
> #
> # Sequence: chr1_174353258_174354335 from: 1 to: 200
> # HitCount: 9
> #
> # Pattern_name Mismatch Pattern
> # pattern1 3 CC[AT](6)GG
>
> As you can see this is actually only 4 hits rather than the 9 reported.
Hmmm ... with that kind of pattern and 3 mismatches allowed, overlapping
matches are almost inevitable.
The trouble is deciding which matches you would want to keep. Your second
match, for example, has 2 hits with 1 mismatch, at 104..115 and 105..116.
It should be possible to come up with patterns where the choice of 'best
hit' changes which of the other hits are considered to overlap.
Writing a script is probably your best bet, as you can then control which
hits are picked.
We could try to write an application to remove overlapping features ...
if someone can define how to select them. In this case, the mismatch
count will be stored as a tag (feature qualifier) in the feature table
and could be included in the selection criteria.
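As a starting point for such a script, here is a minimal Python sketch of one possible selection rule. It is not an EMBOSS feature: the rule (fewest mismatches first, ties going to the leftmost start) is only an assumption, and the Hit tuple and pick_non_overlapping function are hypothetical names. It also assumes the hits have already been parsed out of the fuzznuc report into (start, end, mismatches) values. Only the 104..115 and 105..116 coordinates come from the example above; the other two hits are made up for illustration.

```python
from typing import List, NamedTuple


class Hit(NamedTuple):
    start: int       # 1-based start position, inclusive
    end: int         # 1-based end position, inclusive
    mismatches: int  # mismatch count reported for this hit


def overlaps(a: Hit, b: Hit) -> bool:
    """Return True if the two hits share at least one position."""
    return a.start <= b.end and b.start <= a.end


def pick_non_overlapping(hits: List[Hit]) -> List[Hit]:
    """Greedily keep hits that do not overlap an already kept hit.

    Assumed rule (not defined anywhere in EMBOSS): prefer fewer
    mismatches, and break ties by the leftmost start position.
    """
    ranked = sorted(hits, key=lambda h: (h.mismatches, h.start))
    kept: List[Hit] = []
    for hit in ranked:
        if not any(overlaps(hit, k) for k in kept):
            kept.append(hit)
    # Report the survivors in sequence order.
    return sorted(kept, key=lambda h: h.start)


hits = [
    Hit(104, 115, 1),  # from the example above
    Hit(105, 116, 1),  # from the example above
    Hit(103, 114, 3),  # hypothetical extra hit
    Hit(150, 161, 2),  # hypothetical extra hit
]

selected = pick_non_overlapping(hits)
for hit in selected:
    print(hit.start, hit.end, hit.mismatches)
```

With this rule the tie between 104..115 and 105..116 goes to 104..115 simply because it starts first, which is exactly the kind of arbitrary choice that needs defining. Note too that a greedy pass like this does not guarantee the largest possible set of non-overlapping hits.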
Hope this helps ... and maybe sparks some ideas
Peter Rice
EMBOSS Team