[EMBOSS] non-overlapping matches in fuzznuc?
jison at ebi.ac.uk
Thu Oct 13 07:45:58 UTC 2011
Hi chaps (Aengus!)
If I understood Aengus' message correctly, what's needed is something that simply combines overlapping hits (for
a given pattern) into one or more non-overlapping "regions of hits", and reports those regions, e.g.
Start End Strand Pattern_name Mismatch Sequence
54 65 + pattern1 5 GCCAAATAAGGG
104 115 + pattern1 5 CCTAAATAAGGG
179 188 + pattern1 2 CCTTGCTTGG
190 200 + pattern1 6 CCGATTAGAGC
In this case, Mismatch reports the sum of the mismatches of the merged hits. A column for the number of
(sub)matches in each region would also be needed. Is that right, Aengus?
Depending on the input pattern, the above might give a useful result. I think it would be easy
enough to implement.
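A minimal sketch of the merging described above, assuming the fuzznuc hits have already been parsed into simple records (the `Hit`, `Region` and `merge_hits` names are hypothetical, and parsing the report is not shown). Hits of the same pattern on the same strand are merged when they share at least one base; the Mismatch column is summed and the number of (sub)matches is counted. The Sequence column is left out here; it could be sliced from the input sequence using the region's start and end.

```python
from dataclasses import dataclass
from itertools import groupby
from operator import attrgetter
from typing import List, Optional


@dataclass
class Hit:
    """One fuzznuc hit (hypothetical container; parsing the report is not shown)."""
    start: int      # 1-based, inclusive
    end: int        # 1-based, inclusive
    strand: str     # "+" or "-"
    pattern: str    # pattern name, e.g. "pattern1"
    mismatch: int   # mismatches for this single hit


@dataclass
class Region:
    """A run of overlapping hits for one pattern on one strand."""
    start: int
    end: int
    strand: str
    pattern: str
    mismatch: int   # sum of the mismatches of the merged hits
    hits: int       # number of (sub)matches merged into this region


def merge_hits(hits: List[Hit]) -> List[Region]:
    """Combine overlapping hits of the same pattern and strand into regions."""
    regions: List[Region] = []
    ordered = sorted(hits, key=attrgetter("pattern", "strand", "start"))
    for (pattern, strand), group in groupby(ordered, key=attrgetter("pattern", "strand")):
        current: Optional[Region] = None
        for h in group:
            if current is not None and h.start <= current.end:
                # Shares at least one base with the open region: extend it.
                current.end = max(current.end, h.end)
                current.mismatch += h.mismatch
                current.hits += 1
            else:
                # Adjacent or gapped hit: start a new region.
                current = Region(h.start, h.end, strand, pattern, h.mismatch, 1)
                regions.append(current)
    return regions


if __name__ == "__main__":
    # The two overlapping hits mentioned in the thread (104..115 and 105..116).
    print(merge_hits([Hit(104, 115, "+", "pattern1", 1),
                      Hit(105, 116, "+", "pattern1", 1)]))
    # [Region(start=104, end=116, strand='+', pattern='pattern1', mismatch=2, hits=2)]
```

Treating hits that merely touch (one ends at base n, the next starts at n+1) as separate regions is a choice made for this sketch; changing the test to `h.start <= current.end + 1` would merge them as well.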
> On 12/10/2011 16:50, Aengus Stewart wrote:
>> Hi Folks,
>> I couldn't see a command-line option to do what I wanted, i.e. return
>> non-overlapping hits.
>> This is best explained with some sample output.
>> # Sequence: chr1_174353258_174354335 from: 1 to: 200
>> # HitCount: 9
>> # Pattern_name Mismatch Pattern
>> # pattern1 3 CC[AT](6)GG
>> As you can see, these are actually only 4 hits rather than the 9 reported.
> Hmmm ... with that kind of pattern and 3 mismatches, overlapping matches
> are almost certain.
> Trouble is, which matches would you want to keep? Your second match, for
> example, has 2 hits with 1 mismatch, at 104..115 and 105..116.
> It should be possible to come up with patterns where the choice of 'best
> hit' complicates which hits are considered to overlap.
> Probably writing a script is your best bet as you can then control which
> hits are picked.
> We could try to write an application to remove overlapping features ...
> if someone can define how to select them. In this case, the mismatch
> number will be stored as a tag (feature qualifier) in the feature table
> and could be included in the selection criteria.
> Hope this helps ... and maybe sparks some ideas
> Peter Rice
> EMBOSS Team
> EMBOSS mailing list
> EMBOSS at lists.open-bio.org
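Peter's alternative, keeping only some hits and using the mismatch count as the selection criterion, could be scripted along these lines. This is a minimal sketch of one possible policy (greedy: fewest mismatches first, ties broken by the leftmost start), not something EMBOSS itself provides; the `Hit` record and function names are hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Hit:
    """One fuzznuc hit (hypothetical container; parsing the report is not shown)."""
    start: int      # 1-based, inclusive
    end: int        # 1-based, inclusive
    strand: str     # "+" or "-"
    pattern: str    # pattern name
    mismatch: int   # mismatches for this hit


def overlaps(a: Hit, b: Hit) -> bool:
    """Only hits of the same pattern on the same strand compete with each other."""
    return (a.pattern == b.pattern and a.strand == b.strand
            and a.start <= b.end and b.start <= a.end)


def best_non_overlapping(hits: List[Hit]) -> List[Hit]:
    """Greedy pick: fewest mismatches first, ties broken by leftmost start."""
    kept: List[Hit] = []
    for h in sorted(hits, key=lambda x: (x.mismatch, x.start)):
        # Keep a hit only if it does not overlap any hit already kept.
        if not any(overlaps(h, k) for k in kept):
            kept.append(h)
    # Report the survivors in sequence order.
    return sorted(kept, key=lambda x: (x.pattern, x.strand, x.start))


if __name__ == "__main__":
    # With equal mismatches, the tie-break keeps 104..115 and drops 105..116.
    print(best_non_overlapping([Hit(104, 115, "+", "pattern1", 1),
                                Hit(105, 116, "+", "pattern1", 1)]))
```

As Peter points out, the choice of "best hit" decides which hits count as overlapping: a greedy pick does not guarantee keeping the largest number of hits, so a different selection rule (or an exhaustive interval-scheduling approach) may suit some patterns better.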