[EMBOSS] non-overlapping matches in fuzznuc?

Jon Ison jison at ebi.ac.uk
Thu Oct 13 07:45:58 UTC 2011


Hi chaps (Aengus !)

If I understood Aengus' message correctly, what's needed is something that simply combines
overlapping hits (for a given pattern) into one or more non-overlapping "regions of hits" and
reports those regions, e.g.

   Start     End  Strand Pattern_name Mismatch Sequence
      54      65       + pattern1            5 GCCAAATAAGGG
     104     115       + pattern1            5 CCTAAATAAGGG
     179     188       + pattern1            2 CCTTGCTTGG
     190     200       + pattern1            6 CCGATTAGAGC

Here the Mismatch column reports the sum of the mismatches of the individual hits that were merged
into each region.  A column for the number of (sub)matches would also be needed.  Is that right,
Aengus?

The above might give a useful result, depending on the input pattern.  It would, I think, be easy
enough to implement.
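
As a rough illustration of the merging idea (a sketch only, not anything that exists in EMBOSS), the
Python below assumes the fuzznuc hits have already been parsed into (start, end, strand, pattern,
mismatches) tuples.  The names Region and merge_hits, and the individual hit coordinates in the
example, are made up for illustration.  Hits are merged only when they truly overlap (1-based,
inclusive coordinates), so hits that merely sit next to each other stay separate.

```python
from dataclasses import dataclass


@dataclass
class Region:
    start: int
    end: int
    strand: str
    pattern: str
    mismatches: int  # sum of mismatches over the merged hits
    hits: int        # number of (sub)matches merged into this region


def merge_hits(hits):
    """Merge overlapping hits of the same pattern and strand into regions.

    `hits` is an iterable of (start, end, strand, pattern, mismatches)
    tuples; this layout is an assumption, not a fuzznuc output format.
    """
    regions = []
    # Sort so that hits of one pattern/strand are contiguous and ordered by start.
    for start, end, strand, pattern, mm in sorted(
        hits, key=lambda h: (h[3], h[2], h[0], h[1])
    ):
        last = regions[-1] if regions else None
        if (
            last is not None
            and last.pattern == pattern
            and last.strand == strand
            and start <= last.end  # overlap test, inclusive coordinates
        ):
            last.end = max(last.end, end)
            last.mismatches += mm
            last.hits += 1
        else:
            regions.append(Region(start, end, strand, pattern, mm, 1))
    return regions


# Invented input hits, chosen only to show the behaviour.
example_hits = [
    (54, 63, "+", "pattern1", 3),
    (56, 65, "+", "pattern1", 2),
    (179, 188, "+", "pattern1", 2),
    (190, 199, "+", "pattern1", 3),
    (191, 200, "+", "pattern1", 3),
]

for r in merge_hits(example_hits):
    print(r.start, r.end, r.strand, r.pattern, r.mismatches, r.hits)
```

With this invented input the hits at 179..188 and 190..199 are not merged, because they do not
share a position; the extra last column is the number of (sub)matches per region.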

Cheers

Jon




> On 12/10/2011 16:50, Aengus Stewart wrote:
>> Hi Folks,
>>
>> I couldn't see a command line option to do what I wanted, i.e. return
>> non-overlapping hits.
>>
>> This is best explained with some sample output.
>>
>> #=======================================
>> #
>> # Sequence: chr1_174353258_174354335 from: 1 to: 200
>> # HitCount: 9
>> #
>> # Pattern_name Mismatch Pattern
>> # pattern1 3 CC[AT](6)GG
>>
>> As you can see this is actually only 4 hits rather than the 9 reported.
>
> Hmmm ... with that kind of pattern and 3 mismatches there are pretty
> sure to be overlapping matches.
>
> Trouble is, which matches would you want to keep? Your second match, for
> example, has 2 hits with 1 mismatch at 104..115 and 105..116
>
> It should be possible to come up with patterns where the choice of 'best
> hit' complicates which hits are considered to overlap.
>
> Probably writing a script is your best bet as you can then control which
> hits are picked.
>
> We could try to write an application to remove overlapping features ...
> if someone can define how to select them. In this case, the mismatch
> number will be stored as a tag (feature qualifier) in the feature table
> and could be included in the selection criteria.
>
> Hope this helps ... and maybe sparks some ideas
>
> Peter Rice
> EMBOSS Team




