[EMBOSS] non-overlapping matches in fuzznuc?

Aengus Stewart aengus.stewart at cancer.org.uk
Thu Oct 13 09:31:56 UTC 2011


So Peter is right about what I want returned: the best match. But of course he has pointed out the problem of having 2 best matches for the same region (in this example 104-113 and 105-114).  However, it is still the case that the "real" result is 4 hits rather than 9.

I don't know whether my example is a special case, so it would be good, as Peter suggests, to hear from anyone else who has used fuzznuc in a similar way.  Though surely, if you allow any mismatches at all in your pattern search, you automatically get this scenario of multiple results being returned for the same location?
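
For anyone wanting to try the "best match per region" idea outside fuzznuc, here is a minimal post-processing sketch in Python. It is not an EMBOSS feature: the Hit type, the best_per_cluster function and the example coordinates are all hypothetical, and it assumes the report has already been parsed into (start, end, mismatches) values with 1-based inclusive coordinates. It deliberately keeps ties, since two equally good hits for one region is exactly the problem described above.

```python
from typing import NamedTuple


class Hit(NamedTuple):
    """One reported match (hypothetical container, not an EMBOSS type)."""
    start: int       # 1-based, inclusive
    end: int         # 1-based, inclusive
    mismatches: int


def best_per_cluster(hits):
    """Group overlapping hits, then keep the lowest-mismatch hit(s) per group.

    Returns one list per cluster; a list holds several hits when they tie.
    """
    clusters = []
    cluster_end = None
    for hit in sorted(hits):
        if clusters and hit.start <= cluster_end:
            # Overlaps the current cluster: extend it.
            clusters[-1].append(hit)
            cluster_end = max(cluster_end, hit.end)
        else:
            # No overlap: start a new cluster.
            clusters.append([hit])
            cluster_end = hit.end

    best = []
    for cluster in clusters:
        fewest = min(h.mismatches for h in cluster)
        best.append([h for h in cluster if h.mismatches == fewest])
    return best


# Illustrative values only, not the real fuzznuc output from this thread.
example_hits = [
    Hit(104, 113, 2),
    Hit(105, 114, 2),
    Hit(106, 115, 3),
    Hit(179, 188, 2),
]
example_best = best_per_cluster(example_hits)
```

With these made-up values the first cluster still returns two hits, so a further tie-break rule (leftmost start, for instance) would be needed to get exactly one hit per region.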


Cheers
Aengus

On 13/10/11 09:44, Peter Rice wrote:
> On 13/10/2011 08:45, Jon Ison wrote:
>> Hi chaps (Aengus !)
>>
>> If I understood Aengus' message, what's needed is something that simply combines overlapping hits (for
>> a given pattern) into one or more non-overlapping "regions of hits", and reports those regions e.g.
>>
>>      Start     End  Strand Pattern_name Mismatch Sequence
>>         54      65       + pattern1            5 GCCAAATAAGGG
>>        104     115       + pattern1            5 CCTAAATAAGGG
>>        179     188       + pattern1            2 CCTTGCTTGG
>>        190     200       + pattern1            6 CCGATTAGAGC
>>
>> Mismatch in this case is reporting the sum of mismatches from before.  A column for number of
>> (sub)matches would also be needed.  Is that right Aengus?
>
> I'm not sure that adding the mismatches is sound. I'd assume just a best
> hit from the overlapping matches.
>
>> The above might give a useful result depending on the input pattern.  It would, I think, be easy
>> enough to implement.
>
> This is a report output, so post-processing could be done by trimming
> the results before output using an associated qualifier.
>
> Still not sure how useful it would be, we need more feedback from other
> users on this one please!
>
> Peter Rice
> EMBOSS Team
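
The region-merging variant Jon describes in the quoted message can be sketched the same way. Again this is only an illustration under assumptions: merge_regions is a hypothetical helper, hits are assumed to be already parsed (start, end, mismatches) tuples with 1-based inclusive coordinates, and the input values are invented. It reports the summed mismatches and the number of sub-matches per region, the two columns Jon mentions; whether summing mismatches is meaningful is the point Peter questions.

```python
def merge_regions(hits):
    """Merge overlapping (start, end, mismatches) hits into regions.

    Each region records its span, the sum of mismatches over its
    member hits, and how many hits were merged into it.
    """
    regions = []
    for start, end, mismatches in sorted(hits):
        if regions and start <= regions[-1]["end"]:
            # Overlaps the previous region: extend and accumulate.
            region = regions[-1]
            region["end"] = max(region["end"], end)
            region["mismatch_sum"] += mismatches
            region["n_hits"] += 1
        else:
            # Hits that merely sit next to each other (e.g. 179-188 and
            # 190-200) do not overlap, so they start a new region.
            regions.append(
                {"start": start, "end": end,
                 "mismatch_sum": mismatches, "n_hits": 1}
            )
    return regions


# Illustrative values only, not the real fuzznuc output from this thread.
example_regions = merge_regions(
    [(104, 113, 2), (105, 114, 2), (106, 115, 1), (179, 188, 2)]
)
```

The overlap test is start <= previous end because the coordinates are inclusive; a half-open convention would need start < previous end instead.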


-- 
-----------------------------------------------------------------------
Aengus Stewart                                 Tel: +44 (0)20 7269 3679
Head of Bioinformatics and BioStatistics
CRUK London Research Institute
Lincoln's Inn Fields, Holborn, London, WC2A 3LY, UK
-----------------------------------------------------------------------



