[Biopython] allow ambiguities in sequence matching?
Cedar McKay
cmckay at u.washington.edu
Tue Nov 24 23:12:08 UTC 2009
Thanks for the advice, I'll check out that bug, and see what I see.
best,
Cedar
On Nov 20, 2009, at 2:03 AM, Peter wrote:
> On Thu, Nov 19, 2009 at 11:42 PM, Cedar McKay
> <cmckay at u.washington.edu> wrote:
>> Hello all,
>> Apologies if this is covered in the tutorial anywhere, if so I didn't
>> see it.
>>
>> I am trying to test whether sequence A appears anywhere in sequence B.
>> The catch is I want to allow n mismatches. Right now my code looks like:
>>
>> # record is a SeqRecord
>> # query_seq is a string
>> if query_seq in record.seq:
>>     pass  # do something
>>
>>
>> If I want query_seq to match despite n nucleotide mismatches, how should
>> I do that? It seems like something that would be pretty common for people
>> working with DNA probes. Is this even a Biopython problem? Or is it just
>> a general Python problem?
>
> We have in general tried to keep the Seq object API as much like that of
> the Python string as is reasonable, for example the find, startswith and
> endswith methods. Likewise, the "in" operator on the Seq object also works
> like a Python string: it uses plain string matching (see Bug 2853; this
> was added in Biopython 1.51).
>
> It sounds like you want some kind of fuzzy find... one solution would
> be regular expressions, another might be to use the Bio.Motif module.
> There have been similar discussions on the mailing list before, but no
> clear consensus - see for example Bug 2601.
>
> Peter
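
For reference, here is a minimal sketch of the kind of "fuzzy find" discussed above, written in plain Python rather than with any Biopython API. The function name find_with_mismatches and its parameters are hypothetical, not part of Biopython. It assumes the query and target can both be turned into plain strings (for example with str(record.seq)), and it only counts substitutions: insertions and deletions are not handled.

```python
def find_with_mismatches(query, target, max_mismatches=0):
    """Return start offsets where query matches target with at most
    max_mismatches substitutions (no insertions or deletions).

    Hypothetical helper, not a Biopython function.
    """
    query = str(query).upper()
    target = str(target).upper()
    hits = []
    if not query or len(query) > len(target):
        return hits
    # Slide a window of len(query) along the target.
    for start in range(len(target) - len(query) + 1):
        window = target[start:start + len(query)]
        mismatches = 0
        for a, b in zip(query, window):
            if a != b:
                mismatches += 1
                if mismatches > max_mismatches:
                    break  # too many mismatches, give up on this window
        else:
            # Loop finished without break: window is within the limit.
            hits.append(start)
    return hits
```

With something like this, the original test could be written as "if find_with_mismatches(query_seq, record.seq, n):". The scan is a simple O(len(query) * len(target)) loop, which is probably fine for short probes but may be slow for very long targets.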