[Biopython] allow ambiguities in sequence matching?

Cedar McKay cmckay at u.washington.edu
Tue Nov 24 23:12:08 UTC 2009


Thanks for the advice. I'll check out that bug and see what I find.
Best,
Cedar

On Nov 20, 2009, at 2:03 AM, Peter wrote:

> On Thu, Nov 19, 2009 at 11:42 PM, Cedar McKay
> <cmckay at u.washington.edu> wrote:
>> Hello all,
>> Apologies if this is covered in the tutorial anywhere; if so, I didn't
>> see it.
>>
>> I am trying to test whether sequence A appears anywhere in sequence B.
>> The catch is that I want to allow n mismatches. Right now my code looks
>> like:
>>
>> # record is a SeqRecord
>> # query_seq is a string
>> if query_seq in record.seq:
>>     pass  # do something
>>
>>
>> If I want query_seq to match despite n nucleotide mismatches, how should
>> I do that? It seems like something that would be pretty common for people
>> working with DNA probes. Is this even a Biopython problem? Or is it just a
>> general Python problem?
>
> We have in general tried to keep the Seq object API as much like that of
> the Python string as is reasonable, for example the find, startswith and
> endswith methods. Likewise, the "in" operator on the Seq object also works
> like a Python string: it uses plain string matching (see Bug 2853; this
> was added in Biopython 1.51).
>
> It sounds like you want some kind of fuzzy find... one solution would
> be regular expressions, another might be to use the Bio.Motif module.
> There have been similar discussions on the mailing list before, but no
> clear consensus - see for example Bug 2601.
>
> Peter
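
For readers of the archive, the two ideas raised in this thread (allowing n
mismatches, and the regular-expression route for ambiguity codes) can be
sketched in plain Python. This is only an illustration under stated
assumptions, not a Biopython API: the names find_with_mismatches,
ambiguous_to_regex and IUPAC_CLASSES are hypothetical, the mismatch search
counts substitutions only (no insertions or deletions), and ambiguity codes
are assumed to appear in the query, not in the target.

```python
import re

# Standard IUPAC nucleotide ambiguity codes mapped to the bases they cover.
# The dictionary name is hypothetical, chosen for this sketch.
IUPAC_CLASSES = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def find_with_mismatches(query, target, max_mismatches):
    """Return every start index in target where query fits with at most
    max_mismatches substitutions (a simple sliding-window comparison)."""
    query = str(query).upper()
    target = str(target).upper()
    hits = []
    if not query:
        return hits
    for start in range(len(target) - len(query) + 1):
        window = target[start:start + len(query)]
        mismatches = 0
        for q, t in zip(query, window):
            if q != t:
                mismatches += 1
                if mismatches > max_mismatches:
                    break  # too many differences, give up on this window
        else:
            hits.append(start)  # loop finished without exceeding the limit
    return hits


def ambiguous_to_regex(query):
    """Turn a query containing IUPAC codes into a regex pattern string.
    Raises KeyError for characters outside IUPAC_CLASSES."""
    parts = []
    for base in str(query).upper():
        bases = IUPAC_CLASSES[base]
        parts.append(bases if len(bases) == 1 else "[" + bases + "]")
    return "".join(parts)


# One substitution allowed: ACGT fits ACGA at index 2.
print(find_with_mismatches("ACGT", "TTACGATT", 1))  # [2]

# N in the query matches any base in the target.
print(re.search(ambiguous_to_regex("ACNT"), "TTACGTAA").start())  # 2
```

With a SeqRecord, the target would be passed as str(record.seq). The
sliding-window search takes time proportional to len(query) * len(target),
which is usually fine for short probes but may be slow on whole genomes.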



