[Biopython] Find Sub-sequence with Variable positions

Ivan Gregoretti ivangreg at gmail.com
Mon Jul 8 15:37:09 UTC 2013


Here is one way of doing it with a local alignment from Biopython's pairwise2 module.

from Bio import pairwise2

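# set the scoring parameters: match reward, mismatch penalty,
# gap-open and gap-extend penalties (the large gap penalties
# strongly discourage gaps, so mismatches are what get tolerated)
reward    =   5
penalty   =  -4
gapopen   = -30
gapextend = -10

# specify the sequence (query) and the pattern (subject)
query = 'GTCGCGACGTTCGTACGTCGCGA'
subject = 'ACGTACGTACGT'

# run the local aligner (localms: "m" = explicit match/mismatch scores,
# "s" = the same gap open/extend scores for both sequences)
# and keep the top-scoring alignment
qseq, sseq, score, start, end = pairwise2.align.localms(query, subject,
                                                        reward, penalty,
                                                        gapopen, gapextend)[0]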

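# see the aligned query sequence
>>> qseq
'GTCGCGACGTTCGTACGTCGCGA'

# see the aligned subject sequence
>>> sseq
'------ACGTACGTACGT-----'

# see the score and the start/end positions
# (0-based, end-exclusive, like a Python slice)
>>> score
51.0
>>> start
6
>>> end
18
>>> qseq[start:end]
'ACGTTCGTACGT'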
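Where the score of 51.0 comes from: the matched region ACGTTCGTACGT differs from the pattern only at position 5 (T instead of A), so it scores 11 x 5 - 4 = 51. The sketch below is not what pairwise2 does internally (pairwise2 runs a gapped dynamic-programming local alignment); it is a simpler, hypothetical ungapped window scan (the helper name best_ungapped_window is made up) using the same reward/penalty values. It gives the same answer here only because the heavy gap penalties keep pairwise2 from opening any gaps.

```python
# Hypothetical helper: ungapped sliding-window scan using the same
# reward/penalty scheme as the pairwise2 example (gaps are not allowed).

def best_ungapped_window(sequence, pattern, reward=5, penalty=-4):
    """Return (score, start, end) of the best-scoring window, or None
    if the pattern is longer than the sequence.

    Every window of len(pattern) is scored position by position:
    +reward for a match, +penalty for a mismatch. start/end are
    0-based and end-exclusive, like Python slices.
    """
    best = None
    width = len(pattern)
    for start in range(len(sequence) - width + 1):
        window = sequence[start:start + width]
        score = sum(reward if a == b else penalty
                    for a, b in zip(window, pattern))
        # keep the first window with the highest score
        if best is None or score > best[0]:
            best = (score, start, start + width)
    return best


query = 'GTCGCGACGTTCGTACGTCGCGA'
subject = 'ACGTACGTACGT'
score, start, end = best_ungapped_window(query, subject)
print(score, start, end)    # 51 6 18
print(query[start:end])     # ACGTTCGTACGT
```

Because it never inserts gaps, this scan only tolerates substitutions; pairwise2 is the better tool if the real data may contain insertions or deletions.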

You can also BLAST 2 sequences from within Python if you need speed.
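A minimal sketch of calling BLAST on two sequences from Python, assuming the NCBI BLAST+ blastn executable is installed and on the PATH. The helper names blast_two_sequences and parse_outfmt6 are hypothetical; the blastn options used (-task blastn-short, -query, -subject, -outfmt 6) are standard BLAST+ flags, and -subject lets blastn search a FASTA file directly without building a database.

```python
import os
import subprocess
import tempfile


def parse_outfmt6(text):
    """Parse BLAST+ tabular output (-outfmt 6, default 12 columns) into dicts.

    Values are left as strings; convert the ones you need.
    """
    fields = ["qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
              "qstart", "qend", "sstart", "send", "evalue", "bitscore"]
    hits = []
    for line in text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue  # skip blank and comment lines
        hits.append(dict(zip(fields, line.split("\t"))))
    return hits


def blast_two_sequences(query, subject, blastn="blastn"):
    """Run blastn on two in-memory sequences (requires BLAST+)."""
    with tempfile.TemporaryDirectory() as tmp:
        paths = {}
        for name, seq in (("query", query), ("subject", subject)):
            paths[name] = os.path.join(tmp, name + ".fa")
            with open(paths[name], "w") as handle:
                handle.write(f">{name}\n{seq}\n")
        # blastn-short is the BLAST+ task intended for very short sequences
        cmd = [blastn, "-task", "blastn-short",
               "-query", paths["query"], "-subject", paths["subject"],
               "-outfmt", "6"]
        result = subprocess.run(cmd, capture_output=True, text=True, check=True)
    return parse_outfmt6(result.stdout)
```

Usage would be blast_two_sequences(query, subject) with the sequences from the pairwise2 example. No output is shown because the reported hits depend on the BLAST+ version and its default thresholds; note that BLAST is a heuristic and may miss very short or weak matches that an exhaustive alignment would find.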

Hope this helps,

Ivan





Ivan Gregoretti, PhD






On Mon, Jul 8, 2013 at 10:06 AM, Peter Cock <p.j.a.cock at googlemail.com> wrote:
> On Mon, Jul 8, 2013 at 2:19 PM, Jurgens de Bruin <debruinjj at gmail.com> wrote:
>> Hi,
>>
>> I hope someone can help me with the following:
>>
>> I want to find a sub-sequence within a sequence, but the catch is that the
>> sub-sequence contains positions that are variable and do not have to
>> match 100%.
>> For example:
>> if the following is the sub-sequence, all the positions have to match but
>> position 5 (A) can be any of the 4 bases (ACGT) within the query sequence.
>> ACGTACGTACGT
>>
>> Thanks!!!
>
> You could use a regular expression to do that, either in Python or at the
> command line with something like EMBOSS dreg or fuzznuc:
>
> http://emboss.open-bio.org/rel/rel6/apps/dreg.html
> http://emboss.open-bio.org/rel/rel6/apps/fuzznuc.html
>
> Peter
> _______________________________________________
> Biopython mailing list  -  Biopython at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/biopython
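A minimal sketch of the regular-expression approach in Python, using the pattern and variable position from the question. The variable position is written as the character class [ACGT]; the variable_positions list and the lookahead used to report overlapping matches are illustrative choices, not something taken from the thread. Unlike the alignment approach, this only tolerates variation at the positions you list; everywhere else the match must be exact.

```python
import re

# The pattern from the question, with position 5 (1-based) allowed to
# be any base: that A is replaced by the character class [ACGT].
pattern = "ACGTACGTACGT"
variable_positions = [5]  # 1-based, as in the question

regex = "".join("[ACGT]" if i + 1 in variable_positions else base
                for i, base in enumerate(pattern))
print(regex)  # ACGT[ACGT]CGTACGT

query = "GTCGCGACGTTCGTACGTCGCGA"

# Wrapping the regex in a lookahead lets finditer report overlapping hits;
# group(1) holds the matched sub-sequence, m.start() its 0-based position.
hits = [(m.start(), m.group(1))
        for m in re.finditer("(?=(%s))" % regex, query)]
print(hits)  # [(6, 'ACGTTCGTACGT')]
```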