[Biopython] Find Sub-sequence with Variable positions
Ivan Gregoretti
ivangreg at gmail.com
Mon Jul 8 15:37:09 UTC 2013
This is a way of doing it with Biopython's pairwise2.
from Bio import pairwise2
# set the parameters
reward = 5
penalty = -4
gapopen = -30
gapextend = -10
# specify the sequence (query) and the pattern (subject)
query = 'GTCGCGACGTTCGTACGTCGCGA'
subject = 'ACGTACGTACGT'
# run the pairwise aligner
qseq, sseq, score, start, end = pairwise2.align.localms(
    query, subject, reward, penalty, gapopen, gapextend)[0]
# see the aligned query sequence
qseq
'GTCGCGACGTTCGTACGTCGCGA'
# see the aligned subject sequence
sseq
'------ACGTACGTACGT-----'
# see score, start and end positions.
score
51.0
start
6
end
18
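The score alone does not say which pattern position differs. As a rough sketch (plain Python, no Biopython needed), the reported start and end can be used to slice the query and list the mismatching positions. It assumes the alignment is ungapped and covers the whole pattern, as it does here, and the `allowed` set is only an illustration of the "position 5 may vary" rule from the question:

```python
# Values copied from the session above.
query = 'GTCGCGACGTTCGTACGTCGCGA'
subject = 'ACGTACGTACGT'
start, end = 6, 18  # 0-based, end-exclusive, as reported by pairwise2 above

# The stretch of the query covered by the alignment.
window = query[start:end]

# 1-based positions within the pattern where the query differs from it.
mismatches = [i + 1 for i, (q, s) in enumerate(zip(window, subject)) if q != s]

# Hypothetical acceptance rule for this thread: only position 5 may vary.
allowed = {5}
hit_ok = len(window) == len(subject) and set(mismatches) <= allowed

print(window, mismatches, hit_ok)
```

If the alignment contained gaps or covered only part of the pattern, the length check would fail and the hit would be rejected rather than misreported.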
You can also BLAST 2 sequences from within Python if you need speed.
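For the exact case in the question (every position fixed except position 5), the regular expression route Peter mentions below may be simpler. Here is a minimal sketch with the standard `re` module; the `variable` set and the way the pattern string is built are my own illustration, not something from Biopython:

```python
import re

query = 'GTCGCGACGTTCGTACGTCGCGA'
subject = 'ACGTACGTACGT'
variable = {5}  # 1-based pattern positions that may be any base (assumption from the question)

# Replace each variable position with a character class: 'ACGT[ACGT]CGTACGT'
pattern = ''.join(
    '[ACGT]' if i in variable else base
    for i, base in enumerate(subject, start=1)
)

# Wrap the pattern in a lookahead so that overlapping hits are reported too.
hits = [(m.start(), m.group(1))
        for m in re.finditer('(?=(' + pattern + '))', query)]

print(pattern)
print(hits)
```

This only finds hits that differ at the listed positions, whereas the alignment approach tolerates a mismatch anywhere, so which one fits depends on how strict the match needs to be.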
Hope this helps,
Ivan
Ivan Gregoretti, PhD
On Mon, Jul 8, 2013 at 10:06 AM, Peter Cock <p.j.a.cock at googlemail.com> wrote:
> On Mon, Jul 8, 2013 at 2:19 PM, Jurgens de Bruin <debruinjj at gmail.com> wrote:
>> Hi,
>>
>> I hope someone can help me with the following:
>>
>> I want to find a sub-sequence within a sequence, but the catch is that the
>> sub-sequence contains positions that are variable and does not have to
>> match 100%.
>> For example:
>> If the following is the sub-sequence, all the positions have to match, but
>> position 5 (A) can be any of the 4 bases (ACGT) within the query-seq.
>> ACGTACGTACGT
>>
>> Thanks!!!
>
> You could use a regular expression to do that - in Python, or at the
> command line with something like EMBOSS dreg or fuzznuc:
>
> http://emboss.open-bio.org/rel/rel6/apps/dreg.html
> http://emboss.open-bio.org/rel/rel6/apps/fuzznuc.html
>
> Peter
> _______________________________________________
> Biopython mailing list - Biopython at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/biopython