[BioPython] comparing short sequences against genome
Bzy Bee
nomy2020 at yahoo.com
Sun Sep 26 22:20:57 EDT 2004
Hi BioPython gurus
I have been working on this problem of mine, but am
stuck and was wondering if someone could help me out
here.
The following code works on a file with sequences in
fasta format (as I can download the exonic sequences
in a fasta format). It takes first ten bases of a
sequence (as a forward oligo sequence) and then
compares it with the rest of sequence, while the 10
mers at a distance of 290 bases are taken as a reverse
oligo. It then calculates the gc content and prints
the details.
What I have been unable to do so far is:
1) the oligos (both forward and reverse) to iterate
through the entire file, i.e. each and every sequence
in the file (and keeping track of sequence names and
positions when a match is found). At the moment it
just takes the first 10 mers of sequence 1 (and 10
mers at position 290) and compares these with sequence
1, but not with sequence 2 and 3 and so on
2) secondly, I want to add one to the starting
position of 10 mers, i.e. in the second round of
iteration, instead of taking:
oligoF = result[0:10]
it should take result[1:11], i.e. increasing the
position of 10 mer by 1 (and same for reverse oligo)
and so on until teh sequence finishes. I'm not sure
how to increment the result by 1.
I am kinda stuck at both these steps and any help
would be very much appreciated.
Here's my code:
=============================================
from string import *
import string
from Bio.SeqIO import FASTA
def findOlig(dna, prim):
''' This method finds all the occurences of a
short 10 mer in a sequence'''
loc = []
count = 0
site = dna.find(prim)
while site != -1:
loc.append(site)
site = dna.find(prim, site + 1)
count += 1
print 'primer sequence: ', prim, 'no of times
present:', count, 'oligo site:', loc
def revComp(dna):
''' This method is used to get the reverse
complement of a sequence'''
comp = dna.translate(maketrans("AGCTagct",
"TCGAtcga"))
lcomp = list(comp)
lcomp.reverse()
return join(lcomp, '')
def gcContent(dna):
'''This method finds gc percent of a given
sequence '''
length = len(dna)
count_gc = count(dna, 'g') + count(dna, 'G') +
count(dna, 'c') + count(dna, 'C')
gc = float(count_gc)/length *100
return gc
# read the fasta file and find the oligos (and print
their details) in the sequences
handle = open('some_file.txt')
it = FASTA.FastaReader(handle)
#seq = it.next()
while 1:
seq = it.next()
if seq is None:
break
result = string.join(myseq.seq, '')
oligoF = result[0:10]
oligoR = result[290:300]
findOlig(result, oligoF)
findOlig(result, oligoR)
print oligoF, "GC content:", gcContent(oligoF)
print revComp(oligoR), "GC content:",
gcContent(oligoR)
print myseq.name
print result
handle.close()
===========================================
where some_file.txt has:
>Seq1 There is some description here
ATGAATCCAAGCCAAATACTTGAAAATTTAAAAAAAGAATTAAGTGAAAACGAATACGAAAACTATTTATCAAATTTAAAATTCAACGAAAAACAAAGCAAAGCAGATCTTTTAGTTTTTAATGCTCCAAATGAACTCATGGCTAAATTCATACAAACAAAATACGGCAAAAAAATCGCGCATTTTTATGAAGTGCAAAGCGGAAATAAAGCCATCATAAATATACAAGCACAAAGTGCTAAACAAAGCAACAAAAGCACAAAAATCGACATAGCTCATATAAAAGCACAAAGCACGATTTTAAATCCTTCTTTTACTTTTGAAAGTTTTGTTGTAGGGGATTCTAACAAATACGCTTATGGAGCATGTAAAGCCATAGCACATAAAGACAAACTTGGAAAACTTTATAATCCAATCTTTGTTTATGGACCTACAGGACTTGGAAAAACACATTTACTTCAAGCAGTTGGAAATGCAAGCTTAGAAATGGGAAAAAAAGTTATTTACGCTACCAGTGAAAATTTCATCAACGATTTTACTTCAAATTTAAAAAATGGTTCTTTAGATAAATTTCATGAAAAGTATAGAAACTGCGATGTTTTACTTATAGATGATGTACAGTTTTTAGGAAAAACCGATAAAATTCAAGAAGAATTTTTCTTTATATTTAATGAAATCAAAAATAACGATGGACAAATCATCATGACTTCAGACAATCCACCCAACATGCTAAAAGGTATAACCGAACGCTTAAAAAGTCGTTTTGCACATGGGATCATAGCTGATATAACTCCACCTCAACTAGATACAAAAATAGCCATCATAAGAAAAAAATGTGAATTTAACGATATCAATCTTTCTAATGATATTATAAACTATATCGCTACTTCTTTAGGGGATAATATAAGAGAAATCGAAGGTATCATCATAAGTTTAAATGCTTATGCAACCATACTAGGACAAGAAATCACACTCGAACTTGCCAAAAGTGTGATGAA
AGATCATATCAAAGAAAAGAAAGAAAATATCACTATAGATGACATTTTATCTTTGGTATGTAAAGAATTTAACATCAAACCAAGCGATGTGAAATCCAATAAAAAAACTCAAAATATAGTCACAGCAAGACGCATTGTGATTTACCTAGCTAGGGCACTTACGGCTTTGACTATGCCACAACTTGCGAATTATTTTGAAATGAAAGATCATACAGCTATTTCACATAATGTTAAAAAAATCACAGAAATGATAGAAAATGATGCTTCTTTAAAAGCAAAAATCGAAGAACTTAAAAACAAAATTCTTGTTAAAAGTCAAAGTTAA
>seq2 There is some description here
ATGAAGTTAAGTATCAATAAAAATACTTTAGAATCTGCAGTGATTTTATGTAATGCTTATGTAGAAAAAAAAGACTCAAGCACCATTACTTCTCATCTTTTTTTTCATGCTGATGAAGATAAACTTCTTATTAAAGCTAGTGATTATGAAATAGGTATCAACTATAAAATAAAAAAAATCCGCGTAGAATCAAGTGGTTTTGCTACTGCAAATGCAAAAAGTATTGCAGATGTTATTAAAAGCTTAAACAATGAAGAAGTTGTTTTAGAAACCATTGATAATTTTTTATTTGTAAGACAAAAAAGTACAAAATACAAACTTCCTATGTTTAATCATGAAGATTTTCCAAATTTTCCAAATACAGAAGGAAAAAACCAATTTGACATTGATTCAAGTGATTTAAGCCGTTCTCTTAAAAAGATATTACCAAGTATTGATACAAATAACCCAAAATACTCCTTAAATGGTGCATTTTTAGATATAAAAACAGATAAAATTAACTTCGTAGGAACTGATACAAAACGCCTTGCAATCTATACTTTAGAAAAAGCAAATAATCAAGAATTTAGTTTTAGTATCCCTAAAAAAGCTATTATGGAAATGCAAAAACTTTTCTATGAAAAAATAGAAATTTTTTATGATCAAAATATGCTTATTGCCAAAAATGAAAATTTTGAATTCTTTACAAAACTTATCAATGATAAATTTCCAGATTATGAAAAAGTTATACCAAAAACTTTCAAACAAGAACTCAGTTTTTCAACTGAAGATTTTATAGATAGTCTTAAAAAAATCAGCGTTGTAACTGAAAAAATGAGACTTCATTTTAACAAAGATAAAATCATCTTTGAAGGTATAAGTTTAGACAATATGGAAGCAAAAACAGAACTTGAAATTCAAACAGGAGTAAGTGAAGAATTTAATCTTACTATAAAAATCAAACATTTACTTGATTTCTTAACTTCTATAGAAGAAGAAAAATTCACTTTAAGTGTAAA
TGAACCTAATTCAGCATTTATAGTCAAATCCCAAGGACTATCAATGATTATCATGCCTATGATTTTGTAA
Thanks a lot for your help
regards
J Ali
--- Bzy Bee <nomy2020 at yahoo.com> wrote:
> Subject: Re: [BioPython] comparing short sequences
> against genome
> Date: Mon, 20 Sep 2004
> Hi everyone
>
> I want to design 15-20 primers for a differential
> display experiment on a bacterial genome. The idea
> is take say 10-15 mer of sequence (from the genome)
> and compare it against the rest to see how many
> times it occurs in the genome, followed by next 10 >
mer and so on.
>
> Is there anything in biopython that could help me
> in doing this?
>
> thanks
>
> JA
>
> Do You Yahoo!?
__________________________________
Do you Yahoo!?
Yahoo! Mail - Helps protect you from nasty viruses.
http://promotions.yahoo.com/new_mail
More information about the BioPython
mailing list