[BioPython] comparing short sequences against genome

Sun Sep 26 22:20:57 EDT 2004

Hi BioPython gurus

I have been working on this problem of mine, but am
stuck and was wondering if someone could help me out
here.

The following code works on a file with sequences in
fasta format (as I can download the exonic sequences
in a fasta format). It takes first ten bases of a
sequence (as a forward oligo sequence) and then
compares it with the rest of sequence, while the 10
mers at a distance of 290 bases are taken as a reverse
oligo. It then calculates the gc content and prints
the details. 

What I have been unable to do so far is:

1) the oligos (both forward and reverse) to iterate
through the entire file, i.e. each and every sequence
in the file (and keeping track of sequence names and
positions when a match is found). At the moment it
just takes the first 10 mers of sequence 1 (and 10
mers at position 290) and compares these with sequence
1, but not with sequence 2 and 3 and so on

2) secondly, I want to add one to the starting
position of 10 mers, i.e. in the second round of
iteration, instead of taking:
     oligoF = result[0:10] 
it should take result[1:11], i.e. increasing the
position of 10 mer by 1 (and same for reverse oligo)
and so on until teh sequence finishes. I'm not sure
how to increment the result by 1.

I am kinda stuck at both these steps and any help
would be very much appreciated.

Here's my code:

=============================================
from string import *
import string
from Bio.SeqIO import FASTA

def findOlig(dna, prim):
    ''' This method finds all the occurences of a
short 10 mer in a sequence'''
    loc = []
    count = 0

    site = dna.find(prim)
    while site != -1:
        loc.append(site)
        site = dna.find(prim, site + 1)
        count += 1
    print 'primer sequence: ', prim, 'no of times
present:', count, 'oligo site:', loc

def revComp(dna):
    ''' This method is used to get the reverse
complement of a sequence'''
    comp = dna.translate(maketrans("AGCTagct",
"TCGAtcga"))
    lcomp = list(comp)
    lcomp.reverse()

    return join(lcomp, '')

def gcContent(dna):
    '''This method finds gc percent of a given
sequence '''
    length = len(dna)
    count_gc = count(dna, 'g') + count(dna, 'G') +
count(dna, 'c') + count(dna, 'C')
    gc = float(count_gc)/length *100
    return gc

# read the fasta file and find the oligos (and print
their details) in the sequences
handle = open('some_file.txt')
it = FASTA.FastaReader(handle)
#seq = it.next()

while 1:
    seq = it.next()
    if seq is None:
        break
    result = string.join(myseq.seq, '')

    oligoF = result[0:10]
    oligoR = result[290:300]
    findOlig(result, oligoF)
    findOlig(result, oligoR)

    print oligoF, "GC content:", gcContent(oligoF)
    print revComp(oligoR), "GC content:",
gcContent(oligoR)
    print myseq.name
    print result

handle.close()
===========================================

where some_file.txt has:

>Seq1 There is some description here
ATGAATCCAAGCCAAATACTTGAAAATTTAAAAAAAGAATTAAGTGAAAACGAATACGAAAACTATTTATCAAATTTAAAATTCAACGAAAAACAAAGCAAAGCAGATCTTTTAGTTTTTAATGCTCCAAATGAACTCATGGCTAAATTCATACAAACAAAATACGGCAAAAAAATCGCGCATTTTTATGAAGTGCAAAGCGGAAATAAAGCCATCATAAATATACAAGCACAAAGTGCTAAACAAAGCAACAAAAGCACAAAAATCGACATAGCTCATATAAAAGCACAAAGCACGATTTTAAATCCTTCTTTTACTTTTGAAAGTTTTGTTGTAGGGGATTCTAACAAATACGCTTATGGAGCATGTAAAGCCATAGCACATAAAGACAAACTTGGAAAACTTTATAATCCAATCTTTGTTTATGGACCTACAGGACTTGGAAAAACACATTTACTTCAAGCAGTTGGAAATGCAAGCTTAGAAATGGGAAAAAAAGTTATTTACGCTACCAGTGAAAATTTCATCAACGATTTTACTTCAAATTTAAAAAATGGTTCTTTAGATAAATTTCATGAAAAGTATAGAAACTGCGATGTTTTACTTATAGATGATGTACAGTTTTTAGGAAAAACCGATAAAATTCAAGAAGAATTTTTCTTTATATTTAATGAAATCAAAAATAACGATGGACAAATCATCATGACTTCAGACAATCCACCCAACATGCTAAAAGGTATAACCGAACGCTTAAAAAGTCGTTTTGCACATGGGATCATAGCTGATATAACTCCACCTCAACTAGATACAAAAATAGCCATCATAAGAAAAAAATGTGAATTTAACGATATCAATCTTTCTAATGATATTATAAACTATATCGCTACTTCTTTAGGGGATAATATAAGAGAAATCGAAGGTATCATCATAAGTTTAAATGCTTATGCAACCATACTAGGACAAGAAATCACACTCGAACTTGCCAAAAGTGTGATGAA
AGATCATATCAAAGAAAAGAAAGAAAATATCACTATAGATGACATTTTATCTTTGGTATGTAAAGAATTTAACATCAAACCAAGCGATGTGAAATCCAATAAAAAAACTCAAAATATAGTCACAGCAAGACGCATTGTGATTTACCTAGCTAGGGCACTTACGGCTTTGACTATGCCACAACTTGCGAATTATTTTGAAATGAAAGATCATACAGCTATTTCACATAATGTTAAAAAAATCACAGAAATGATAGAAAATGATGCTTCTTTAAAAGCAAAAATCGAAGAACTTAAAAACAAAATTCTTGTTAAAAGTCAAAGTTAA
>seq2 There is some description here
ATGAAGTTAAGTATCAATAAAAATACTTTAGAATCTGCAGTGATTTTATGTAATGCTTATGTAGAAAAAAAAGACTCAAGCACCATTACTTCTCATCTTTTTTTTCATGCTGATGAAGATAAACTTCTTATTAAAGCTAGTGATTATGAAATAGGTATCAACTATAAAATAAAAAAAATCCGCGTAGAATCAAGTGGTTTTGCTACTGCAAATGCAAAAAGTATTGCAGATGTTATTAAAAGCTTAAACAATGAAGAAGTTGTTTTAGAAACCATTGATAATTTTTTATTTGTAAGACAAAAAAGTACAAAATACAAACTTCCTATGTTTAATCATGAAGATTTTCCAAATTTTCCAAATACAGAAGGAAAAAACCAATTTGACATTGATTCAAGTGATTTAAGCCGTTCTCTTAAAAAGATATTACCAAGTATTGATACAAATAACCCAAAATACTCCTTAAATGGTGCATTTTTAGATATAAAAACAGATAAAATTAACTTCGTAGGAACTGATACAAAACGCCTTGCAATCTATACTTTAGAAAAAGCAAATAATCAAGAATTTAGTTTTAGTATCCCTAAAAAAGCTATTATGGAAATGCAAAAACTTTTCTATGAAAAAATAGAAATTTTTTATGATCAAAATATGCTTATTGCCAAAAATGAAAATTTTGAATTCTTTACAAAACTTATCAATGATAAATTTCCAGATTATGAAAAAGTTATACCAAAAACTTTCAAACAAGAACTCAGTTTTTCAACTGAAGATTTTATAGATAGTCTTAAAAAAATCAGCGTTGTAACTGAAAAAATGAGACTTCATTTTAACAAAGATAAAATCATCTTTGAAGGTATAAGTTTAGACAATATGGAAGCAAAAACAGAACTTGAAATTCAAACAGGAGTAAGTGAAGAATTTAATCTTACTATAAAAATCAAACATTTACTTGATTTCTTAACTTCTATAGAAGAAGAAAAATTCACTTTAAGTGTAAA
TGAACCTAATTCAGCATTTATAGTCAAATCCCAAGGACTATCAATGATTATCATGCCTATGATTTTGTAA

Thanks a lot for your help

regards

J Ali

--- Bzy Bee <nomy2020 at yahoo.com> wrote:

> Subject: Re: [BioPython] comparing short sequences
> against genome
> Date: Mon, 20 Sep 2004
> Hi everyone
>
> I want to design 15-20 primers for a differential
> display experiment on a bacterial genome. The idea
> is take say 10-15 mer of sequence (from the genome)
> and compare it against the rest to see how many    
> times it occurs in the genome, followed by next 10 >
mer and so on.
>
> Is there anything in biopython that could help me
> in doing this?
>
> thanks
>
> JA
>
> Do You Yahoo!?

__________________________________
Do you Yahoo!?
Yahoo! Mail - Helps protect you from nasty viruses.
http://promotions.yahoo.com/new_mail