[Biopython] align sequence to genomic DNA

Horea Chrristian h.chr at mail.ru
Fri Feb 27 17:42:21 UTC 2015


Hi again, I am trying to align a sequence to the mouse genome, I am
currently doing it thus:

```
#!/usr/bin/env python
from __future__ import division
__author__ = 'Horea Christian'

from Bio import Entrez
Entrez.email = "h.chr at mail.ru"
from Bio import pairwise2
from Bio.pairwise2 import format_alignment

from Bio.Blast import NCBIWWW
from Bio import SeqIO
from Bio.Seq import Seq
from Bio.Alphabet import generic_dna
from Bio.Blast import NCBIXML

my_seq =
Seq("AATATGGCGAGGAACACTGAAAAATTGGAAAATTTANATGTCCACTGTANGACATGGAATATGGCAAGAAAACTGAAAATCATGGAAAATGANAAACATCCACGTGACGACTTGAAAAATGACGAAATCACTGAAAAACGTGAAAAATGANAAATGCACACTCTCGGACCTGGAATATGGCGAGAAAACTGAAAATCACGGAAAATGAGAAATACACACTTTANGACGTGAAATATGGANAGGAAAACTGAAAAAGTGGAAAATTTANAAATGTCCACTGTAGGACTGTTGGGAGCCGCGCGAGATATGGCA", generic_dna)

result_handle = NCBIWWW.qblast("blastn", "nt", my_seq)
blast_records = NCBIXML.read(result_handle)

E_VALUE_THRESH = 1e-100

for alignment in blast_records.alignments:
	for hsp in alignment.hsps:
		if hsp.expect < E_VALUE_THRESH:
			print('****Alignment****')
			print('sequence:', alignment.title)
			print('length:', alignment.length)
			print('e value:', hsp.expect)
			print(hsp.query[0:75] + '...')
			print(hsp.match[0:75] + '...')
			print(hsp.sbjct[0:75] + '...')
```

It seems to work quite well (you can run it yourselves), except of
course that I get huge numbers of high-confidence hits. No matter, my
issue right now is with using bipython. So how can I parse the position
of the matching sequence on the genome from my NCBIWWW.qblast results? I
have tried to look at the raw XML of resut_handle, but I was not able to
find any such tag? :-/ Can you help me out? Also, other suggestions
regarding how I currently use bipython are welcome!

Cheers, 
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mailman.open-bio.org/pipermail/biopython/attachments/20150227/e5a41c56/attachment.html>


More information about the Biopython mailing list