[BioPython] Parsing blast.out
Ravinder Singh
Ravinder.Singh@colorado.edu
Tue, 14 May 2002 13:05:21 -0600
Hi,
I'm trying to parse a BLAST output file and have tried two ways, i.e.
saving the output to a file and opening a file handle on it, or wrapping
the output in cStringIO.StringIO.
I get the following error. Any help would be appreciated. Many thanks,
Ravinder
*******
------------------------------------------------------------
SyntaxError: Expected blank line, but got:
1,221,820 sequences; 5,507,506,871 total letters
--------------------------------------------------------------
I know that the BLAST search itself works, because it writes its output to a
file. It gets stuck at the parsing step: the problem occurs when I generate
the b_record, using either handle. If I comment out the b_record1 line it
prints neither C nor D; however, if I comment out b_record2 it prints C but not D:
b_record1 = blast_parser.parse(b_results)
print 'C'
b_record2 = blast_parser.parse(string_result_handle)
print 'D'
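One way to see what the parser objected to is to look at the saved output around the line quoted in the error, and check whether the blank line it expected is missing or contains stray characters. A minimal Python 3 sketch; `context_around` is a hypothetical helper that only inspects the text and does not reproduce Biopython's parser:

```python
def context_around(text, marker, before=2, after=2):
    """Return up to `before` lines preceding and `after` lines following the
    first line that contains `marker`, plus that line itself.

    Hypothetical helper: it only inspects the saved text and does not use
    or imitate Biopython's parser.
    """
    lines = text.splitlines()
    for i, line in enumerate(lines):
        if marker in line:
            start = max(0, i - before)
            return lines[start:i + after + 1]
    return []  # marker not found anywhere in the text


# Usage, assuming the report was saved to my_blast.out:
# with open('my_blast.out') as handle:
#     for line in context_around(handle.read(), 'total letters'):
#         print(repr(line))  # repr() makes empty and whitespace-only lines easy to tell apart
```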
****************
In case it is needed, here is my full code:
----------------------------------------------------------------
#! /usr/local/bin/python
import cStringIO
from Bio import Fasta
from Bio.Blast import NCBIWWW

# Read the first record from the FASTA query file.
file_for_blast = open('m_cold.fasta', 'r')
f_iterator = Fasta.Iterator(file_for_blast)
f_record = f_iterator.next()
file_for_blast.close()

# Run blastn against nr over the web and save the raw output to a file.
b_results = NCBIWWW.blast('blastn', 'nr', f_record)
save_file = open('my_blast.out', 'w')
blast_results = b_results.read()
save_file.write(blast_results)
save_file.close()

# Two handles on the same text: an in-memory copy and the saved file.
string_result_handle = cStringIO.StringIO(blast_results)
b_results = open('my_blast.out', 'r')
print 'A'
blast_parser = NCBIWWW.BlastParser()
print 'B'
# Parse from the file handle, then from the in-memory handle.
b_record = blast_parser.parse(b_results)
print 'C'
b_record = blast_parser.parse(string_result_handle)
print 'D'
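Both handles should hand the parser identical text, which suggests the failure comes from the content of the report rather than from the choice of handle. A minimal Python 3 sketch of that check, assuming a plain-text report; `read_via_file_and_stringio` is a hypothetical helper, and `io.StringIO` stands in for `cStringIO`, which Python 3 no longer provides:

```python
import io
import os
import tempfile


def read_via_file_and_stringio(text):
    """Return `text` as seen through a real file handle and through an
    in-memory io.StringIO handle (hypothetical helper for illustration)."""
    # Save the text to a temporary file, much as the code above does with my_blast.out.
    fd, path = tempfile.mkstemp(suffix='.out')
    try:
        with os.fdopen(fd, 'w') as save_file:
            save_file.write(text)
        with open(path) as file_handle:
            from_file = file_handle.read()
    finally:
        os.remove(path)
    # Wrap the same string in an in-memory handle.
    from_memory = io.StringIO(text).read()
    return from_file, from_memory
```

The two can differ for text containing `\r\n` line endings: a file opened in text mode translates them to `\n` on reading, whereas `io.StringIO` leaves them untouched by default.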
*******************
Once the code above works, I'd like to do the following:
E_VALUE_THRESH = 0.04
for alignment in b_record.alignments:
    for hsp in alignment.hsps:
        if hsp.expect < E_VALUE_THRESH:
            print '****Alignment****'
            print 'sequence:', alignment.title
            print 'length:', alignment.length
            print 'e value:', hsp.expect
            print hsp.query[0:75] + '...'
            print hsp.match[0:75] + '...'
            print hsp.sbjct[0:75] + '...'
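Separating the filtering from the printing lets the loop be exercised without a network BLAST run. A minimal Python 3 sketch, assuming the record exposes `alignments` with `title`, `length` and `hsps`, and each HSP exposes `expect`, `query`, `match` and `sbjct`, as the loop above uses; the namedtuple classes and `significant_hits` are hypothetical stand-ins, not Biopython classes:

```python
from collections import namedtuple

# Hypothetical stand-ins for the parsed record's objects; only the attributes
# used by the loop above are modelled.
HSP = namedtuple('HSP', 'expect query match sbjct')
Alignment = namedtuple('Alignment', 'title length hsps')

E_VALUE_THRESH = 0.04


def significant_hits(alignments, threshold=E_VALUE_THRESH, width=75):
    """Collect (title, length, expect, query, match, sbjct) for every HSP whose
    E-value is strictly below `threshold`, truncating sequences to `width`."""
    hits = []
    for alignment in alignments:
        for hsp in alignment.hsps:
            if hsp.expect < threshold:
                hits.append((alignment.title, alignment.length, hsp.expect,
                             hsp.query[:width] + '...',
                             hsp.match[:width] + '...',
                             hsp.sbjct[:width] + '...'))
    return hits
```

Because the comparison is strict, an HSP whose E-value equals the threshold exactly is skipped, matching the original loop.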
--
********************************************************************************
Dr. Ravinder Singh
Assistant Professor
MCD Biology
347 UCB
University of Colorado
Boulder, CO 80309-0347
(303)492-8886 (voice)
(303)492-7744 (fax)