[BioPython] BLAST Parser query

Peter biopython at maubp.freeserve.co.uk
Mon Feb 14 16:22:26 EST 2005


kumar s wrote:
> Hi group,

Hello!

> I used standalone BLAST to align my sequences to
> RefSeq database. 
> 
> I followed the sample code given in the cookbook.
> 
> Here is my code:

(code removed)

I would suggest "while True" rather than "while 1" to take advantage 
of the definition of these booleans in Python 2.3, but that is a 
style point.

You also call b_iterator.next() twice at the start of the program 
(once outside the loop, once on the first time through the loop) so 
you will waste/ignore the first result.

i.e. You should comment out the line "b_record = b_iterator.next()" 
before the while loop.

> I want to print my query along with my
> alignment.title, because I am blasting 1000 sequences
> against RefSeq and I want to know what the query. 

I assume you are blasting with an input FASTA file with 1000 
sequences it it.

When you say "query" what do you mean?  I can think of three answers:

(a) The name of the original query:

b_record.query

(b) The ENTIRE original query (i.e. a complete sequence of 
nucleotides or amino acids) is not included in the blast output 
(look inside your file 'xxx_blast.out').

You will have to use the query name (i.e. b_record.query) to work 
this out, for example by searching your FASTA input file.

(c) The FRAGMENT of the original query (i.e. a sequence of 
nucleotides or amino acids) that was matched to the Blast database 
is available, as:

hsp.query.replace('-','')

Here hsp.query gives the string including gap characters '-' to show 
how this fragment of the query string aligns with the database match 
(see also the hsp.match and hsp.sbjct properties).

Doing hsp.query.replace('-','') gives the fragment of the query 
string without the gap characters.

e.g.

from Bio.Blast import NCBIStandalone
b_out = open('xxx_blast.out','r')
b_parser = NCBIStandalone.BlastParser()
b_iterator = NCBIStandalone.Iterator(b_out,b_parser)

#Do use the following line, as you end up calling
#next twice and never look at the first result!
#b_record = b_iterator.next()

#You only need to define the threshold once
E_VALUE_THRESH = 0.4

while True:
     b_record = b_iterator.next()

     if b_record is None:
         break

     print "The following results are for query " + b_record.query

     for alignment in b_record.alignments:
         for que in b_record.query:
             for hsp in alignment.hsps:
                 if hsp.expect < E_VALUE_THRESH:
                     print '****Alignment****'
                     print 'title:', alignment.title
                     print 'length:', alignment.length
                     print 'e value:', hsp.expect
                     print 'fragment of query matched:'
                     print hsp.query.replace('-','')

I hope the indentation survives being emailed!

Peter


More information about the BioPython mailing list