[BioPython] HSPs in Blast parser
Brad Chapman
chapmanb at uga.edu
Wed May 5 14:56:19 EDT 2004
Hi Jawad;
Sorry for the delay in getting back with you.
> I am stuck on parsing a BlastN output and would appreciate some help.
> I am working on multiple HSPs for a single hit . For example if there
> are two hsps found for one hit, I need to find where query and subject
> ends for one hsp and then compare it with the query and subject start
> for the next hsp
>
> I have noticed that in the blast parser one can iterate through each
> hsp for every single hit, but am not too sure how to treat two hsps of
> a single hit as related and iterate through the two hsps of a single hit
> in order to find the query (and subject) end of one and query (and subject)
> start of the other.
You just need to iterate through each alignment and then through the
HSPs in each alignment. Then you can collect up the start and end
coordinates of each HSP and associate them with the overall
alignment. So, step by step:
1. Iterate through each alignment.
2. Create lists of HSP information for queries and subjects
2a. Iterate through each HSP
2b. Get the start and end coordinates from the hsp.query_start and
hsp.sbjct_start attributes of the hsps
2c. Calculate the HSP ends using the query and subject
information, removing gaps added by BLAST
2d. Add each start, end to the lists
3. You have the HSP lists, associated with the alignment title.
Here is some code which demonstrates this:
from Bio.Blast import NCBIStandalone
parser = NCBIStandalone.BlastParser()
rec = parser.parse(open("blast.out"))
for align in rec.alignments:
query_hsps = []
sbjct_hsps = []
for hsp in align.hsps:
# adjust query starts to python (0-based) coordinates
query_start = hsp.query_start - 1
# get the end from the length minus gaps
query_end = query_start + len(hsp.query.replace("-", ""))
sbjct_start = hsp.sbjct_start - 1
sbjct_end = sbjct_start + len(hsp.sbjct.replace("-", ""))
query_hsps.append((query_start, query_end))
sbjct_hsps.append((sbjct_start, sbjct_end))
print "Hit", align.title
print "Query HSPs", query_hsps
print "Subject HSPs", sbjct_hsps
Hope this helps!
Brad
More information about the BioPython
mailing list