[BioPython] parsing the blastoutput and printing the alingment
Muthuraman, Manickam
manickam.muthuraman at wur.nl
Thu Jun 15 14:13:17 UTC 2006
Dear peter
here is the code my_blast.out and the error. My need is to get all the blast hit sequences in fasta format. By parsing and i can extract accession number from it.
Code
from Bio import Fasta
file_for_blast=open('/home/manickam/Documents/m_cold.fasta','r')
f_iterator=Fasta.Iterator(file_for_blast)
f_record=f_iterator.next()
from Bio.Blast import NCBIWWW
result_handle=NCBIWWW.qblast('blastp','nr',f_record, format_type="XML")
save_file=open('/home/manickam/my_blast.out','w')
blast_results=result_handle.read()
save_file.write(blast_results)
save_file.close()
blast_out=open('/home/manickam/my_blast.out','r')
from Bio.Blast import NCBIXML
from Bio.Blast import NCBIStandalone
b_parser=NCBIXML.BlastParser()
b_iterator=NCBIStandalone.Iterator(blast_out,b_parser)
b_record = b_iterator.next()
for alignment in b_record.alignments:
print "inside 2 loop"
for hsp in alignment.hsps:
print "inside 1 loop"
print 'seq:',alignment.title
blast_out.close()
Error
[root at bioinfo python]# python blas.py
/usr/lib/python2.4/site-packages/Bio/Blast/NCBIWWW.py:1064: UserWarning: qblast works only with blastn and blastp for now.
warnings.warn("qblast works only with blastn and blastp for now.")
Traceback (most recent call last):
File "blas.py", line 16, in ?
b_record = b_iterator.next()
File "/usr/lib/python2.4/site-packages/Bio/Blast/NCBIStandalone.py", line 1410, in next
return self._parser.parse(File.StringHandle(data))
File "/usr/lib/python2.4/site-packages/Bio/Blast/NCBIXML.py", line 112, in parse
self._parser.parse(handler)
File "/usr/lib/python2.4/xml/sax/expatreader.py", line 107, in parse
xmlreader.IncrementalParser.parse(self, source)
File "/usr/lib/python2.4/xml/sax/xmlreader.py", line 123, in parse
self.feed(buffer)
File "/usr/lib/python2.4/xml/sax/expatreader.py", line 211, in feed
self._err_handler.fatalError(exc)
File "/usr/lib/python2.4/xml/sax/handler.py", line 38, in fatalError
raise exception
xml.sax._exceptions.SAXParseException: <unknown>:1:4: not well-formed (invalid token)
[root at bioinfo python]#
my_blast.out
HTTP/1.1 200 OK
Date: Thu, 15 Jun 2006 13:57:19 GMT
Server: Nde
Content-Type: application/xml
Connection: close
<?xml version="1.0"?>
<!DOCTYPE BlastOutput PUBLIC "-//NCBI//NCBI BlastOutput/EN" "NCBI_BlastOutput.dtd">
<BlastOutput>
<BlastOutput_program>blastp</BlastOutput_program>
<BlastOutput_version>BLASTP 2.2.14 [May-07-2006]</BlastOutput_version>
<BlastOutput_reference>Altschul, Stephen F., Thomas L. Madden, Alejandro A. Schäffer, Jinghui Zhang, Zheng Zhang, Webb Miller, and David J. Lipman (1997), "Gapped BLAST and PSI-BLAST: a new generation of protein database search programs", Nucleic Acids Res. 25:3389-3402.</BlastOutput_reference>
<BlastOutput_db>nr</BlastOutput_db>
<BlastOutput_query-ID>1_13944</BlastOutput_query-ID>
<BlastOutput_query-def>1BK0</BlastOutput_query-def>
<BlastOutput_query-len>331</BlastOutput_query-len>
<BlastOutput_param>
.
.
.
.
.
.
.
.
<Hsp_identity>76</Hsp_identity>
<Hsp_positive>128</Hsp_positive>
<Hsp_gaps>27</Hsp_gaps>
<Hsp_align-len>295</Hsp_align-len>
<Hsp_qseq>VPKIDVSPLFGD-DQAAKMRVAQQIDAASRDTGFFYAVNHGIN---VQRLSQKTKEFHMSITPEEKWDLAIRAYNKEHQDQVRAGYYLSIPGKKAVESFCYLNP--NFTPDHPRIQAKTPTHEVNVWPDETKHPGFQDFAEQYYWDVFGLSSALLKGYALALGKEENFFARHFKPDDTLASVVLIRYP-YLDPYPEAAIKTAADGTKLSFEWHEDVSLITVLYQSNVQNLQVETAAGYQDIEADDTGYLINCGSYMAHLTNNYYKAPIHRV--KWVNAERQSLPFFVNLGYDSVI</Hsp_qseq>
<Hsp_hseq>LPVIDLSLLDGSPESAAKFR--DDLLCATHDVGFFYLVGHGVDESLMDDLLAASREFFD--LPEDQKFAVENVKSPQFRGYTRVGGELT-EGKTDWREQIDVGPERDVIDNAPGLADYWRLEGPNLWPDAV--PQLRGLVNEWNDKLSAVSLRLLRAWAHALGAPEDVFDNAFA-DKPFPQLKIVRYPGESNPEPKQGVGAHRDGGVLTL----------LMVEPGKGGLQVDYNGEWVDVPPKPGAFVVNIGEMLELATEGYLKATLHRVISPLIGDDRISIPFFFNPALDTVM</Hsp_hseq>
<Hsp_midline>+P ID+S L G + AAK R + A+ D GFFY V HG++ + L ++EF PE++ + + + R G L+ GK + P + + P + N+WPD P + ++ + +S LL+ +A ALG E+ F F D + ++RYP +P P+ + DG L+ ++ + LQV+ + D+ +++N G + T Y KA +HRV + +R S+PFF N D+V+</Hsp_midline>
</Hsp>
</Hit_hsps>
</Hit>
</Iteration_hits>
<Iteration_stat>
<Statistics>
<Statistics_db-num>3695564</Statistics_db-num>
<Statistics_db-len>1269795892</Statistics_db-len>
<Statistics_hsp-len>0</Statistics_hsp-len>
<Statistics_eff-space>0</Statistics_eff-space>
<Statistics_kappa>0.041</Statistics_kappa>
<Statistics_lambda>0.267</Statistics_lambda>
<Statistics_entropy>0.14</Statistics_entropy>
</Statistics>
</Iteration_stat>
</Iteration>
</BlastOutput_iterations>
</BlastOutput>
More information about the Biopython
mailing list