[BioPython] WU-BLAST
Brad Chapman
chapmanb at uga.edu
Wed Jun 2 12:19:38 EDT 2004
Hi Micheal;
> Does anyone have some example code for parsing WU-BLAST output? I see
> the Bio.expressions["wu-blastn"] but I'm not really sure how to use
> it...
Yes, there's only a Martel expression for it and no other parsing
code, so you'll need to build up something yourself. Basically,
Martel will do the parsing for you and give you XML output, which
you'll need to deal with. There are two ways to do this. The first
is to write an XML handler. (Warning: I wrote all this code for
ncbiblast since I don't have any wublast files around, but it should
translate directly.)
So, first you'd have to write a little handler. In this case, we'll
just get the database name out of the file, to keep things simple:
from xml.sax import handler

class SimpleDatabaseHandler(handler.ContentHandler):
    """Only stores the name of the database in each record.
    """
    def __init__(self):
        handler.ContentHandler.__init__(self)

    def startDocument(self):
        self.database = None

    def startElement(self, name, attrs):
        # start collecting text afresh for each new element
        self._cur_text = []

    def characters(self, content):
        self._cur_text.append(content)

    def endElement(self, name):
        if name == 'bioformat:database_name':
            self.database = "".join(self._cur_text)

simple_handler = SimpleDatabaseHandler()
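If you want to check the handler logic on its own, without Martel in
the loop, here is a minimal sketch that feeds the same kind of handler
a hand-written XML snippet through the standard xml.sax parser. The
sample XML, the namespace URI and the header element are made up for
illustration; only the bioformat:database_name tag comes from the
example above, and the real event stream Martel produces will contain
many more elements.

```python
import xml.sax
from xml.sax import handler


class SimpleDatabaseHandler(handler.ContentHandler):
    """Only stores the name of the database in each record."""

    def startDocument(self):
        self.database = None
        self._cur_text = []

    def startElement(self, name, attrs):
        # Start collecting text afresh for each new element.
        self._cur_text = []

    def characters(self, content):
        # Text may arrive in several chunks, so collect and join later.
        self._cur_text.append(content)

    def endElement(self, name):
        if name == "bioformat:database_name":
            self.database = "".join(self._cur_text)


# Hypothetical stand-in for the XML that Martel would generate.
SAMPLE_XML = b"""<record xmlns:bioformat="http://example.org/bioformat">
  <header>BLASTP 2.0.10</header>
  <bioformat:database_name>data/swissprot</bioformat:database_name>
</record>"""

sample_handler = SimpleDatabaseHandler()
xml.sax.parseString(SAMPLE_XML, sample_handler)
print(sample_handler.database)
```

This only works because the database name element has no child
elements; resetting the text buffer in startElement would lose text
for any element that mixes text and nested tags.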
With this handler in place, you can use it together with the Martel
expression to parse a file full of BLAST records:
from Bio.expressions.blast import ncbiblast

iterator_builder = ncbiblast.blastp.make_iterator("record", debug_level = 0)
iterator = iterator_builder.iterateFile(open("bt001"), simple_handler)
# loop variable renamed so it doesn't shadow the xml.sax handler module
for record_handler in iterator:
    print record_handler.database
For my example (from the Biopython test directory), this prints:
data/swissprot
The second way, if you don't want to build your own handler, is to use
the LAX handler supplied with Martel. This handler will just turn
the XML into a dictionary where the keys are item names and the
values are lists of the text found under each name. This is good if
you are going for something simple, like our database name example,
but doesn't work if you need nested information and need to know the
context an item appeared in. But, for this example, the following
code should do it:
from Martel import LAX

# reuses iterator_builder from the first example
iterator = iterator_builder.iterateFile(open("bt001"), LAX.LAX())
for lax_dict in iterator:
    print lax_dict["bioformat:database_name"]
This prints out:
['data/swissprot']
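To see why the value comes back as a list, and why the nesting gets
lost, here is a rough sketch of a LAX-style handler written against
plain xml.sax. This is not Martel's actual LAX code; FlatDictHandler,
the hit and name elements, and the namespace URI are all hypothetical
names used only to show the idea of flattening XML into a dictionary
of lists.

```python
import xml.sax
from xml.sax import handler


class FlatDictHandler(handler.ContentHandler):
    """Hypothetical LAX-like handler: element name -> list of text values."""

    def startDocument(self):
        self.fields = {}
        self._stack = []

    def startElement(self, name, attrs):
        # One text buffer per open element.
        self._stack.append([])

    def characters(self, content):
        if self._stack:
            self._stack[-1].append(content)

    def endElement(self, name):
        text = "".join(self._stack.pop()).strip()
        if text:
            # Same-named elements pile up in one list, whatever their parent.
            self.fields.setdefault(name, []).append(text)


# Made-up XML with two nested hits, for illustration only.
SAMPLE_XML = b"""<record xmlns:bioformat="http://example.org/bioformat">
  <bioformat:database_name>data/swissprot</bioformat:database_name>
  <hit><name>hit_one</name></hit>
  <hit><name>hit_two</name></hit>
</record>"""

flat = FlatDictHandler()
xml.sax.parseString(SAMPLE_XML, flat)
print(flat.fields["bioformat:database_name"])
print(flat.fields["name"])
```

Both hit names end up in a single list under "name", with nothing to
say which hit each one belonged to. That is the loss of context that
makes this approach a poor fit for nested information.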
Okay, so yes, you'll have to build an XML handler, one way or
another, but hopefully the Martel bits in place will make it easier.
Hope this helps and answers your question.
Brad