[BioPython] NCBIXML for multiple queries
David Weisman
weisman at lydon.com
Mon Jan 16 10:50:26 EST 2006
Hello,
I tried using NCBIXML to parse the output of a local BLAST run in which the input had
multiple query sequences. blastall writes multiple XML documents to the output file, and
the SAX parser threw a SAXParseException on the second <?xml...> declaration, complaining
of junk after the document element.
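
The failure is easy to reproduce without BLAST. The snippet below is only a sketch in modern Python using the standard xml.sax module; the helper name parses_cleanly and the toy documents are made up for illustration, and the exact error text depends on the underlying parser.

```python
import xml.sax
from xml.sax.handler import ContentHandler

# Two complete XML documents back to back, mimicking a multi-query output file.
two_docs = b'<?xml version="1.0"?>\n<a/>\n<?xml version="1.0"?>\n<b/>\n'


def parses_cleanly(data):
    """Return True if data parses as one well-formed XML document."""
    try:
        xml.sax.parseString(data, ContentHandler())
        return True
    except xml.sax.SAXParseException:
        # Raised when the parser reaches the second <?xml ...> declaration.
        return False
```

A single document passes this check, while the concatenated pair does not.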
I couldn't find an obvious workaround, so I wrote a Python generator function that
yields a new file handle (based on a cStringIO) for each XML document in the stream.
The usage model is:
from Bio.Blast import NCBIStandalone, NCBIXML
import xmlStreamSeparator  # new

blastInFile = open(blastInPath, "r")  # composite blast output
x_gen = xmlStreamSeparator.getXmlDoc(blastInFile)
x_doc = x_gen.next()
while not xmlStreamSeparator.xmlStreamEOF(x_doc):
    blast_iter = NCBIStandalone.Iterator(x_doc, NCBIXML.BlastParser())
    for b_rec in blast_iter:
        pass  # process blast record...
    x_doc = x_gen.next()  # get next xml doc from stream
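
To make the idea concrete, here is a minimal sketch of how such a splitter could work. It is not the actual xmlStreamSeparator code: the name split_xml_docs is hypothetical, it is written for Python 3 with io.StringIO standing in for cStringIO, and it assumes every <?xml ...> declaration starts on its own line.

```python
import io


def split_xml_docs(handle):
    """Yield one in-memory file handle per XML document in a concatenated stream."""
    buf = []
    for line in handle:
        # A new XML declaration marks the start of the next document,
        # so hand back whatever has been collected so far.
        if line.lstrip().startswith("<?xml") and buf:
            yield io.StringIO("".join(buf))
            buf = []
        # Skip blank lines that appear before any document content.
        if buf or line.strip():
            buf.append(line)
    if buf:
        # Emit the final document once the stream is exhausted.
        yield io.StringIO("".join(buf))
```

Because the generator simply stops when the input ends, a plain "for x_doc in split_xml_docs(blastInFile):" loop would work with this version, and no separate end-of-stream check is needed.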
Any pointers to a better model? Many thanks for any tips.
Regards,
David