[Biopython] About BLAST parser

Peter biopython at maubp.freeserve.co.uk
Thu Oct 22 09:56:32 UTC 2009


On Thu, Oct 22, 2009 at 10:45 AM, Manu Tamminen <mavata at gmail.com> wrote:
> I have a problem with the Biopython BLAST parser. I'm using the parser to
> extract relevant information from an XML result file into a tab-separated
> table. It seems the XML file occasionally contains errors that cause the
> script to abort. This is especially common and annoying with sequence
> alignments that contain thousands of sequences.
>
> Is it possible to write the script so that when an error occurs, the script
> skips to the next sequence rather than aborting completely? I include below
> an example of such an error. This one is about a mismatched tag;
> sometimes the error has instead been about a missing tag.
>
>    for blast_record in blast_records:
>  File
> "/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/site-packages/Bio/Blast/NCBIXML.py",
> line 660, in parse
>    expat_parser.Parse(text, True) # End of XML record
> xml.parsers.expat.ExpatError: mismatched tag: line 82921, column 4

XML is a strict file format: every opening tag such as <item> must
have a matching closing tag </item>. If the XML file is truncated
or otherwise corrupted, you can end up with mismatched tags (e.g.
an <item> without an </item>), which means the XML file is invalid.
That is basically what the error message is about.
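
To illustrate, here is a minimal sketch that uses only the expat
module from Python's standard library (the same parser named in the
traceback), not Biopython itself. The helper name check_well_formed
and the tiny XML strings are made up for this example; they are not
taken from a real BLAST output file.

```python
import xml.parsers.expat


def check_well_formed(xml_text):
    """Return None if xml_text is well-formed XML, else the expat error text.

    Hypothetical helper for illustration only; not part of Biopython.
    """
    parser = xml.parsers.expat.ParserCreate()
    try:
        # True marks this as the final chunk, so unclosed tags are reported.
        parser.Parse(xml_text, True)
    except xml.parsers.expat.ExpatError as err:
        return str(err)
    return None


good = "<list><item>a</item></list>"
mismatched = "<list><item>a</list>"  # <item> is never closed
truncated = "<list><item>a"          # file cut off part way through

print(check_well_formed(good))        # None, the document is well-formed
print(check_well_formed(mismatched))  # a "mismatched tag" message
print(check_well_formed(truncated))   # an error, since the document never ends
```

Running something like this over a saved results file should tell you
whether the file itself is damaged before any BLAST-specific parsing
is involved.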

I can make some suggestions that may help, but first: are you
running BLAST locally or online? Are you saving the results to
a file, or parsing directly from the handle? How many query
sequences do you have?

Peter



