[Biopython] About BLAST parser
Manu Tamminen
mavata at gmail.com
Thu Oct 22 10:06:47 UTC 2009
Hi Peter! Thanks for your prompt reply! I ran the BLAST analysis on
a supercomputer cluster, saved the results to an XML file and then
transferred the output file to my computer. I then ran the script on
my computer to parse the results into a tab-separated file. The
current dataset has 1115 sequences of around 500 bp each.
Manu
On Oct 22, 2009, at 12:56 PM, Peter wrote:
> On Thu, Oct 22, 2009 at 10:45 AM, Manu Tamminen <mavata at gmail.com> wrote:
>> I have a problem with the Biopython BLAST parser. I'm using the parser to
>> extract relevant information from an XML result file into a tab-separated
>> table. It seems the XML file occasionally contains errors that cause the
>> script to abort. This is especially common and annoying with sequence
>> alignments that contain thousands of sequences.
>>
>> Is it possible to write the script so that when an error occurs, the script
>> would jump to the next sequence rather than abort completely? I will
>> include below an example of such an error. This error is about a mismatched
>> tag; sometimes the error has also been about a missing tag.
>>
>> for blast_record in blast_records:
>> File "/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/site-packages/Bio/Blast/NCBIXML.py", line 660, in parse
>> expat_parser.Parse(text, True) # End of XML record
>> xml.parsers.expat.ExpatError: mismatched tag: line 82921, column 4
>
> XML is a strict file format with tags like <item> having a closing
> tag </item>. If the XML file is truncated or something, you can
> have mismatched tags (e.g. an <item> without an </item>) which
> means the XML file is invalid. This is basically what that error
> message is about.
>
> I can make some suggestions that may help, but first, are you
> running BLAST locally or online? Are you saving the results to
> a file, or parsing directly from the handle? How many query
> sequences do you have?
>
> Peter
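
To illustrate the kind of failure Peter describes, here is a minimal sketch that checks whether a piece of XML is well-formed and, if not, reports where the parser gave up. It uses only the standard-library expat module, not Biopython. The helper name find_xml_error is hypothetical and the sample strings are made up for the demonstration; for a real BLAST output file one would pass in the file's contents instead.

```python
import xml.parsers.expat


def find_xml_error(data):
    """Return None if data is well-formed XML.

    Otherwise return a (line, column, message) tuple describing
    the first error expat reports. Hypothetical helper, for illustration.
    """
    parser = xml.parsers.expat.ParserCreate()
    try:
        # The second argument marks this as the final chunk of input.
        parser.Parse(data, True)
    except xml.parsers.expat.ExpatError as err:
        message = xml.parsers.expat.ErrorString(err.code)
        return err.lineno, err.offset, message
    return None


# Made-up examples: the second one has an <item> with no </item>.
good = "<a><item>x</item></a>"
bad = "<a><item>x</a>"

print(find_xml_error(good))
print(find_xml_error(bad))
```

Running a check like this on the output file before parsing shows whether the file itself is damaged (for example, truncated during transfer) and gives the line number to inspect.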
---
Manu Tamminen, M.Sc.
University of Helsinki
Department of Applied Chemistry and Microbiology, Division of
Microbiology
P.O. Box 56
00014 HELSINKI
FINLAND
tel: +358 (0)9191 57585
fax: +358 (0)9191 59322
e-mail: manu.tamminen at helsinki.fi
home: http://www.mm.helsinki.fi/~mvtammin/