[Biopython-dev] [Bug 2051] XML Blast parser unusable with multiple queries and recent (2.2.13) blast - patch attached

bugzilla-daemon at portal.open-bio.org bugzilla-daemon at portal.open-bio.org
Mon Dec 4 19:18:39 UTC 2006


http://bugzilla.open-bio.org/show_bug.cgi?id=2051





------- Comment #6 from kael.fischer at gmail.com  2006-12-04 14:18 -------
This is untested for blastall output versions other than 2.2.14-15 and I have
only looked at blastn.

XMLParser: 1 Blast Record instance = all submitted query sequences
Traditional BlastParser: 1 Blast Record instance = 1 query sequence
(for the blastall versions that BlastParser can still parse)

The names and lengths of all queries after the first one are lost during
parsing.  The data are in the XML output at the top level of each
<Iteration>.  For the data structure to be isomorphic to that of the original
BlastParser and to capture this important information, NCBIXML.parser should
return a list of records (one per XML <Iteration>).  Also, having some sort of
iterator/generator mechanism for the <Iteration>s would have the added benefit
of a smaller memory footprint for very large results.
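
A minimal sketch of the iterator idea, to make the proposal concrete.  This is
not the NCBIXML code and not the attached patch: it uses only the standard
library's ElementTree, and the names iter_query_records, QueryRecord and
SAMPLE are hypothetical.  It assumes the per-query name and length sit in the
Iteration_query-def and Iteration_query-len elements of each <Iteration>, as
in blastall XML output from 2.2.13 on; the sample document is made up and
heavily trimmed.

```python
import io
import xml.etree.ElementTree as ET
from collections import namedtuple

# Hypothetical stand-in for a Blast Record: one instance per query sequence.
QueryRecord = namedtuple("QueryRecord", "query query_letters hit_defs")


def iter_query_records(handle):
    """Yield one QueryRecord per <Iteration> element, parsing incrementally."""
    for _event, elem in ET.iterparse(handle, events=("end",)):
        if elem.tag != "Iteration":
            continue
        length = elem.findtext("Iteration_query-len")
        yield QueryRecord(
            query=elem.findtext("Iteration_query-def"),
            query_letters=int(length) if length else None,
            hit_defs=[hit.findtext("Hit_def") for hit in elem.iter("Hit")],
        )
        # Drop the children of the finished <Iteration> so its hits can be
        # garbage-collected before the next query is parsed.
        elem.clear()


# Made-up, trimmed-down two-query document (assumption: real output has many
# more elements, but the nesting of the ones used here is the same).
SAMPLE = b"""<BlastOutput><BlastOutput_iterations>
<Iteration>
  <Iteration_query-def>query one</Iteration_query-def>
  <Iteration_query-len>30</Iteration_query-len>
  <Iteration_hits><Hit><Hit_def>subject A</Hit_def></Hit></Iteration_hits>
</Iteration>
<Iteration>
  <Iteration_query-def>query two</Iteration_query-def>
  <Iteration_query-len>45</Iteration_query-len>
  <Iteration_hits></Iteration_hits>
</Iteration>
</BlastOutput_iterations></BlastOutput>"""

for record in iter_query_records(io.BytesIO(SAMPLE)):
    print(record.query, record.query_letters, len(record.hit_defs))
```

With this shape, record.query and record.query_letters mean the same thing
for every query as they do with the traditional BlastParser, and only one
<Iteration> worth of hits needs to be held in memory at a time.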

It has been suggested that XMLParser be used in lieu of BlastParser, as
BlastParser is broken for new-ish versions of blastall (see bug 2090).
However, because of this behavior, any code that uses record.query or
record.query_letters, or that in some other way relies on the documented
(http://www.bioinformatics.org/bradstuff/bp/tut/images/BlastRecord.png) data
structure of 1 record per query, is broken when using NCBIXML.




