[Biopython-dev] [Bug 2176] XML Blast parser: miscellaneous bug fixes and cleanup

bugzilla-daemon at portal.open-bio.org
Tue Jan 9 16:10:06 UTC 2007


http://bugzilla.open-bio.org/show_bug.cgi?id=2176

------- Comment #5 from mdehoon at ims.u-tokyo.ac.jp  2007-01-09 11:10 -------
> Regarding the inconsistent use of tuples for _hsp.identities, positives, and gaps
> - I would like all the parsers NCBIStandalone and NCBIXML (and ideally the
> HTML parser too) to return identical record objects.

Even after patch #2090, the NCBIStandalone parser is broken for multiple Blast
records, and will probably be broken for single Blast records also when a new
Blast version comes out. I haven't tried the HTML parser, but I'd be surprised
if it can parse HTML output from recent versions of Blast. So whereas I agree
in principle that the three parsers should return identical record objects, in
practice it's hardly relevant given that two of the three parsers either don't
work or cannot work reliably.

> To do this, we could either:
> 
> (a) change NCBIXML to use tuples instead of integers (as suggested by Jacob)

All three of us agree that there's no good reason for tuples. Option (a)
implies copying a bad design choice from a semi-broken parser to a functioning
parser.

> or,
> 
> (b) change NCBIStandalone to use simple integers instead of tuples (is this
> what you meant in comment 3 Michiel?)
> 
> Choice (b) would seem simpler in the long term - but would probably break more
> existing code.  Also, users of NCBIXML are going to have to update their
> scripts anyway after bug 2051, so choice (a) would disrupt fewer people.

Both options (a) and (b) break existing code. So let me suggest option (c):

(c) Don't do anything.
This doesn't break any code. In the near term, people who use both the
plain-text parser and the XML parser will have to deal with differences
between the Blast records the two parsers produce. But how many people is
that anyway? Most likely, not enough to justify option (a). In the long term,
assuming that both the plain-text parser and the HTML parser will be
deprecated, there will be no more inconsistencies.
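
For anyone who does use both parsers in the meantime, the difference can
probably be absorbed in user code with a few lines. The following is only a
sketch: the helper name as_count is made up for illustration, and it assumes
that the plain-text parser stores each value as a (count, total) tuple while
the XML parser stores a plain integer.

```python
def as_count(value):
    """Return the bare count from an int or a (count, total) tuple."""
    if isinstance(value, tuple):
        # Assumed plain-text parser form: (count, total); keep the count.
        return value[0]
    # Assumed XML parser form: already a plain integer (or None if unset).
    return value


# Both representations reduce to the same number.
print(as_count((45, 50)))  # 45
print(as_count(45))        # 45
```

Applied to hsp.identities, hsp.positives and hsp.gaps, something like this
would let the same downstream code read records from either parser.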

My question to Jacob:
Why do you need to use the plain-text Blast parser? Is there something it can
do that the XML parser cannot?




