[Biopython-dev] Blast

Sat Oct 1 20:50:03 EDT 2005

Thanks, Jeff. Currently, qblast in Bio.Blast.NCBIWWW can already return text
output via the format_type argument. Unfortunately, the standalone blast and
www-blast return slightly different text output, so we'd have to fix the
parser in Bio.Blast.NCBIStandalone for it to handle www-blast text output.

I found out that both standalone blast and www-blast can also return XML
output, and as far as I can tell the XML is identical in both cases. I would
think that a parser that reads this XML output would be the most stable.
So I propose the following:

1) Let qblast return XML output by default; text and html output can be
returned by setting the format_type argument to qblast appropriately.
2) Write an XML parser that can read blast output from both standalone blast
and www-blast.
3) In a few versions, deprecate the text parser in NCBIStandalone and the
html parser in NCBIWWW. (This will only affect users of the text parser in
NCBIStandalone, since the html parser in NCBIWWW is already behind and cannot
parse current blast output anyway.)
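To illustrate why a single XML parser could serve both standalone and
www-blast, here is a minimal sketch of step 2. It uses only Python's standard
xml.etree.ElementTree, not any Biopython module, and the function name
parse_blast_xml is hypothetical. The sample input is hand-written and heavily
abbreviated; it borrows element names from NCBI's BLAST XML output
(BlastOutput, Iteration, Hit, Hsp), but real output contains many more
elements (header, statistics, alignments) that a full parser would handle.

```python
import xml.etree.ElementTree as ET

# Hand-written, abbreviated sample in the style of NCBI BLAST XML output.
# Real output has a DOCTYPE, query/database info, statistics, alignments, etc.
SAMPLE_XML = """<BlastOutput>
  <BlastOutput_program>blastn</BlastOutput_program>
  <BlastOutput_iterations>
    <Iteration>
      <Iteration_hits>
        <Hit>
          <Hit_id>gi|0000001</Hit_id>
          <Hit_def>example sequence one</Hit_def>
          <Hit_hsps>
            <Hsp>
              <Hsp_bit-score>120.5</Hsp_bit-score>
              <Hsp_evalue>1e-30</Hsp_evalue>
            </Hsp>
          </Hit_hsps>
        </Hit>
        <Hit>
          <Hit_id>gi|0000002</Hit_id>
          <Hit_def>example sequence two</Hit_def>
          <Hit_hsps>
            <Hsp>
              <Hsp_bit-score>45.2</Hsp_bit-score>
              <Hsp_evalue>0.003</Hsp_evalue>
            </Hsp>
          </Hit_hsps>
        </Hit>
      </Iteration_hits>
    </Iteration>
  </BlastOutput_iterations>
</BlastOutput>"""


def parse_blast_xml(text):
    """Hypothetical helper: return (program, hits) from BLAST-style XML.

    Each hit is a dict with its id, definition and a list of HSP scores.
    The same code applies whether the XML came from standalone blast or
    www-blast, which is the point of parsing XML instead of text.
    """
    root = ET.fromstring(text)
    program = root.findtext("BlastOutput_program")
    hits = []
    for hit in root.iter("Hit"):
        hsps = [
            {
                "bit_score": float(hsp.findtext("Hsp_bit-score")),
                "evalue": float(hsp.findtext("Hsp_evalue")),
            }
            for hsp in hit.iter("Hsp")
        ]
        hits.append(
            {
                "id": hit.findtext("Hit_id"),
                "definition": hit.findtext("Hit_def"),
                "hsps": hsps,
            }
        )
    return program, hits


if __name__ == "__main__":
    program, hits = parse_blast_xml(SAMPLE_XML)
    print(program)
    for hit in hits:
        # Report the best (smallest) e-value among this hit's HSPs.
        best = min(hsp["evalue"] for hsp in hit["hsps"])
        print(hit["id"], hit["definition"], best)
```

Because the structure is explicit in the markup, cosmetic changes to NCBI's
text or HTML layout would not affect a parser like this, whereas they would
break a line-oriented text or HTML scanner.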

Any objections, anybody?

--Michiel.

Michiel de Hoon
Center for Computational Biology and Bioinformatics
Columbia University
1150 St Nicholas Avenue
New York, NY 10032

-----Original Message-----
From: Jeffrey Chang [mailto:jeffrey.chang at duke.edu]
Sent: Thu 9/29/2005 10:16 PM
To: Michiel De Hoon
Cc: biopython-dev at biopython.org
Subject: Re: [Biopython-dev] Blast

On Sep 29, 2005, at 1:46 PM, Michiel De Hoon wrote:

> To my surprise, the parser in Blast.NCBIWWW tries to parse HTML output
> instead of text output. My guess is that the HTML output changes more often
> and is more difficult to parse than text output. So isn't it possible to make
> NCBIWWW.qblast return text output instead of HTML and parse that instead?
> So my question is, why was the choice made to parse HTML instead of text? Is
> it simply because blast-on-the-web couldn't return text output in the past?

You are right.  It was done that way in the past when the only way to  
use NCBI's BLAST was to use the HTML output.  (Actually, there was a  
version that you could access through a proprietary non-HTTP  
protocol, but the databases were not updated as frequently.)  Now  
that we can get text, perhaps it is time to encourage users to use  
the text one.  I believe the HTML parser is a few versions behind  
now, and unable to parse current BLAST output anymore.

Jeff