[Biopython-dev] Blast

Frank Kauff fkauff at duke.edu
Thu Sep 29 14:17:06 EDT 2005


Hi all,

On Thu, 2005-09-29 at 13:46 -0400, Michiel De Hoon wrote:
> Hi everybody,
> 
> Recently there have been some problems with the Blast parser in Biopython, to
> the degree that the example in 3.1.2 in the tutorial does not work as
> advertised. The problem, of course, is that the NCBI file format as returned
> by a www blast run keeps changing, so we are condemned to keep fixing our
> parser to keep up with NCBI.
> To my surprise, the parser in Blast.NCBIWWW tries to parse HTML output
> instead of text output. My guess is that the HTML output changes more often
> and is more difficult to parse than text output. So isn't it possible to make
> NCBIWWW.qblast return text output instead of HTML and parse that instead?
> So my question is, why was the choice made to parse HTML instead of text? Is
> it simply because blast-on-the-web couldn't return text output in the past?
> 

I'd guess many people still want to actually *look* at the output in their
browser, where HTML is simply more comfortable to read, not to mention the
possibility of clicking on the links, etc.
I was looking at the parser just a minute ago because I wanted to see
whether I can get it to work with blastx output as well. I think one reason
the parser chokes so often is that it looks very strictly for specific
tags (e.g. <p>) or blank lines, which are often added, removed, or changed
between versions, or simply switch from lowercase to uppercase. Parser
stability might improve if we handled these tags more flexibly (or
ignored them as much as possible) and concentrated on the "stable" parts
of the output.
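
To illustrate the kind of flexibility meant above, here is a minimal sketch.
It is not the actual Bio.Blast.NCBIWWW code; the names normalize_lines and
find_marker are hypothetical, and the sample marker text is only an assumption
about what a "stable" part of the output might look like. The idea is to drop
tags and blank lines before looking for anything, and to match markers
case-insensitively:

```python
import re

# Matches an opening or closing HTML tag in any letter case, e.g. <p>, </PRE>.
# Hypothetical helper pattern, not taken from Biopython.
_TAG_RE = re.compile(r"</?[A-Za-z][^>]*>")


def normalize_lines(raw):
    """Yield the non-blank lines of raw with HTML tags removed.

    Tags and blank lines are discarded, so adding, removing, or
    re-casing them upstream does not change what the parser sees.
    """
    for line in raw.splitlines():
        text = _TAG_RE.sub("", line).strip()
        if text:
            yield text


def find_marker(lines, marker):
    """Return the index of the first line starting with marker, or -1.

    The comparison ignores case, so a change in capitalization
    does not break the lookup.
    """
    marker = marker.lower()
    for index, line in enumerate(lines):
        if line.lower().startswith(marker):
            return index
    return -1
```

With something like this, two reports that differ only in their tags, tag
case, or blank lines normalize to the same list of lines, and the parser can
search for text markers instead of depending on a particular <p> or empty line.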

Frank

> --Michiel.
> 
> 
> Michiel de Hoon
> Center for Computational Biology and Bioinformatics
> Columbia University
> 1150 St Nicholas Avenue
> New York, NY 10032
> 
> 
-- 
Frank Kauff
Dept. of Biology
Duke University
Box 90338
Durham, NC 27708
USA

Phone 919-660-7382
Fax 919-660-7293
Web http://www.lutzonilab.net/members/page225.shtml


