[Biopython-dev] Python 3 and encoding for online resources

Peter biopython at maubp.freeserve.co.uk
Tue Aug 3 14:07:40 UTC 2010


Peter wrote:
>Michiel wrote:
>> So I would suggest to switch from urllib to urllib2 in Bio.Entrez and catch
>> any HTTP errors (urllib2 is translated appropriately by 2to3),
>
> That sounds very sensible.
>

Hi Michiel,

I see you've switched from urllib to urllib2, but you also removed all
the NCBI-specific error handling (which, it turns out, would need to be
updated anyway).

I just tried a simple history example, and if you deliberately use a
wrong WebEnv you get an HTML error page back (from memory and the
comments in our code, it used to be a plain text error page):

<html>
<body>
<br/><h2>Error occurred: Unable to obtain query #1</h2><br/><ul
title="some params from request:">
<li>db=pubmed</li>
<li>query_key=1</li>
<li>report=medline</li>
<li>dispstart=0</li>
<li>dispmax=10</li>
<li>mode=text</li>
<li>WebEnv=wrong</li>
</ul>
<br/><b>pmfetch need params:</b><br/><br/>
<li>(id=NNNNNN[,NNNN,etc]) or (query_key=NNN, where NNN - number in
the history, 0 - clipboard content for current database)</li>
<li>db=db_name (mandatory)</li>
<li>report=[docsum, brief, abstract, citation, medline, asn.1, mlasn1,
uilist, sgml, gen] (Optional; default is asn.1)</li>
<li>mode=[html, file, text, asn.1, xml] (Optional; default is html)</li>
<li>dispstart - first element to display, from 0 to count - 1,
(Optional; default is 0)</li>
<li>dispmax - number of items to display (Optional; default is all
elements, from dispstart)</li>
<br/>See <a href="http://eutils.ncbi.nlm.nih.gov/entrez/query/static/efetch_help.html">help</a>.</body>
</html>

The old code could handle this just by looking for "Error occurred".
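
To illustrate the kind of check meant here, a minimal sketch in Python 3
syntax follows. The names NCBIErrorPage and raise_on_error_page are made up
for this example and are not the old Bio.Entrez code; the pattern is based
only on the sample page above and assumes the message follows the text
"Error occurred:".

```python
import re


class NCBIErrorPage(Exception):
    """Hypothetical exception for an error reported in the response body."""


def raise_on_error_page(body):
    """Return body unchanged, or raise if it looks like an NCBI error page."""
    # Capture the message after "Error occurred:" up to the next tag or newline.
    match = re.search(r"Error occurred:\s*([^<\r\n]*)", body)
    if match:
        raise NCBIErrorPage(match.group(1).strip())
    return body
```

A plain substring test for "Error occurred" would be enough to detect the
page; the regular expression is only there to pull out a message for the
exception.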

Anyway, this demonstrates that we can't just assume any error will
be handled by the NCBI as an HTTP error code and thus get
turned into an exception automatically by urllib2. In this particular
case, one might argue the NCBI should use HTTP status code
400 Bad Request.
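
One way to cover both cases is to check in two layers: first the HTTP
status, then the start of the body. The sketch below is only an
illustration under some assumptions: it uses the Python 3 module names
(urllib.request and urllib.error, which is where 2to3 maps urllib2), the
function open_checked and its opener and peek arguments are hypothetical,
and it reads the whole response into memory, which may not suit large
downloads.

```python
import io
from urllib.error import HTTPError
from urllib.request import urlopen


def open_checked(url, opener=urlopen, peek=1024):
    """Open url; raise IOError for HTTP errors and for error pages."""
    try:
        handle = opener(url)
    except HTTPError as err:
        # Layer 1: the server used a proper HTTP status code.
        raise IOError("HTTP error %i: %s" % (err.code, err.reason))
    data = handle.read()
    # Layer 2: the HTTP status was fine, but the body is an error page.
    if b"Error occurred" in data[:peek]:
        raise IOError("Server returned an error page")
    # Hand back a fresh handle so callers can read from the start.
    return io.BytesIO(data)
```

The opener argument exists only so the function can be exercised without
network access, for example by passing a function that returns an
io.BytesIO holding a saved error page.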

I think we should write some online tests for Bio.Entrez
including error conditions like this.
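
As a rough idea of what such an online test could look like, here is a
sketch using unittest. Several things in it are assumptions rather than
anything decided in this thread: the RUN_ONLINE_TESTS environment variable
and the class name are hypothetical, the request goes through the standard
library directly rather than Bio.Entrez, the rettype and retmode parameter
names are used in place of report and mode, and the final assertion (that
the body mentions an error) is a guess at the server's behaviour which may
need adjusting.

```python
import os
import unittest
from urllib.error import HTTPError
from urllib.parse import urlencode
from urllib.request import urlopen

EFETCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi"
# Hypothetical opt-in switch so the test never hits the network by default.
RUN_ONLINE = os.environ.get("RUN_ONLINE_TESTS") == "1"


@unittest.skipUnless(RUN_ONLINE, "set RUN_ONLINE_TESTS=1 to contact the NCBI")
class BadWebEnvTest(unittest.TestCase):
    def test_wrong_webenv_is_reported(self):
        params = urlencode({"db": "pubmed", "query_key": "1",
                            "WebEnv": "wrong",
                            "rettype": "medline", "retmode": "text"})
        try:
            with urlopen(EFETCH + "?" + params) as handle:
                body = handle.read().decode("utf-8", "replace")
        except HTTPError:
            # The server signalled the failure with a status code.
            return
        # Otherwise the failure should at least be visible in the body.
        self.assertIn("error", body.lower())
```

Keeping the test skipped unless explicitly enabled means the regular
offline test run is unaffected.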

In a related example, I'm trying adding a sleep statement between
my ESearch and EFetch calls in order to let the session time out.
I'll post back once I know what it does. I'd be pleasantly
surprised if they do something like HTTP status code 410 Gone;
I'm expecting another HTML error page.

Regards,

Peter


