[Bioperl-l] NCBI eutils
Lincoln Stein
lstein@cshl.org
Sun, 01 Dec 2002 15:19:49 -0500
Hi Jim,
We're very nearly ready to release a version of bioperl that uses the NCBI
eutils (esearch, epost and efetch) and enforces the 3-second delay between
requests that you folks recommend. The only problem is that in recent days
esearch/efetch have become very unreliable. The symptom is this:
1) formulate a query and submit it to esearch.
2) recover the WebEnv and QueryKey fields
3) use these fields in a request to efetch
4) efetch returns an OK status code, but the content is empty
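Each query in the steps above involves at least two requests (esearch, then efetch), and the 3-second spacing has to hold between them. A minimal Python sketch of such a throttle follows. It is an illustration under our own assumptions, not bioperl's implementation; the `Throttle` class and its `clock`/`sleep` parameters are hypothetical names chosen here.

```python
import time
from typing import Callable, Optional


class Throttle:
    """Keep successive requests at least `interval` seconds apart.

    The 3.0 s default mirrors the delay mentioned in the message. The clock
    and sleep functions are injectable (hypothetical design choice) so the
    logic can be checked without actually waiting.
    """

    def __init__(self, interval: float = 3.0,
                 clock: Callable[[], float] = time.monotonic,
                 sleep: Callable[[float], None] = time.sleep) -> None:
        self.interval = interval
        self._clock = clock
        self._sleep = sleep
        self._last: Optional[float] = None  # start time of the previous request

    def wait(self) -> float:
        """Sleep just long enough to honor the interval; return seconds slept."""
        now = self._clock()
        slept = 0.0
        if self._last is not None:
            remaining = self.interval - (now - self._last)
            if remaining > 0:
                self._sleep(remaining)
                slept = remaining
                now = self._clock()
        self._last = now
        return slept


# Usage: call wait() immediately before each esearch/efetch request.
# throttle = Throttle()
# throttle.wait()  # ...then send esearch...
# throttle.wait()  # ...then send efetch...
```

Recording the time after the sleep (rather than before) means the next interval is measured from when the request actually went out.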
Here is an example:
GET http://www.ncbi.nih.gov/entrez/eutils/esearch.fcgi?db=nucleotide&
mindate=1%2F1%2F2002&maxdate=1%2F30%2F2002&datetype=mdat&
usehistory=y&tool=bioperl&term=Onchocerca+volvulus[Organism]&retmax=73
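For reference, here is one way to assemble that esearch request in Python with the standard `urllib.parse.urlencode`, which percent-encodes each value (so `1/1/2002` becomes `1%2F1%2F2002`). The helper name `esearch_url` is hypothetical, and the base URL and parameters are simply copied from the request above.

```python
from urllib.parse import urlencode

# Base URL copied from the request in the message.
ESEARCH = "http://www.ncbi.nih.gov/entrez/eutils/esearch.fcgi"


def esearch_url(term: str, mindate: str, maxdate: str, retmax: int,
                db: str = "nucleotide", tool: str = "bioperl") -> str:
    """Build an esearch URL with the same parameters, in the same order."""
    params = {
        "db": db,
        "mindate": mindate,   # "1/1/2002" is sent as "1%2F1%2F2002"
        "maxdate": maxdate,
        "datetype": "mdat",   # restrict the date range to modification date
        "usehistory": "y",    # keep the result set on the server (WebEnv/QueryKey)
        "tool": tool,
        "term": term,         # spaces become "+", brackets become %5B/%5D
        "retmax": retmax,
    }
    return ESEARCH + "?" + urlencode(params)


print(esearch_url("Onchocerca volvulus[Organism]", "1/1/2002", "1/30/2002", 73))
```

The printed URL matches the request above, except that `urlencode()` also escapes the brackets in `term` as `%5B`/`%5D`, which decode to the same query.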
The response looks OK and returns:
<?xml version="1.0"?>
<!DOCTYPE eSearchResult PUBLIC "-//NLM//DTD eSearchResult, 11 May 2002//EN"
"http://www.ncbi.nlm.nih.gov/entrez/query/DTD/eSearch_020511.dtd">
<eSearchResult>
<Count>731</Count>
<RetMax>100</RetMax>
<RetStart>0</RetStart>
<QueryKey>1</QueryKey>
<WebEnv>M%40uXFBO%5DBt%5E%5ECD_AN%3C%3C%3CcBD%3F%60%5C%3C%40%3Ch%40%5CHfC%3F%3Do%3C%3EE%3FjBAeI%3EKF%5E%3CD</WebEnv>
<IdList>
<Id>5835443</Id>
<Id>12005979</Id>
<Id>12005977</Id>
...
</IdList>
<TranslationSet>
<Translation>
<From>Onchocerca+volvulus%5BOrganism%5D</From>
<To>%22Onchocerca+volvulus%22%5BOrganism%5D</To>
</Translation>
</TranslationSet>
<TranslationStack>
<TermSet>
<Term>"Onchocerca volvulus"[Organism]</Term>
<Field>Organism</Field>
<Count>15432</Count>
<Explode>Y</Explode>
</TermSet>
<TermSet>
<Term>1/1/2002[mdat]</Term>
<Field>mdat</Field>
<Count>-1</Count>
<Explode>Y</Explode>
</TermSet>
<TermSet>
<Term>1/30/2002[mdat]</Term>
<Field>mdat</Field>
<Count>-1</Count>
<Explode>Y</Explode>
</TermSet>
<OP>RANGE</OP>
<OP>AND</OP>
</TranslationStack>
</eSearchResult>
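Steps 2 and 3 can be sketched as follows: parse `QueryKey` and `WebEnv` out of an eSearchResult like the one above with `xml.etree.ElementTree`, then build the efetch request from them. The function names are hypothetical. One assumption worth flagging: the `WebEnv` value in the XML appears to be percent-encoded already, so the sketch decodes it once before `urlencode()` re-encodes it; passing it straight to `urlencode()` would turn every `%` into `%25`.

```python
import xml.etree.ElementTree as ET
from urllib.parse import unquote, urlencode

# Base URL copied from the efetch request in the message.
EFETCH = "http://www.ncbi.nih.gov/entrez/eutils/efetch.fcgi"


def parse_esearch(xml_text: str) -> dict:
    """Extract the history keys, total count and returned IDs."""
    root = ET.fromstring(xml_text)
    return {
        "count": int(root.findtext("Count")),  # direct child only, not TermSet/Count
        "query_key": root.findtext("QueryKey"),
        "webenv": root.findtext("WebEnv"),
        "ids": [e.text for e in root.findall("IdList/Id")],
    }


def efetch_url(result: dict, rettype: str = "gb", db: str = "nucleotide",
               tool: str = "bioperl") -> str:
    """Build an efetch URL that refers to the stored esearch result set."""
    params = {
        "rettype": rettype,
        "db": db,
        "query_key": result["query_key"],
        "tool": tool,
        "retmode": "text",
        # Assumption: the server expects the decoded WebEnv, so decode the
        # already-encoded XML value once and let urlencode() re-encode it.
        "WebEnv": unquote(result["webenv"]),
    }
    return EFETCH + "?" + urlencode(params)
```

For the WebEnv shown above, decoding and re-encoding reproduces the original string exactly, because it uses only letters, `_`, and uppercase `%XX` escapes.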
Now when I run efetch on the indicated WebEnv, I get:
GET
'http://www.ncbi.nih.gov/entrez/eutils/efetch.fcgi?rettype=gb&db=nucleotide&query_key=1&
tool=bioperl&retmode=text&
WebEnv=Oc[%40A%5ECE%5C]B_EJ%3CIiF_B%5CF%40dGheCJGYkka%40A]Kj%3F%5ECJIkJ%3ELKAa%3C%3D&
usehistory=y'
Connection: close
Date: Sun, 01 Dec 2002 20:18:38 GMT
Via: 1.1 www.ncbi.nih.gov
Server: Apache
Content-Type: text/plain
Client-Date: Sun, 01 Dec 2002 20:18:38 GMT
Client-Response-Num: 1
Client-Transfer-Encoding: chunked
Set-Cookie: WebEnv=p=KP_FeGB>ffADfECcNbIIcYchi^FIzb@G?`FkDCAj=DIcD>KFd\E;
domain=.nlm.nih.gov; path=/; expires=Sun, 01-Dec-2002 21:18:41 GMT
X-Cache: MISS from www.ncbi.nih.gov
There are supposed to be GenBank records following the headers, but the body
is empty. It looks to me as though the NCBI server crashed.
Note that this doesn't happen every time: about one time out of five, it works.
Last week this was working 100% of the time, but it started to get flaky
over Thanksgiving.
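Until the server-side problem is fixed, a client could work around the intermittent empty responses by retrying. The sketch below shows one such workaround in Python, assuming a hypothetical zero-argument `fetch` callable that performs a single efetch request and returns the body as text. It retries on an empty body and sleeps 3 seconds between attempts so the retries still respect the recommended spacing; `fetch_with_retry` and `EmptyResponse` are hypothetical names.

```python
import time
from typing import Callable


class EmptyResponse(Exception):
    """Raised when every attempt returned a success status but no content."""


def fetch_with_retry(fetch: Callable[[], str], attempts: int = 5,
                     delay: float = 3.0,
                     sleep: Callable[[float], None] = time.sleep) -> str:
    """Call fetch() until it returns a non-blank body, up to `attempts` times."""
    for attempt in range(1, attempts + 1):
        body = fetch()
        if body.strip():          # treat whitespace-only bodies as empty too
            return body
        if attempt < attempts:    # no pointless sleep after the last try
            sleep(delay)
    raise EmptyResponse(f"empty body after {attempts} attempts")
```

This only masks the fault. If each attempt independently succeeded one time in five, as reported above, five attempts would succeed only about two-thirds of the time (1 - 0.8^5 ≈ 0.67), and every retry adds load on the server.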
Lincoln