[Bioperl-l] NCBI GenBank web retrieval
Jason Stajich
jason@cgt.mc.duke.edu
Sat, 19 Jan 2002 17:48:53 -0500 (EST)
[jason having learned way too much about how to reverse-engineer CGIs]
I've restored the functionality from previous versions of DB::GenBank and
DB::GenPept; we are now using the new NCBI CGI /htbin-post/Entrez/query.
I figured out that multiple terms must be separated by '+' instead of the
previous ',', which had been causing only one sequence to be retrieved.
Additionally, I fixed a bug that returned the last rather than the first
sequence when a request with multiple hits was made through
get_Seq_by_id or get_Seq_by_acc.
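For illustration only, here is a minimal Python sketch of the two fixes. The real change lives in the Perl modules DB::GenBank and DB::GenPept. The query parameter name `uid` and the base URL are hypothetical, because the actual field names of /htbin-post/Entrez/query are not given here. In form encoding a space becomes '+', so joining the IDs with spaces and URL-encoding the result produces the '+'-separated list. A ',' would instead be escaped as %2C. The second function keeps the first record of a multi-record GenBank flat-file response by stopping at the first '//' line, which terminates each GenBank record.

```python
from urllib.parse import urlencode


def build_query_url(base: str, ids: list[str]) -> str:
    """Build a GET URL whose ID list is '+'-separated.

    'uid' is a hypothetical parameter name used only for illustration.
    urlencode() encodes spaces as '+', so joining with ' ' yields A+B+C.
    """
    return base + "?" + urlencode({"uid": " ".join(ids)})


def first_record(flatfile: str) -> str:
    """Return the first record of a multi-record GenBank flat-file response.

    Each GenBank record ends with a line containing only '//', so we stop
    at the first terminator instead of keeping the last record seen.
    """
    kept = []
    for line in flatfile.splitlines():
        kept.append(line)
        if line.strip() == "//":
            break
    return "\n".join(kept)


if __name__ == "__main__":
    url = build_query_url("http://example.org/query", ["U49845", "AF000001"])
    print(url)  # http://example.org/query?uid=U49845+AF000001
```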
I was unable to reactivate access to Batch Entrez through
/entrez/batchentrez.cgi, as that only seems to return an HTML table and I
am trying to avoid a 2-step query process for now. I attempted to
mimic Lincoln's approach in Boulder::Genbank here, but alas the
previous /cgi-bin/Entrez/qserver.cgi/result appears to be disabled.
Lincoln: I believe this breaks Boulder 1.24 Entrez access as well. If
people are interested, I guess we can move to a 2-step retrieval that
parses the HTML.
Are there limits on the size of URLs? I suspect there might be, which
could be a problem since the requests are sent as GETs, not POSTs.
Otherwise, we basically have Batch Entrez functionality back in place.
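HTTP/1.1 sets no fixed maximum URI length. Servers and clients do impose their own limits, though, and a server may reject an overly long URI with 414 (Request-URI Too Long). The Python sketch below shows one possible workaround: fall back from GET to POST when the URL would exceed a threshold. It assumes, without verification, that the CGI accepts the same form fields in a POST body. The 2048-character `MAX_GET_URL` threshold and the `uid` field are arbitrary, hypothetical choices. The function only builds the request and does not send it.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Hypothetical threshold; HTTP itself does not fix a URI length limit.
MAX_GET_URL = 2048


def make_request(base: str, ids: list[str]) -> Request:
    """Return a GET Request for short ID lists, a POST Request otherwise.

    'uid' is a hypothetical form field; whether the server accepts POST
    is an assumption, not something confirmed here.
    """
    query = urlencode({"uid": " ".join(ids)})
    url = base + "?" + query
    if len(url) <= MAX_GET_URL:
        return Request(url)  # no body, so urllib sends GET
    # Same form-encoded fields, moved into the request body.
    return Request(
        base,
        data=query.encode("ascii"),
        headers={"Content-Type": "application/x-www-form-urlencoded"},
    )


if __name__ == "__main__":
    few = make_request("http://example.org/query", ["U49845"])
    many = make_request("http://example.org/query", [f"ID{i:06d}" for i in range(500)])
    print(few.get_method(), many.get_method())  # GET POST
```

Keeping one encoding path for both methods means the '+'-separated term list is identical whether it travels in the URL or in the body.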
(Roger, this is essentially the fix we talked about, as best as I can
solve it, so you can take it off your queue unless you've got ideas.)
-jason
--
Jason Stajich
Duke University
jason@cgt.mc.duke.edu