[Bioperl-l] Batch mode in Bio::DB::GenBank

Chris Fields cjfields at uiuc.edu
Fri Mar 31 16:56:12 UTC 2006



> -----Original Message-----
> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-
> bounces at lists.open-bio.org] On Behalf Of Marc Logghe
> Sent: Friday, March 31, 2006 8:45 AM
> To: bioperl-l at bioperl.org
> Subject: [Bioperl-l] Batch mode in Bio::DB::GenBank
> 
> Hi,
> It seems that in the current (CVS of last night) Bio::DB::GenBank
> implementation it is not at all possible to set the mode to 'batch'
> instead of the default 'single'. Devel::StackTrace revealed that the
> mode is hardcoded in the Bio::DB::WebDBSeqI::get_Stream_by_id method.
> Is that intended?
> The problem is that in single mode the request is always sent as a
> GET. In most cases (at least in my hands), when you pass a batch of 500
> IDs the request fails because the URL gets too long. All goes
> well when the method is overridden so that the mode option is hardcoded
> to 'batch' and a POST is sent instead.
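To make the GET vs. POST difference concrete, here is a minimal sketch that builds both request forms for 500 IDs and compares their lengths. The host, the parameter set (`db`, `rettype`, `id`) and the accession-style IDs are assumptions for illustration, not what Bio::DB::GenBank necessarily sends; the actual URL length limit depends on the server and any proxies in between.

```perl
use strict;
use warnings;

# Assumed host and parameters (db, rettype, id), for illustration only;
# the real module may send a different parameter set.
my $EFETCH = 'https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi';

# Query string carrying the IDs as a comma-separated list.
sub efetch_query {
    my (@ids) = @_;
    return 'db=nucleotide&rettype=gb&id=' . join(',', @ids);
}

# 'single' mode (GET): the IDs are appended to the URL itself.
sub get_url { return $EFETCH . '?' . efetch_query(@_) }

# 'batch' mode (POST): the URL stays fixed; the same query goes in the body.
sub post_request { return ($EFETCH, efetch_query(@_)) }

# 500 hypothetical accession-style IDs, 9 characters each.
my @ids = map { sprintf('AB%07d', $_) } 1 .. 500;

my ($post_url, $post_body) = post_request(@ids);
printf "GET URL:  %d characters\n", length get_url(@ids);
printf "POST URL: %d characters (body: %d)\n",
    length $post_url, length $post_body;
```

The GET URL grows by roughly ten characters per ID, while the POST URL stays constant no matter how many IDs are sent.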

You're right about the 500 seq limit. When the server is particularly busy
(during peak hours) the limit is lower, around 200-400. I have been grabbing
them 400 at a time using a loop, which works, but batch mode would be better.
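A minimal sketch of that kind of chunked loop, assuming a chunk size of 400 and a caller-supplied handler. It uses the documented `get_Stream_by_id` and `next_seq` calls; `chunk_ids` and `fetch_in_chunks` are hypothetical helper names, and BioPerl is loaded only when `fetch_in_chunks` actually runs.

```perl
use strict;
use warnings;

# Split a list of IDs into consecutive chunks of at most $size elements.
sub chunk_ids {
    my ($size, @ids) = @_;
    die "chunk size must be positive\n" unless $size > 0;
    my @chunks;
    while (@ids) {
        push @chunks, [ splice(@ids, 0, $size) ];
    }
    return @chunks;
}

# Fetch sequences chunk by chunk. Bio::DB::GenBank is required only when
# this sub runs, so chunk_ids above works without BioPerl installed.
# $handler is a code ref called once per returned sequence object.
sub fetch_in_chunks {
    my ($ids, $size, $handler) = @_;
    require Bio::DB::GenBank;
    my $gb = Bio::DB::GenBank->new;
    for my $chunk (chunk_ids($size, @$ids)) {
        my $stream = $gb->get_Stream_by_id($chunk);
        while (my $seq = $stream->next_seq) {
            $handler->($seq);
        }
    }
}
```

Usage would look like `fetch_in_chunks(\@ids, 400, sub { print $_[0]->display_id, "\n" })`. Each chunk is still a separate single-mode GET, so the chunk size has to stay below whatever URL limit the server enforces at the time.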

I remember asking about this a few years ago and, according to Lincoln, we
use the approved batch retrieval method. However, now that you point it out,
I just don't see it here (no epost). For some reason NCBIHelper has this:

    %CGILOCATION = (
        'batch'   => ['post' => '/entrez/eutils/efetch.fcgi'],
        'query'   => ['get'  => '/entrez/eutils/efetch.fcgi'],
        'single'  => ['get'  => '/entrez/eutils/efetch.fcgi'],
        'version' => ['get'  => '/entrez/eutils/efetch.fcgi'],
        'gi'      => ['get'  => '/entrez/eutils/efetch.fcgi'],
    );

That maps 'batch' to efetch (sent as a POST), not to epost.
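A small sketch that reads the mode table above and reports what each mode resolves to. It shows that every mode targets efetch.fcgi and that 'batch' differs only in using POST. An epost-based workflow would instead send the IDs to epost.fcgi first and then call efetch.fcgi with the returned WebEnv and query_key. The `cgi_for_mode` helper is hypothetical and not part of NCBIHelper.

```perl
use strict;
use warnings;

# Copy of the NCBIHelper mode table quoted above.
my %CGILOCATION = (
    'batch'   => ['post' => '/entrez/eutils/efetch.fcgi'],
    'query'   => ['get'  => '/entrez/eutils/efetch.fcgi'],
    'single'  => ['get'  => '/entrez/eutils/efetch.fcgi'],
    'version' => ['get'  => '/entrez/eutils/efetch.fcgi'],
    'gi'      => ['get'  => '/entrez/eutils/efetch.fcgi'],
);

# Hypothetical helper: return the HTTP method and CGI path for a mode.
sub cgi_for_mode {
    my ($mode) = @_;
    my $entry = $CGILOCATION{$mode}
        or die "unknown mode '$mode'\n";
    my ($method, $path) = @$entry;
    return (uc $method, $path);
}

for my $mode (sort keys %CGILOCATION) {
    my ($method, $path) = cgi_for_mode($mode);
    print "$mode: $method $path\n";
}
```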

> I think there are at least 2 possibilities:
> 1) change single to batch in Bio::DB::WebDBSeqI::get_Stream_by_id
> 2) allow the possibility to pass the mode option when get_Stream_by_id
> is called using the Bio::DB::GenBank object

I would say the second is more flexible, though I'm not exactly sure why
we hardcode 'single' for sequence streams. It may have something to do
with the way single sequences are retrieved: get_Seq_by_acc in WebDBSeqI
appears to call get_Stream_by_acc with one sequence instead of an array
ref, and I guess get_Stream_by_id behaves the same way.
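One way option 2 could look is sketched below, under stated assumptions. `choose_mode` and `My::DB::GenBank` are hypothetical names. The override assumes, from the hardcoded call, that WebDBSeqI's `get_seq_stream` accepts `-uids` and `-mode`. Letting an explicit mode win, and otherwise switching to 'batch' only for an array ref with more than one ID, is a design choice that keeps plain single-ID lookups such as get_Seq_by_acc in 'single' mode.

```perl
use strict;
use warnings;

# Hypothetical helper: an explicit mode wins; otherwise a plain scalar
# (one ID) stays 'single' and an array ref of several IDs uses 'batch'.
sub choose_mode {
    my ($ids, $requested) = @_;
    return $requested if defined $requested;
    return 'batch' if ref $ids eq 'ARRAY' && @$ids > 1;
    return 'single';
}

# Hypothetical subclass accepting an optional mode argument. It assumes
# get_seq_stream() takes -uids and -mode; Bio::DB::GenBank itself is not
# loaded by this sketch, so the method only works once BioPerl is in use.
package My::DB::GenBank;
our @ISA = ('Bio::DB::GenBank');

sub get_Stream_by_id {
    my ($self, $ids, $mode) = @_;
    return $self->get_seq_stream(
        -uids => $ids,
        -mode => main::choose_mode($ids, $mode),
    );
}

package main;
```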

Anyway, I'm for it as long as some tests are added for batch retrieval and
everything passes. 

> Any comments/preferences before I actually commit some edits?
> Regards,
> Marc



Christopher Fields
Postdoctoral Researcher - Switzer Lab
Dept. of Biochemistry
University of Illinois Urbana-Champaign
