[Bioperl-l] Batch mode in Bio::DB::GenBank
Hilmar Lapp
hlapp at gmx.net
Fri Mar 31 17:43:15 UTC 2006
There used to be get_Stream_by_batch() which apparently is now
deprecated and forwards to get_Stream_by_id(), which therefore I assume
is supposed to do the Right Thing depending on its arguments. I don't
know where this is going wrong.
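
In the meantime, the chunked-loop workaround Chris mentions below can be sketched roughly as follows. This is only a minimal sketch under assumptions: chunk_ids() is a hypothetical helper (not part of BioPerl), the identifiers are made up, and the chunk size of 400 is simply the figure quoted in this thread. The BioPerl calls are left as comments so the snippet runs without a network connection.

```perl
use strict;
use warnings;

# Split a list of IDs into chunks of at most $size elements.
# chunk_ids() is a hypothetical helper, not part of BioPerl.
sub chunk_ids {
    my ($size, @ids) = @_;
    die "chunk size must be a positive integer\n" unless $size && $size > 0;
    my @chunks;
    while (@ids) {
        # splice removes up to $size ids from the front of the list
        push @chunks, [ splice(@ids, 0, $size) ];
    }
    return @chunks;
}

# Example: 1000 made-up identifiers, split 400 at a time.
my @ids    = map { "ID$_" } 1 .. 1000;
my @chunks = chunk_ids(400, @ids);
printf "%d chunks, last one has %d ids\n",
    scalar @chunks, scalar @{ $chunks[-1] };

# With BioPerl installed, each chunk would be passed as an array ref:
#   my $gb = Bio::DB::GenBank->new;
#   for my $chunk (@chunks) {
#       my $stream = $gb->get_Stream_by_id($chunk);
#       while (my $seq = $stream->next_seq) { ... }
#   }
```

Each request then stays short enough for a GET, at the cost of several round trips; a working 'batch' mode with POST would make the loop unnecessary.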
-hilmar
On Mar 31, 2006, at 8:56 AM, Chris Fields wrote:
>
>
>> -----Original Message-----
>> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of Marc Logghe
>> Sent: Friday, March 31, 2006 8:45 AM
>> To: bioperl-l at bioperl.org
>> Subject: [Bioperl-l] Batch mode in Bio::DB::GenBank
>>
>> Hi,
>> It seems that in the current (CVS of last night) Bio::DB::GenBank
>> implementation it is not at all possible to set the mode to 'batch'
>> instead of the default 'single'. Devel::StackTrace revealed that the
>> mode is hardcoded in the Bio::DB::WebDBSeqI::get_Stream_by_id method.
>> Is that intended?
>> The problem is that with single mode, the request is always done with a
>> GET. In most cases (at least in my hands) when you pass a batch of 500
>> IDs the request fails because the URL gets too long. All goes
>> well when the method is overridden so that the mode option is hardcoded
>> to 'batch' and a POST is done.
>
> You're right about the 500 seq limit. If it's particularly busy (during
> peak hours) it's less, around 200-400. I have been grabbing them 400 at a
> time using a loop, which works, but batch would be better.
>
> I remember asking about this a few years ago and, according to Lincoln, we
> use the approved batch retrieval method. However, now that you point it out,
> I just don't see it here (no epost). NCBIHelper has, for some reason, this:
>
> %CGILOCATION = (
>     'batch'   => ['post' => '/entrez/eutils/efetch.fcgi'],
>     'query'   => ['get'  => '/entrez/eutils/efetch.fcgi'],
>     'single'  => ['get'  => '/entrez/eutils/efetch.fcgi'],
>     'version' => ['get'  => '/entrez/eutils/efetch.fcgi'],
>     'gi'      => ['get'  => '/entrez/eutils/efetch.fcgi'],
> );
>
> Which has batch set to efetch, not epost.
>
>> I think there are at least 2 possibilities:
>> 1) change single to batch in Bio::DB::WebDBSeqI::get_Stream_by_id
>> 2) allow the possibility to pass the mode option when get_Stream_by_id
>> is called using the Bio::DB::GenBank object
>
> I would say the second is the more flexible, though I'm not exactly sure why
> we hardcode 'single' for sequence streams. It may have something to do
> with the way single sequences are retrieved; it looks like get_Seq_by_acc in
> WebDBSeqI calls get_Stream_by_acc with one sequence instead of an array ref;
> I guess get_Stream_by_id is the same.
>
> Anyway, I'm for it as long as some tests are added for batch retrieval and
> everything passes.
>
>> Any comments/preferences before I actually commit some edits?
>> Regards,
>> Marc
>>
>> _______________________________________________
>> Bioperl-l mailing list
>> Bioperl-l at lists.open-bio.org
>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>
>
>
> Christopher Fields
> Postdoctoral Researcher - Switzer Lab
> Dept. of Biochemistry
> University of Illinois Urbana-Champaign
>
--
----------------------------------------------------------
: Hilmar Lapp -:- San Diego, CA -:- hlapp at gmx dot net :
----------------------------------------------------------