[Bioperl-l] limit on accessing NCBI/GenBank

Lincoln Stein lstein at cshl.org
Wed Feb 12 17:48:50 EST 2003


Actually the code is very clever and will impose the delay even if you have an 
external loop.  At least if you call using a global clever.

However it won't catch you if you spawn multiple processes on your machine.
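
To make the point above concrete, here is a minimal sketch of a per-process throttle. It is not the actual Bio::DB::WebDBSeqI code. The package name ThrottleSketch, the wait_turn method, and the -clock/-sleeper options are all hypothetical and exist only for this illustration. It assumes the time of the last request is kept in one variable shared by every handle in the process, which is why a new handle on each pass of an external loop is still delayed, and why a second process (with its own copy of that variable) is not.

```perl
#!/usr/bin/perl
use strict;
use warnings;

package ThrottleSketch;    # hypothetical name, not a BioPerl module

# One timestamp shared by every object in this process.
# A separately spawned process gets its own copy, so it is not throttled.
my $last_request_time = 0;

sub new {
    my ($class, %args) = @_;
    my $self = {
        delay   => defined $args{-delay} ? $args{-delay} : 3,
        # Clock and sleeper are injectable (an assumption of this sketch)
        # so the logic can be exercised without really waiting.
        clock   => $args{-clock}   || sub { time },
        sleeper => $args{-sleeper} || sub { sleep $_[0] },
    };
    return bless $self, $class;
}

# Wait until at least `delay` seconds have passed since the previous
# request made by any object in this process. Returns the seconds waited.
sub wait_turn {
    my $self    = shift;
    my $now     = $self->{clock}->();
    my $elapsed = $now - $last_request_time;
    my $wait    = $elapsed < $self->{delay} ? $self->{delay} - $elapsed : 0;
    $self->{sleeper}->($wait) if $wait > 0;
    $last_request_time = $now + $wait;
    return $wait;
}

package main;

# Fake clock: "sleeping" just advances the fake time.
my $fake_now = 1000;
my %fake = (
    -clock   => sub { $fake_now },
    -sleeper => sub { $fake_now += $_[0] },
);

my @waits;
for my $id (qw(ID1 ID2 ID3)) {    # placeholder IDs, no network access
    # A fresh handle on every pass, as in an external loop.
    my $handle = ThrottleSketch->new(-delay => 3, %fake);
    push @waits, $handle->wait_turn;
}
print "@waits\n";
```

With the fake clock the first call does not wait and each later call waits the full 3 seconds, even though every call uses a new object. Run the same script twice in parallel and each copy keeps its own timestamp, so between them the two processes can exceed the limit.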

Lincoln


On Wednesday 12 February 2003 11:12 am, Jason Stajich wrote:
> By default there is a built-in 3-second delay in the code when you fetch
> multiple IDs.
>
> But if you instead loop over several get_Seq_by_acc calls, each with a
> single ID, it probably won't use the sleep mechanism properly; I'm not sure.
>
> If you read the code for the module Bio::DB::WebDBSeqI you can see where
> this is implemented with a _sleep function by Lincoln.
>
>
>  Title   : new
>  Usage   : $gb = Bio::DB::GenBank->new(@options)
>  Function: Creates a new genbank handle
>  Returns : New genbank handle
>  Args    : -delay   number of seconds to delay between fetches (3s)
>
> NOTE:  There are other options that are used internally.  By NCBI policy,
> this module introduces a 3s delay between fetches.  If you are fetching
> multiple genbank ids, it is a good idea to use get
>
>
> Still, you can clearly abuse these modules if you choose to, and NCBI can
> cut off your access to their CGI scripts if they feel you are abusing the
> servers.  We always recommend that people download the data and use the
> Bio::Index modules to index the sequences locally if they are doing a lot
> of fetch requests.
>
> Another alternative to a local flatfile index is the BioSQL project, which
> allows you to put the sequence data in your own RDBMS.  Also check out
> SeqHound (which may be integrated into Bioperl one day), which provides
> additional access to sequence databases which are local or remote.
>
> myGenBank is another way to keep a local copy of GenBank for your own use;
> it uses a combination of an RDBMS and flatfile index files.
>
> -jason
>
> On Wed, 12 Feb 2003, Prachi Shah wrote:
> > Hi all!
> >
> > I have a question related to BioPerl, but not about
> > its implementation, bugs or problems. I know NCBI has
> > a limit of one request every 3 seconds to their
> > servers, and that this covers all of Entrez and all databases.
> > I once tried to use LWP user agents directly to
> > access some data from the NCBI website, and that's when
> > I realised that they are very strict about not letting
> > scripts overload their servers. Even if you follow
> > their 3-second rule and run at non-peak hours,
> > a script that makes too many requests is not
> > acceptable.
> >
> > Does that apply to access made through BioPerl? For
> > example, Bio::DB::GenBank, Bio::DB::Query::GenBank,
> > etc. query the NCBI servers directly.
> >
> > thanks,
> > Prachi.

-- 
Lincoln Stein
lstein at cshl.org
Cold Spring Harbor Laboratory
1 Bungtown Road
Cold Spring Harbor, NY 11724
(516) 367-8380 (voice)
(516) 367-8389 (fax)


