[Bioperl-l] Downloading refseq genomes in batch

Fields, Christopher J cjfields at illinois.edu
Tue Apr 3 21:19:07 UTC 2012


500 sequences isn't too bad for a remote lookup (I have run ~20K myself).  It's much easier if you can grab them as a batch: run esearch for the IDs, then use efetch with the WebEnv/query key to grab the sequences.  NCBI is more worried about the number of requests made, the length of time between requests, and the time of day the requests are made.
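A minimal sketch of that esearch-then-efetch flow, written in Python with only the standard library so it can be read without a BioPerl install. It only builds the request URLs and parses a reply; it does not contact NCBI. The search term, the `nuccore` database choice, the helper names, and the sample XML are illustrative assumptions, not something taken from this thread.

```python
from urllib.parse import urlencode
import xml.etree.ElementTree as ET

EUTILS = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils"

# Made-up esearch reply, trimmed to the fields used below (not real NCBI output).
SAMPLE_ESEARCH_XML = """<eSearchResult>
  <Count>2</Count>
  <QueryKey>1</QueryKey>
  <WebEnv>EXAMPLE_WEBENV</WebEnv>
</eSearchResult>"""


def esearch_url(term, db="nuccore"):
    """Build an esearch URL that keeps the hits on the NCBI history server."""
    params = {"db": db, "term": term, "usehistory": "y"}
    return EUTILS + "/esearch.fcgi?" + urlencode(params)


def parse_history(esearch_xml):
    """Pull WebEnv, QueryKey and the hit count out of an esearch XML reply."""
    root = ET.fromstring(esearch_xml)
    return (
        root.findtext("WebEnv"),
        root.findtext("QueryKey"),
        int(root.findtext("Count")),
    )


def efetch_url(webenv, query_key, db="nuccore", retstart=0, retmax=500):
    """Build an efetch URL that pulls one slice of the stored result set as FASTA."""
    params = {
        "db": db,
        "WebEnv": webenv,
        "query_key": query_key,
        "rettype": "fasta",
        "retmode": "text",
        "retstart": retstart,
        "retmax": retmax,
    }
    return EUTILS + "/efetch.fcgi?" + urlencode(params)


if __name__ == "__main__":
    # Illustrative query only; substitute your own term.
    print(esearch_url("Escherichia coli[Organism]"))
    webenv, query_key, count = parse_history(SAMPLE_ESEARCH_XML)
    print(efetch_url(webenv, query_key, retmax=count))
```

The point of `usehistory=y` is that the ID list stays on NCBI's side, so the follow-up efetch is a single request per slice instead of one request per sequence.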

In fact, I recall updating EUtilities recently so it can use a POST, so you can grab ~2000 seqs at a time w/o having to iterate through them.
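If you already hold a list of UIDs, the batching idea can be sketched as below, again in standard-library Python rather than with the EUtilities module itself. It splits the IDs into chunks and prepares one POST request per chunk without sending anything. The chunk size of 2000 mirrors the rough figure above; the helper names and the `nuccore` default are assumptions made for illustration.

```python
from urllib.parse import urlencode
from urllib.request import Request

EFETCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi"


def batches(ids, size=2000):
    """Yield consecutive slices of `ids`, each holding at most `size` items."""
    for start in range(0, len(ids), size):
        yield ids[start:start + size]


def efetch_post(ids, db="nuccore"):
    """Prepare (but do not send) a POST request fetching `ids` as FASTA."""
    body = urlencode({
        "db": db,
        "id": ",".join(str(i) for i in ids),
        "rettype": "fasta",
        "retmode": "text",
    }).encode("ascii")
    # Supplying a request body makes urllib issue a POST instead of a GET.
    return Request(EFETCH, data=body)


if __name__ == "__main__":
    uids = list(range(1, 4501))  # placeholder numbers, not real UIDs
    for chunk in batches(uids):
        req = efetch_post(chunk)
        print(req.get_method(), len(chunk), "ids,", len(req.data), "bytes")
```

To actually send these you would pass each request to `urllib.request.urlopen` and pause between calls; NCBI's published guideline is no more than three requests per second without an API key.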

chris

On Apr 3, 2012, at 4:02 PM, Juan Jovel wrote:

> 
> Hi Shalab
> You can try using Bio::DB::GenBank, but I believe NCBI does not like people doing many remote lookups. I would advise you to download the whole database you are interested in and then parse it locally.
> Cheers, Juan
>> Date: Tue, 3 Apr 2012 14:15:16 -0400
>> From: shalabh.sharma7 at gmail.com
>> To: carandraug+dev at gmail.com
>> CC: Bioperl-l at lists.open-bio.org
>> Subject: Re: [Bioperl-l] Downloading refseq genomes in batch
>> 
>> Hi Carnë,
>>              Thanks for your reply.
>> I tried to get the UIDs from the genome names, but I can't find them with EUtilities.
>> I have the taxonomy IDs for those genomes. Can I download the genomes by taxonomy ID in
>> batch?
>> 
>> Thanks
>> Shalabh
>> 
>> 
>> On Tue, Apr 3, 2012 at 11:53 AM, Carnë Draug <carandraug+dev at gmail.com> wrote:
>> 
>>> On 3 April 2012 16:34, shalabh sharma <shalabh.sharma7 at gmail.com> wrote:
>>>> Hi All,
>>>>        I am trying to download RefSeq genomes in batch. But instead of
>>>> accession numbers I have genome names (~500).
>>>> Is there any way I can download them using some BioPerl module?
>>> 
>>> If you have their name/official symbol, then searching the database
>>> should only return one hit, and therefore one UID. Make the search, get
>>> that number, and use it for the download. The EUtilities module should do
>>> that.
>>> 
>>> Carnë
>>> 
>> 
>> 
>> 
>> -- 
>> Shalabh Sharma
>> Scientific Computing Professional Associate (Bioinformatics Specialist)
>> Department of Marine Sciences
>> University of Georgia
>> Athens, GA 30602-3636
>> 
>> _______________________________________________
>> Bioperl-l mailing list
>> Bioperl-l at lists.open-bio.org
>> http://lists.open-bio.org/mailman/listinfo/bioperl-l




