cjfields at uiuc.edu
Wed Feb 6 21:48:45 UTC 2008
On Feb 6, 2008, at 2:57 PM, Susan J. Miller wrote:
> Barry Moore wrote:
>> I'm joining this discussion late so my apologies if I'm missing the
>> original point. If you're trying to routinely download thousands
>> of sequences from GenBank or SeqHound you probably want to be using
>> ftp to download the flat files and query/parse locally. If you're
>> trying to stay on top of the latest Drosophila ESTs, then how about
>> setting up a nightly cron job to download the incremental updates
>> from NCBI's ftp (ftp://ftp.ncbi.nih.gov/genbank/daily-nc) and parse
>> that for Drosophila EST sequences. The EST division is huge, but I
>> would think nightly incrementals should be manageable.
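Barry's nightly-incremental idea can be sketched roughly as follows. This is only a sketch, not Bioperl code: the FTP host and `daily-nc` path come from his message, while the `.flat.gz` filename filter and the `seen_files.txt` state file are assumptions you would adapt to your setup.

```python
# Sketch of the nightly-incremental approach: fetch only the GenBank
# daily update files we have not downloaded yet. Host/path are from
# Barry's message; file naming and state tracking are assumptions.
import os
from ftplib import FTP

STATE_FILE = "seen_files.txt"  # hypothetical record of already-fetched files

def new_files(available, seen):
    """Return the daily-update files not downloaded on a previous run."""
    return sorted(f for f in available
                  if f.endswith(".flat.gz") and f not in seen)

def fetch_incrementals():
    seen = set()
    if os.path.exists(STATE_FILE):
        seen = set(open(STATE_FILE).read().split())
    ftp = FTP("ftp.ncbi.nih.gov")
    ftp.login()  # anonymous login
    ftp.cwd("/genbank/daily-nc")
    for name in new_files(ftp.nlst(), seen):
        with open(name, "wb") as out:
            ftp.retrbinary("RETR " + name, out.write)
        seen.add(name)
    ftp.quit()
    with open(STATE_FILE, "w") as out:
        out.write("\n".join(sorted(seen)))

# Call fetch_incrementals() from a nightly cron job, then parse the
# downloaded files locally for Drosophila EST records.
```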
> Hi Barry,
> I'll try your suggestion. I guess my interpretation of the
> documentation for SeqHound was erroneous. (Who knows what 'large
> numbers of sequences' means?) I tried using SeqHound's
> get_Stream_by_id method to fetch 10000 sequences, 500 at a time, and
> got a timeout error.
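One generic way to soften that kind of timeout is to batch the IDs and retry each batch with a growing pause. A sketch, with `fetch_batch` standing in for whatever retrieval call you actually use (SeqHound, Entrez, or otherwise); the batch size and delays are assumptions to tune:

```python
# Generic batch-and-retry sketch for fetching many IDs without
# hammering the server. `fetch_batch` is a placeholder for the real
# retrieval call; batch size and backoff values are assumptions.
import time

def chunks(ids, size):
    """Split a list of IDs into batches of at most `size`."""
    return [ids[i:i + size] for i in range(0, len(ids), size)]

def fetch_all(ids, fetch_batch, size=500, retries=3, pause=5):
    results = []
    for batch in chunks(ids, size):
        for attempt in range(retries):
            try:
                results.extend(fetch_batch(batch))
                break
            except IOError:  # e.g. a server-side timeout
                time.sleep(pause * (attempt + 1))  # back off, then retry
        else:
            raise RuntimeError("batch failed after %d retries" % retries)
    return results
```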
Barry's and Brian's suggestions make more sense. You could also
automate an Entrez query that limits retrievals to a date range
instead of munging through the latest releases; it all depends on
how many sequences you need to parse.
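A date-limited Entrez query can be expressed directly in the search term. In the sketch below the field names follow common Entrez syntax, but the exact spelling of the EST-division filter and the date range are assumptions to check against the Entrez help, not values from this thread:

```python
# Building a date-restricted Entrez search term so only records
# modified in a given window are retrieved. Field spellings are
# assumptions; the dates are placeholders.
def entrez_term(organism, division_filter, start, end):
    return '%s[Organism] AND %s AND ("%s"[MDAT] : "%s"[MDAT])' % (
        organism, division_filter, start, end)

term = entrez_term("Drosophila", "gbdiv_est[Properties]",
                   "2008/02/01", "2008/02/06")
```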
The SeqHound timeout may be set on their end to keep a single host
from flooding them with requests. NCBI is a bit more tolerant, but
can get brittle under heavy server traffic.