[Bioperl-l] SeqHound
Barry Moore
barry.moore at genetics.utah.edu
Wed Feb 6 01:09:56 UTC 2008
Susan,
I'm joining this discussion late, so my apologies if I'm missing the
original point. If you're trying to routinely download thousands of
sequences from GenBank or SeqHound, you probably want to be using FTP
to download the flat files and query/parse them locally. If you're
trying to stay on top of the latest Drosophila ESTs, then how about
setting up a nightly cron job to download the incremental updates from
NCBI's FTP site (ftp://ftp.ncbi.nih.gov/genbank/daily-nc) and parse
them for Drosophila EST sequences? The EST division is huge, but I
would think the nightly incrementals should be manageable.
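Just to sketch the idea (untested, and the file names below are only
placeholders): once the cron job has pulled down and gunzipped one of
the daily-nc flat files, a few lines of BioPerl are enough to pull out
the Drosophila EST records and append them to a FASTA file. Something
along these lines:

use strict;
use warnings;
use Bio::SeqIO;

# Parse one daily incremental (e.g. ncNNNN.flat, already gunzipped) and
# append any Drosophila EST records to a growing FASTA file.
my $infile = shift @ARGV or die "Usage: $0 <daily-nc flat file>\n";
my $in  = Bio::SeqIO->new(-file => $infile, -format => 'genbank');
my $out = Bio::SeqIO->new(-file => '>>drosophila_est.fa',
                          -format => 'fasta');

while (my $seq = $in->next_seq) {
    # RichSeq objects carry the GenBank division (EST, PRI, ...) and the
    # source organism, so both filters are straightforward
    next unless $seq->division and $seq->division eq 'EST';
    next unless $seq->species and $seq->species->binomial =~ /^Drosophila/i;
    $out->write_seq($seq);
}

The cron entry itself would just fetch the newest flat file from the
daily-nc directory, gunzip it, and run something like the above over it.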
Barry
On Feb 5, 2008, at 3:31 PM, Susan J. Miller wrote:
> Chris Fields wrote:
>> The URL has changed. I'll fix this in bioperl-live.
>>
>> You can fix this in your script directly for now (though I hate
>> globals):
>>
>> use Bio::DB::SeqHound;
>>
>> $Bio::DB::SeqHound::HOSTBASE =
>> 'http://dogboxonline.unleashedinformatics.com/';
>>
>
> Thanks Chris, that helps a little bit, but I'm still not having much
> luck with the SeqHound DB. The CPAN SeqHound.pm documentation for the
> get_Stream_by_query method says:
>
> Title : get_Stream_by_query
> Usage : $seq = $db->get_Stream_by_query($query);
> Function: Retrieves Seq objects from Entrez 'en masse', rather than
> one at a time. For large numbers of sequences, this is far superior
> than get_Stream_by_[id/acc]().
> Example : $query_string = 'Candida maltosa 26S ribosomal RNA gene';
>
> However, when I try:
>
> $query_string = 'drosophila simulans[orgn]';
> $query = Bio::DB::Query::GenBank->new(-db=>'nucest',
> -query=>$query_string);
> $stream = $sh->get_Stream_by_query($query);
>
> I get the error:
>
> ------------- EXCEPTION: Bio::Root::Exception -------------
> MSG: Id list has been truncated even after maxids requested
> STACK: Error::throw
> STACK: Bio::Root::Root::throw
> /usr/lib/perl5/site_perl/5.8.8/Bio/Root/Root.pm:359
> STACK: Bio::DB::Query::WebQuery::_fetch_ids
> /usr/lib/perl5/site_perl/5.8.8/Bio/DB/Query/WebQuery.pm:236
> STACK: Bio::DB::Query::WebQuery::ids
> /usr/lib/perl5/site_perl/5.8.8/Bio/DB/Query/WebQuery.pm:200
> STACK: Bio::DB::SeqHound::get_Stream_by_query
> /usr/lib/perl5/site_perl/5.8.8/Bio/DB/SeqHound.pm:314
> STACK: SeqHoundQuery.pl:21
>
> There are only 5013 sequences that match this query, so it seems odd
> that the Id list is too long... or am I using SeqHound improperly?
>
> (My reason for trying SeqHound is that I want to set up a monthly cron
> job to download nucest FASTA sequences for Drosophila melanogaster.
> I've tried NCBI E-Utilities and the script generated by the NCBI ebot,
> and in both cases some of the 570828 records get dropped, even after
> repeated attempts.)
>
>
> Thanks,
> -susan
>
> Susan J. Miller
> Manager, Scientific Data Analysis
> Biotechnology Computing Facility
> Arizona Research Laboratories
> (520) 626-2597
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l