[Bioperl-l] SeqHound
Barry Moore
barry.moore at genetics.utah.edu
Wed Feb 6 01:09:56 UTC 2008
Susan,
I'm joining this discussion late, so my apologies if I'm missing the
original point. If you're trying to routinely download thousands of
sequences from GenBank or SeqHound, you probably want to be using FTP
to download the flat files and query/parse them locally. If you're
trying to stay on top of the latest Drosophila ESTs, then how about
setting up a nightly cron job to download the incremental updates from
NCBI's FTP site (ftp://ftp.ncbi.nih.gov/genbank/daily-nc) and parse
them for Drosophila EST sequences? The EST division is huge, but I
would think the nightly incrementals should be manageable.
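Just to sketch the idea (untested, and the file names below are only
placeholders): once the cron job has pulled down and gunzipped one of
the daily-nc flat files, a few lines of BioPerl are enough to pull out
the Drosophila EST records and append them to a FASTA file. Something
along these lines:

use strict;
use warnings;
use Bio::SeqIO;

# Parse one daily incremental (e.g. ncNNNN.flat, already gunzipped) and
# append any Drosophila EST records to a growing FASTA file.
my $infile = shift @ARGV or die "Usage: $0 <daily-nc flat file>\n";
my $in  = Bio::SeqIO->new(-file => $infile, -format => 'genbank');
my $out = Bio::SeqIO->new(-file => '>>drosophila_est.fa',
                          -format => 'fasta');

while (my $seq = $in->next_seq) {
    # RichSeq objects carry the GenBank division (EST, PRI, ...) and the
    # source organism, so both filters are straightforward
    next unless $seq->division and $seq->division eq 'EST';
    next unless $seq->species and $seq->species->binomial =~ /^Drosophila/i;
    $out->write_seq($seq);
}

The cron entry itself would just fetch the newest flat file from the
daily-nc directory, gunzip it, and run something like the above over it.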
Barry
On Feb 5, 2008, at 3:31 PM, Susan J. Miller wrote:
> Chris Fields wrote:
>> The URL has changed. I'll fix this in bioperl-live.
>>
>> You can fix this in your script directly for now (though I hate
>> globals):
>>
>> use Bio::DB::SeqHound;
>>
>> $Bio::DB::SeqHound::HOSTBASE =
>> 'http://dogboxonline.unleashedinformatics.com/';
>>
>
> Thanks Chris, that helps a little bit, but I'm still not having much
> luck with the SeqHound DB. The CPAN SeqHound.pm documentation for the
> get_Stream_by_query method says:
>
> Title : get_Stream_by_query
> Usage : $seq = $db->get_Stream_by_query($query);
> Function: Retrieves Seq objects from Entrez 'en masse', rather than
> one at a time. For large numbers of sequences, this is far superior
> than get_Stream_by_[id/acc]().
> Example : $query_string = 'Candida maltosa 26S ribosomal RNA gene';
>
> However, when I try:
>
> $query_string = 'drosophila simulans[orgn]';
> $query = Bio::DB::Query::GenBank->new(-db=>'nucest',
> -query=>$query_string);
> $stream = $sh->get_Stream_by_query($query);
>
> I get the error:
>
> ------------- EXCEPTION: Bio::Root::Exception -------------
> MSG: Id list has been truncated even after maxids requested
> STACK: Error::throw
> STACK: Bio::Root::Root::throw
> /usr/lib/perl5/site_perl/5.8.8/Bio/Root/Root.pm:359
> STACK: Bio::DB::Query::WebQuery::_fetch_ids
> /usr/lib/perl5/site_perl/5.8.8/Bio/DB/Query/WebQuery.pm:236
> STACK: Bio::DB::Query::WebQuery::ids
> /usr/lib/perl5/site_perl/5.8.8/Bio/DB/Query/WebQuery.pm:200
> STACK: Bio::DB::SeqHound::get_Stream_by_query
> /usr/lib/perl5/site_perl/5.8.8/Bio/DB/SeqHound.pm:314
> STACK: SeqHoundQuery.pl:21
>
> There are only 5013 sequences that match this query, so it seems odd
> that the Id list is too long... or am I using SeqHound improperly?
>
> (My reason for trying SeqHound is that I want to set up a monthly cron
> job to download nucest FASTA sequences for Drosophila melanogaster.
> I've tried NCBI E-Utilities and the script generated by the NCBI ebot,
> and in both cases some of the 570828 records get dropped, even after
> repeated attempts.)
>
>
> Thanks,
> -susan
>
> Susan J. Miller
> Manager, Scientific Data Analysis
> Biotechnology Computing Facility
> Arizona Research Laboratories
> (520) 626-2597
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l