[Bioperl-l] SeqHound

Susan J. Miller sjmiller at email.arizona.edu
Tue Feb 5 22:31:27 UTC 2008


Chris Fields wrote:
> The URL has changed.  I'll fix this in bioperl-live.
> 
> You can fix this in your script directly for now (though I hate globals):
> 
> use Bio::DB::SeqHound;
> 
> $Bio::DB::SeqHound::HOSTBASE = 
> 'http://dogboxonline.unleashedinformatics.com/';
> 

Thanks Chris, that helps a little bit, but I'm still not having much 
luck with the SeqHound DB.  The CPAN SeqHound.pm documentation for the 
get_Stream_by_query method says:

   Title   : get_Stream_by_query
   Usage   : $seq = $db->get_Stream_by_query($query);
   Function: Retrieves Seq objects from Entrez 'en masse', rather than
             one at a time.  For large numbers of sequences, this is far
             superior than get_Stream_by_[id/acc]().
   Example : $query_string = 'Candida maltosa 26S ribosomal RNA gene';

However, when I try:

# $sh is the Bio::DB::SeqHound object created earlier in the script
$query_string = 'drosophila simulans[orgn]';
$query = Bio::DB::Query::GenBank->new(-db=>'nucest',
                                       -query=>$query_string);
$stream = $sh->get_Stream_by_query($query);

I get the error:

------------- EXCEPTION: Bio::Root::Exception -------------
MSG: Id list has been truncated even after maxids requested
STACK: Error::throw
STACK: Bio::Root::Root::throw /usr/lib/perl5/site_perl/5.8.8/Bio/Root/Root.pm:359
STACK: Bio::DB::Query::WebQuery::_fetch_ids /usr/lib/perl5/site_perl/5.8.8/Bio/DB/Query/WebQuery.pm:236
STACK: Bio::DB::Query::WebQuery::ids /usr/lib/perl5/site_perl/5.8.8/Bio/DB/Query/WebQuery.pm:200
STACK: Bio::DB::SeqHound::get_Stream_by_query /usr/lib/perl5/site_perl/5.8.8/Bio/DB/SeqHound.pm:314
STACK: SeqHoundQuery.pl:21

Only 5013 sequences match this query, so it seems odd that the Id list 
is reported as truncated.  Or am I using SeqHound improperly?

(My reason for trying SeqHound is that I want to set up a monthly cron 
job to download nucest FASTA sequences for Drosophila melanogaster.  
I've tried both NCBI E-Utilities and the script generated by the NCBI 
ebot, and in both cases some of the 570828 records get dropped, even 
after repeated attempts.)


Thanks,
-susan

Susan J. Miller
Manager, Scientific Data Analysis
Biotechnology Computing Facility
Arizona Research Laboratories
(520) 626-2597


