[Bioperl-l] NCBI efetch: request limits and getting dates fast
Chris Fields
cjfields at illinois.edu
Tue Apr 20 18:57:48 UTC 2010
Not sure about the upper limit with SOAP, but simple ol' EUtilities can take ~250-500 IDs (somewhere in that range) with a direct efetch/esummary/elink, and many many more if you use epost first. I have been able to fetch a couple thousand with epost.
As an example, this code works for me:
use Modern::Perl;
use Bio::DB::EUtilities;

# Search first and keep the results on NCBI's history server
my $eutil = Bio::DB::EUtilities->new(-eutil      => 'esearch',
                                     -db         => 'protein',
                                     -email      => 'cjfields at bioperl.org',
                                     -term       => 'pyrR',
                                     -retmax     => 250,
                                     -usehistory => 'y');

my $hist = $eutil->next_History || die "No history returned";

# Reuse the same object to pull summaries for the stored result set
$eutil->set_parameters(-eutil   => 'esummary',
                       -history => $hist);

# Map each ID to its CreateDate field
my %id_map;
while (my $ds = $eutil->next_DocSum) {
    my ($cdate) = $ds->get_contents_by_name('CreateDate');
    $id_map{$ds->get_id} = $cdate;
}

say join("\t", $_, $id_map{$_}) for sort keys %id_map;
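If you would rather stick with direct requests than go through epost, one option is to split the ID list into batches that stay under the rough 250-500 ID range mentioned above. A minimal sketch, assuming a batch size of 250; chunk_ids is a hypothetical helper (not part of BioPerl) and the IDs are placeholders:

```perl
use strict;
use warnings;

# Hypothetical helper (not part of BioPerl): split a list of IDs into
# batches of at most $size, so each direct request stays under the limit.
sub chunk_ids {
    my ($ids, $size) = @_;
    die "batch size must be a positive integer\n" unless $size && $size > 0;
    my @pending = @$ids;    # copy, so the caller's array is left untouched
    my @batches;
    push @batches, [ splice(@pending, 0, $size) ] while @pending;
    return @batches;
}

# Placeholder IDs for illustration only; substitute your real NCBI IDs.
my @ids = (1 .. 600);

for my $batch (chunk_ids(\@ids, 250)) {
    printf "batch of %d IDs: %s..%s\n",
        scalar @$batch, $batch->[0], $batch->[-1];
    # Each $batch (an array ref) could then be handed to a direct
    # esummary request, e.g. as the -id parameter of Bio::DB::EUtilities.
}
```

The batch size is a guess based on the range above, so it may need tuning. For very large lists, epost plus history is probably still the better route.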
On Apr 20, 2010, at 12:22 PM, Dave Messina wrote:
> Hi everyone,
>
> I've got about 250 NCBI IDs that I'm pulling from NCBI using Bio::DB::SoapEUtilities. It works fine if I send 10 IDs at a time, but much more than that and I get an 'unspecified internal server error'.
>
> I thought the limit was 500 IDs at a time. Does anyone know whether that's true?
>
>
> And a separate, related question:
>
> All I really want to get is the last-modified date for these records.
>
> And it's kinda slow.
>
> Using some code from the EUtilities_Web_Services HOWTO, I use the seq Fetch adaptor and the add_wanted_slot() Bio::Seq::SeqBuilder trick to get just the annotation part of a RichSeq object, and from there I pull out the dates using
>
> $seq->annotation->get_Annotations('date_changed')
>
>
> Can someone suggest a faster way?
>
>
> Thanks,
> Dave