[Bioperl-l] Fetching > 500 sequences

henrik nilsson rnilsson at clarku.edu
Fri Mar 26 10:55:22 EST 2004


Hi,
	Thank you very much for your help.  I went through and took out any
references to "retmax" in Bio::DB::Query::GenBank, Bio::DB::GenBank, and
Bio::DB::NCBIHelper.  Our script first sends a query through
Bio::DB::Query::GenBank, and that works fine (it only returns the count
found, and that count is > 7000).  However, when we then actually query
GenBank with Bio::DB::GenBank, it returns only 500 sequences even though it
should return 7000+ (and we really want them all).  I compared the two
scripts, and they look very similar in what they pass to GenBank.

	I was wondering whether any other variables could cause the Entrez
scripts to return only 500.  We do use mindate/maxdate, and we fetch by query
string (if that matters).  Any other ideas?

Thanks again for all of your help.

Rolf

> > > It seems that I have problems with fetching more than 500 sequences
> > > from Genbank using Bioperl. It looks like the script (attached below)
> > > fetches all the 7000+ sequences, but only 500 make it to the output
> > > file. Is there any way to get all these 7000+ sequences written to the
> > > file - that is, is it possible to sidestep the 500 seq. limit?
>
> I actually debugged and fixed this problem recently for Biopython --
> it looks like a change in the way EUtils works. If you pass 'retmax'
> to the eutils URL then it will only give you back at max 500
> sequences, no matter what you pass for this parameter. The fix I
> found that worked was to not pass 'retmax'.
>
> The attached patch to Bio/DB/Query/GenBank.pm should fix the
> problem, if similar symptoms equal similar fixes in this case. An
> actual Perl/BioPerl person should look at this, though, as I'm not
> to be trusted for coding Perl :-).
>
> Hope this helps.
> Brad
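
For reference, the fix Brad describes above (not passing 'retmax' at all)
amounts to building the EUtils request without that parameter.  Below is a
minimal sketch in Python using only the standard library.  It only builds the
URL and does not contact NCBI.  The base URL, the helper name
build_esearch_url, the example query term, and the dates are all assumptions
made for illustration; none of them come from the thread or from the BioPerl
modules.

```python
from urllib.parse import urlencode

# Assumed esearch endpoint; the thread does not give the exact URL.
ESEARCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def build_esearch_url(term, db="nucleotide", mindate=None, maxdate=None,
                      retmax=None):
    """Build an esearch URL, adding 'retmax' only when explicitly requested.

    Hypothetical helper: it mirrors the idea of the fix (leave 'retmax' out
    of the request), not the actual BioPerl or Biopython patch.
    """
    params = {"db": db, "term": term}
    # Only send the date range when both ends are supplied.
    if mindate is not None and maxdate is not None:
        params["mindate"] = mindate
        params["maxdate"] = maxdate
    # Per the thread, passing 'retmax' capped results at 500,
    # so it is omitted unless the caller asks for it.
    if retmax is not None:
        params["retmax"] = retmax
    return ESEARCH_URL + "?" + urlencode(params)


# Example query term and dates are made up for illustration.
url = build_esearch_url("Fungi[ORGN]", mindate="2003/01/01",
                        maxdate="2004/01/01")
print(url)
```

Whether dropping 'retmax' alone is enough for the Bio::DB::GenBank fetch step
is exactly the open question in this thread, so treat the above as an
illustration of the idea rather than a confirmed fix.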
