[Biopython-announce] is this supposed to be really slow?

Sat May 26 02:08:42 UTC 2007

On 5/25/07, Titus Brown <titus at caltech.edu> wrote:
>
>
> Hi, Bryan,
>
> I'm not too familiar with the underlying code, but I believe that
> BioPython enforces a three second wait between record retrieval attempts
> from NCBI.  This is by request of NCBI; see
>
>         http://www.ncbi.nlm.nih.gov/entrez/query/static/eutils_help.html

i did see the constraint of only one request per 3 seconds, but did
not realize that each pass through my loop was a separate request.
you're probably correct that this is (partly) the source of my slow
code.  i guess i really didn't understand how this piece of code was
working... i thought the text data were loaded into memory when i
called the PubMed.Dictionary function, so i was thinking that was the
one request per 3 seconds i had to worry about.  i'm sure traffic for
these sorts of things can get pretty high, but it does seem excessive
that if i want to retrieve 50 records, it will take a minimum of
2.5 minutes (50 x 3 s) to do so.  each record must be only about 10 KB
or so (in xml format), which works out to only ~3 KB/s from the ncbi
servers.  can anyone verify that this is the case?  is there anything
to do about this constraint?
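
one possible way around the per-record wait, as a rough sketch only: the
efetch service in NCBI's E-utilities accepts a comma-separated list of ids,
so many records can go into a single request.  the endpoint below is the
current E-utilities address (not the 2007 help page quoted above), the helper
name batched_efetch_urls is made up for illustration, and none of this is
taken from BioPython itself.

```python
from urllib.parse import urlencode

# Assumption: the current E-utilities efetch endpoint.
EFETCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi"


def batched_efetch_urls(pmids, batch_size=50):
    """Return one efetch URL per batch of PubMed IDs (hypothetical helper)."""
    pmids = [str(p) for p in pmids]
    urls = []
    for start in range(0, len(pmids), batch_size):
        batch = pmids[start:start + batch_size]
        # efetch takes several ids at once as a comma-separated list
        query = urlencode({"db": "pubmed", "id": ",".join(batch), "retmode": "xml"})
        urls.append(f"{EFETCH_URL}?{query}")
    return urls
```

with batch_size=50, the 50 records above would need one request instead of
50, so the 3 second wait would apply once rather than per record.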

> I personally tend to just use the NCBI retrieval URLs directly, but
> that's kind of ugly.

you mean you just use the pubmed ids and then pull down the text of
the corresponding url to process separately?  not sure i understand if
that is what you mean or not, but i don't really know how to parse and
process text in python.  maybe this is a good opportunity to learn. :)
all i really want is a way to count publications per year for some
keyword... at least that is all i am trying to accomplish right now.
seems like there should be an easy and relatively fast way to do this.
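
a minimal sketch of how the per-year count might look with the retrieval
URLs used directly, assuming the esearch service in NCBI's E-utilities with
rettype=count and a publication-date filter (datetype=pdat).  the function
names are hypothetical, the endpoint is the current E-utilities address, and
the 3 second pause is just the limit discussed in this thread, not something
read out of BioPython.

```python
import time
import xml.etree.ElementTree as ET
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumption: the current E-utilities esearch endpoint.
ESEARCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def count_url(term, year):
    """Build an esearch URL asking only for the hit count in one year."""
    query = urlencode({
        "db": "pubmed",
        "term": term,
        "rettype": "count",   # return only the number of matches
        "datetype": "pdat",   # filter on publication date
        "mindate": str(year),
        "maxdate": str(year),
    })
    return f"{ESEARCH_URL}?{query}"


def parse_count(xml_text):
    """Pull the integer out of the <Count> element of an esearch reply."""
    root = ET.fromstring(xml_text)
    return int(root.findtext("Count"))


def fetch_text(url):
    """Download a URL and return its body as text."""
    with urlopen(url) as response:
        return response.read().decode("utf-8")


def publications_per_year(term, years, fetch=fetch_text, delay=3.0):
    """Return {year: count}, pausing `delay` seconds between requests."""
    counts = {}
    for i, year in enumerate(years):
        if i:
            time.sleep(delay)  # stay under the one-request-per-3-seconds limit
        counts[year] = parse_count(fetch(count_url(term, year)))
    return counts
```

this is one small request per year rather than one per record, so ten years
of counts would take roughly half a minute even with the pause, e.g.
publications_per_year("zebrafish", range(1997, 2007)).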

> There may be a higher volume retrieval system
> built directly into BioPython, too.

any experts out there care to weigh in on this?

thanks so much for the input,
bryan