[Biopython-announce] is this supposed to be really slow?

W. Bryan Smith wbsmith at gmail.com
Fri May 25 22:31:38 UTC 2007


hi there,

i just started using biopython today and was going through the examples
on pages 31 and 32 of the tutorial, "Sending a query to Pubmed" and
"Retrieving a PubMed record", and i think i am confused about how i am
supposed to be using them.  as an added bonus, i am new to python, so i
may just be making a stupid python mistake.

anyway, what i am basically trying to do is to get an array containing the
year of publication for all the publications that match some keyword.  from
the tutorial, i am doing something like this:

#begin code snippet
from Bio import PubMed, Medline
import numpy

searchTerm = 'mySearchTerm'
# list of PubMed ids matching the search term
termIds = PubMed.search_for( searchTerm )

# dictionary-style access to records by id, parsed with recParser
recParser = Medline.RecordParser()
medlineDict = PubMed.Dictionary( parser = recParser )

# one slot per matching record, to hold the publication year
pubDates = numpy.zeros( ( len( termIds ) ), numpy.uint16 )

for idx in range( len( termIds ) ):
    # take the first four characters of publication_date as the year
    pubDates[idx] = int( medlineDict[ termIds[ idx ] ].publication_date[ 0:4 ] )

#end code snippet

so this seems to be working, but it seems to be very slow.  well, either
it's slow, or i don't understand the complexity of what it is doing.  i have
attempted to time this process, and it is taking about 7 seconds per record
to retrieve the date and drop it into my numpy array.  is this because this
code is fetching something from the internet and that is what is taking such
a long time?  or is there some other explanation for why this is slow (e.g.
my terrible, non-pythonic code writing, or what it is doing is actually very
complex and i just don't get it, etc)?
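
for reference, the per-record timing can be measured with something like the
sketch below.  this is only a minimal illustration: time_lookups and
fake_lookup are hypothetical names (not part of biopython), and fake_lookup
just sleeps briefly to stand in for a slow record lookup.  in the real
script the callable would be something like
lambda key: medlineDict[ key ], which is an assumption about how the
lookup would be wrapped.

```python
import time


def time_lookups(lookup, keys):
    """Call lookup(key) for each key and return (key, seconds) pairs."""
    timings = []
    for key in keys:
        start = time.perf_counter()
        lookup(key)  # the call being measured; its result is discarded
        timings.append((key, time.perf_counter() - start))
    return timings


def fake_lookup(key):
    """Hypothetical stand-in for a slow lookup: just sleeps briefly."""
    time.sleep(0.01)
    return key


# time three made-up ids and print the elapsed seconds for each one
for key, seconds in time_lookups(fake_lookup, ["id1", "id2", "id3"]):
    print(key, round(seconds, 3))
```

if nearly all of the time per record is spent inside the lookup call itself
rather than in the date parsing or the array assignment, that would point at
the lookup (and whatever it does behind the scenes) as the slow part.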

any insight into this would be much appreciated.

thanks,
bryan


