[Biopython] Entrez.esearch sort by publication date
Renato Alves
rjalves at igc.gulbenkian.pt
Mon Jun 1 17:49:47 UTC 2009
Quoting Peter on 06/01/2009 11:30 AM:
> On Sun, May 31, 2009 at 6:16 PM, Renato Alves <rjalves at igc.gulbenkian.pt> wrote:
>> Hi everyone,
>>
>> I've been using Entrez.esearch for a while without problems but today I
>> wanted to have the results sorted by publication date.
>>
>> According to the docs at:
>> http://www.ncbi.nlm.nih.gov/corehtml/query/static/esearch_help.html#Sort
>> I should use 'pub+date', however this doesn't work. If I use 'author'
>> and 'journal' I have no problems but if I use 'last+author' or
>> 'pub+date' I get an empty reply:
>>
>> >>> Entrez.esearch(db='pubmed', term=search, retmax=5,
>>         sort='pub+date').read()
>> <?xml version="1.0" ?>\n<!DOCTYPE eSearchResult PUBLIC "-//NLM//DTD
>> eSearchResult, 11 May 2002//EN"
>> "http://www.ncbi.nlm.nih.gov/entrez/query/DTD/eSearch_020511.dtd">\n<eSearchResult><Count/><RetMax/><RetStart/><TranslationSet/><QueryTranslation/></eSearchResult>\n'
>>
>> Any suggestions on how to make this work?
>
> The NCBI documentation for "sort" says "Use in conjunction with Web
> Environment to display sorted results in ESummary and EFetch.", and in
> the example above you are not using the Web Environment (history)
> mode.
>
> i.e. I think you need to do an ESearch with history="Y" and
> sort="pub+date", then an EFetch which will be in date order.
>
> If you get this working, perhaps you could share a complete example?
> It would make a nice cookbook entry for the wiki.
>
> Peter
Hi again Peter,

After further testing I came to the conclusion that this is a character
escaping problem. The '+' sign in 'pub+date' is converted to '%2B' when
the query URL is built, which gives the wrong results. Since a space
(' ') is escaped to '+', the correct syntax is 'pub date' rather than
'pub+date'.
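
The escaping can be reproduced without contacting NCBI. This is a minimal
sketch, assuming the query string is built with the standard library's
urlencode, which matches the behaviour described above. The helper name
encode_sort is hypothetical and not part of Biopython.

```python
from urllib.parse import urlencode


def encode_sort(sort_value):
    """Return the query-string fragment produced for a given sort value."""
    # urlencode quotes each value with quote_plus, so a literal '+'
    # becomes '%2B' and a space becomes '+'.
    return urlencode({"sort": sort_value})


literal_plus = encode_sort("pub+date")  # the form taken from the NCBI docs
with_space = encode_sort("pub date")    # the form that ends up as pub+date

print(literal_plus)
print(with_space)
```

Only the second form puts 'pub+date' into the URL, which is the string the
NCBI documentation asks for.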
A working example would be: (Feel free to add it to the cookbook)
#! /usr/bin/env python
from Bio import Entrez, Medline
from datetime import datetime

# Make sure you change this to your email
Entrez.email = 'somemail@somehost.domain'

def fetch(t, s):
    h = Entrez.esearch(db='pubmed', term=t, retmax=5, sort=s)
    idList = Entrez.read(h)['IdList']

    if idList:
        handle = Entrez.efetch(db='pubmed', id=idList,
                               rettype='medline', retmode='text')
        records = Medline.parse(handle)

        for record in records:
            title = record['TI']
            author = ', '.join(record['AU'])
            source = record['SO']
            pub_date = datetime.strptime(record['DA'], '%Y%m%d').date()
            pmid = record['PMID']

            print("Title: %s\nAuthor(s): %s\nSource: %s\n"\
                  "Publication Date: %s\nPMID: %s\n" % (title, author,
                  source, pub_date, pmid))

print('-- Sort by publication date --\n')
fetch('Dmel wings', 'pub date')

print('-- Sort by first author --\n')
fetch('Dmel wings', 'author')

# EOF
--
Renato