[Biopython-dev] NCBI Abuse activity with Biopython

Thu Jun 26 17:04:26 UTC 2008

Michiel,

I started working on a patch to mark Bio.GenBank.search_for() etc. as
deprecated, but on reflection I don't really like the longer code
needed with Bio.Entrez - for example, this one-liner:

from Bio import GenBank
gi_list = GenBank.search_for("Opuntia AND rpl16")

becomes:

from Bio import Entrez
handle = Entrez.esearch(db='nucleotide', term="Opuntia AND rpl16")
gi_list = Entrez.read(handle)["IdList"]

One idea that might be worth discussing is having variations of the
Entrez.e* functions which will parse the XML and return the results.
i.e. something like this:

def esearch2(**keywds):
    """Calls ESearch and parses the returned XML."""
    # Forward the caller's arguments (db, term, ...) and force XML output.
    return read(esearch(retmode="XML", **keywds))

Then we can write:

from Bio import Entrez
gi_list = Entrez.esearch2(db='nucleotide', term="Opuntia AND rpl16")["IdList"]

(An alternative naming convention like a "p" might be nicer)
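
As a rough sketch of how such variants might be generated rather than
written by hand for every e* function, a small wrapper factory would do.
Everything here is hypothetical: parsed(), fake_esearch() and fake_read()
are stand-ins invented so the example runs without touching the NCBI
servers, and the IDs in the canned XML are made up.

```python
import io
import xml.etree.ElementTree as ET


def fake_esearch(db, term, **keywds):
    """Stand-in for Entrez.esearch: returns a handle to canned XML (no network)."""
    xml = ("<eSearchResult><Count>2</Count><IdList>"
           "<Id>111</Id><Id>222</Id></IdList></eSearchResult>")
    return io.StringIO(xml)


def fake_read(handle):
    """Stand-in for Entrez.read: only understands Count and IdList."""
    root = ET.parse(handle).getroot()
    return {"Count": root.findtext("Count"),
            "IdList": [elem.text for elem in root.findall("IdList/Id")]}


def parsed(fetch, read=fake_read):
    """Wrap a function returning an XML handle so it returns parsed results."""
    def wrapper(*args, **keywds):
        # Ask for XML unless the caller explicitly chose a retmode.
        keywds.setdefault("retmode", "XML")
        handle = fetch(*args, **keywds)
        try:
            return read(handle)
        finally:
            handle.close()
    wrapper.__doc__ = "Calls %s and parses the returned XML." % fetch.__name__
    return wrapper


esearch2 = parsed(fake_esearch)
gi_list = esearch2(db="nucleotide", term="Opuntia AND rpl16")["IdList"]
```

With the real functions, the last two lines would presumably become
something like parsed(Entrez.esearch, Entrez.read), but I have not tried
that against the live service.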

My initial plan was to get the search results back as plain text
(retmode='uilist'), thus avoiding parsing the XML.  However, after
reading the Entrez documentation, and some experimentation to confirm
this, I was surprised to find that ESearch will only return XML.  The
NCBI appear to suggest that if you want your search results in another
format use the WebEnv session history, and then ask EFetch to reformat
it (!).  This does work, but means making two internet calls:

from Bio import Entrez
handle = Entrez.esearch(db='nucleotide', term="Opuntia AND rpl16",
                        usehistory="y")
session = Entrez.read(handle)['WebEnv']
# split() with no argument avoids an empty trailing entry in the list
gi_list = Entrez.efetch(db='nucleotide', WebEnv=session, query_key=1,
                        rettype='uilist').read().split()
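
For turning the plain-text ID list into a Python list, a small helper that
tolerates blank lines and a trailing newline may be safer than a bare
split on '\n'. This is only a sketch: parse_uilist() is a hypothetical
name, the sample IDs are made up, and it assumes the text has one ID per
line.

```python
def parse_uilist(text):
    """Return the non-empty, whitespace-stripped lines of a plain-text ID list."""
    return [line.strip() for line in text.splitlines() if line.strip()]


sample = "111\n222\n\n"            # made-up IDs with trailing blank lines
ids = parse_uilist(sample)         # blank lines are dropped
naive = sample.split("\n")         # keeps the empty trailing entries
```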

As an aside, do we really have to include the database in the efetch call above?

Peter