[Biopython-dev] Online access, Bio.PubMed & Bio.GenBank vs Bio.Entrez

Peter biopython at maubp.freeserve.co.uk
Sun Aug 17 14:19:50 UTC 2008


> Also, if you then want to download some or all of these records (say
> as MedLine format files to parse with Bio.Medline), doing this with
> Bio.PubMed.download_many() or the Dictionary class does not take
> advantage of the NCBI's history system (as they encourage).  There are
> similar concerns with the Bio.GenBank.search_for(), download_many()
> and NCBIDictionary classes.

I have just converted Bio.GenBank.search_for() from using Bio.EUtils
to Bio.Entrez, and then afterwards realised I could have copied a lot
of this code from Bio.PubMed.search_for().  However, it was
interesting to see how my code differed.

A few things occurred to me after doing this.  Firstly, both these
search_for() functions take a few optional parameters which default to
None, and have to take explicit steps not to pass these None arguments
to Bio.Entrez.esearch() because currently they would wrongly get used
in the URL.  It might make sense to modify Bio.Entrez._open() to skip
None arguments when building the URL.
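
As a rough sketch of what skipping None arguments could look like (the
helper name build_url is made up for illustration; this is not the
actual Bio.Entrez._open() code, just the filtering idea in isolation):

```python
from urllib.parse import urlencode

def build_url(cgi, **params):
    """Build a query URL, leaving out any parameter whose value is None.

    Hypothetical helper: only illustrates the filtering step.
    """
    # Keep only the options the caller actually set
    query = {key: value for key, value in params.items() if value is not None}
    return cgi + "?" + urlencode(query)

# reldate is left at its None default, so it never reaches the URL
url = build_url(
    "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi",
    db="nucleotide",
    term="opuntia",
    reldate=None,
)
print(url)
```

With something like this in one place, the search_for() functions could
pass their optional arguments straight through without checking each one.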

Secondly, in my testing of the date restriction arguments (reldate,
mindate and maxdate) the URL was constructed correctly, but the
searches returned no hits.  Indeed, there is a comment in the
Bio.PubMed source code (revision 1.4, Jeff Chang in 2003):

    XXX The date parameters don't seem to be working with NCBI's
    script.  Please let me know if you can get it to work.

It looks like I'm not the only one to find this (I was using the
nucleotide database instead of pubmed).  If someone can confirm this
(e.g. URL testing in a browser) we can ask the NCBI about it.
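
For anyone wanting to try this in a browser, something along these
lines should print a pair of URLs to compare, one unrestricted and one
with reldate set.  The search term "opuntia" and the 365 day window are
placeholders I picked for illustration, and esearch_url is a made-up
helper, not part of Biopython:

```python
from urllib.parse import urlencode

# Public E-utilities address for esearch
ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"

def esearch_url(db, term, **dates):
    """Return an esearch URL; date options (reldate, mindate, maxdate)
    are appended only when given."""
    params = {"db": db, "term": term}
    params.update(dates)
    return ESEARCH + "?" + urlencode(params)

# Same query with and without the date restriction
plain = esearch_url("nucleotide", "opuntia")
recent = esearch_url("nucleotide", "opuntia", reldate=365)
print(plain)
print(recent)
```

Comparing the Count element in the two XML responses should show
whether the date restriction is having the expected effect.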

Thirdly, assuming we don't deprecate it, perhaps
Bio.PubMed.search_for() should just use Bio.Entrez.read() to parse the
XML rather than its own mini-parser?

Finally, perhaps Bio.Entrez needs its own version of search_for() which
would parse the XML results into a list of IDs, and download them in
batches.  However, this might be best done in combination with some
history helper functions to make a combined esearch and efetch easier,
which is a bigger job.
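
To make the "list of IDs, then batches" idea concrete, here is a
minimal offline sketch.  It uses the standard library's ElementTree as
a stand-in for Bio.Entrez.read(), works on a small hand-written sample
in the eSearchResult layout rather than a live query, and the names
parse_ids and batches are invented for illustration.  It does not touch
the history system at all:

```python
import xml.etree.ElementTree as ET

# Hand-written sample in the shape of an esearch reply (not real data)
SAMPLE_XML = """<eSearchResult>
  <Count>5</Count>
  <IdList>
    <Id>101</Id><Id>102</Id><Id>103</Id><Id>104</Id><Id>105</Id>
  </IdList>
</eSearchResult>"""

def parse_ids(xml_text):
    """Pull the record IDs out of an esearch XML reply."""
    root = ET.fromstring(xml_text)
    return [element.text for element in root.findall("./IdList/Id")]

def batches(ids, size):
    """Yield successive slices of at most `size` IDs."""
    for start in range(0, len(ids), size):
        yield ids[start:start + size]

ids = parse_ids(SAMPLE_XML)
for batch in batches(ids, 2):
    # Each comma separated string could be used as the id argument of
    # one efetch request
    print(",".join(batch))
```

A history based version would instead keep the WebEnv and query_key
from the search and page through the results with efetch, which is the
bigger job mentioned above.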

Peter


