[Biopython-dev] Online access, Bio.PubMed & Bio.GenBank vs Bio.Entrez

Mon Aug 18 23:46:29 UTC 2008

--- On Sun, 8/17/08, Peter <biopython at maubp.freeserve.co.uk> wrote:
> Thirdly, assuming we don't deprecate it, perhaps Bio.PubMed.search_for()
> should just use Bio.Entrez.read() to parse the XML rather than its own
> mini-parser?
Now that Bio.Entrez is available, the mini-parser in Bio.PubMed is no longer needed.

> Finally, perhaps Bio.Entrez needs its own version of search_for() which
> would parse the XML results into a list of IDs, and download them in
> batches.  However, this might be best done in combination with some
> history helper functions to make a combined esearch and efetch easier,
> which is a bigger job.

It is not entirely clear to me whether a search_for function (in Bio.PubMed, Bio.GenBank, or Bio.Entrez) is a good idea. Such a function provides a higher-level interface to the low-level functionality in Entrez. But there is a reason that Entrez only provides low-level functions: it cannot provide higher-level functions without knowing what the user wants. We as Biopython developers don't know much more than Entrez does (except that users will want to parse the result using Python).
Maybe I'm being too pessimistic, but I think the result will be either an over-engineered function that tries to cater to all possible user wishes, or a more straightforward function that is useful only for a minority of users.
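
To make the trade-off concrete, here is a minimal sketch of what the "straightforward" variant might look like. The name search_and_fetch and its search/fetch parameters are hypothetical and not part of Bio.PubMed, Bio.GenBank, or Bio.Entrez; the commented-out wiring at the end assumes the esearch, efetch and read functions of Bio.Entrez and is untested here.

```python
def search_and_fetch(term, search, fetch, batch_size=100):
    """Yield one fetched result per batch of IDs matching `term`.

    Hypothetical helper, for illustration only:
      search(term) -> list of ID strings
      fetch(ids)   -> whatever the caller wants back for one batch
    """
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    ids = search(term)
    # Walk the ID list in slices of at most batch_size entries.
    for start in range(0, len(ids), batch_size):
        yield fetch(ids[start:start + batch_size])


# Possible wiring with Bio.Entrez (assumption, requires Biopython and
# network access; note that esearch returns only a limited number of IDs
# unless retmax is raised):
#
#   from Bio import Entrez
#   Entrez.email = "you@example.org"
#
#   def search(term):
#       return Entrez.read(Entrez.esearch(db="pubmed", term=term))["IdList"]
#
#   def fetch(ids):
#       handle = Entrez.efetch(db="pubmed", id=",".join(ids), retmode="xml")
#       return handle.read()
#
#   for chunk in search_and_fetch("biopython", search, fetch, batch_size=50):
#       ...
```

Note how little the helper itself decides: the database, the return format, whether and how to parse, and how many IDs to ask for all remain the caller's choices, passed in through search and fetch.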

--Michiel.