[Biopython-dev] Parsing efetch results from the Journals database through Bio.Entrez

Peter biopython at maubp.freeserve.co.uk
Fri Sep 3 17:31:09 UTC 2010


On Fri, Sep 3, 2010 at 6:26 PM, Michiel de Hoon <mjldehoon at yahoo.com> wrote:
> Hi everybody,
>
> The parser in Bio.Entrez can parse any XML returned by the Entrez E-utilities
> as long as the corresponding DTD is available (the DTDs are included with each
> release of Biopython). One corner case is efetch results from the Journals
> database. Officially, efetch from the Journals database does not generate
> output in the XML format, but only plain text or HTML. However, when XML is
> requested explicitly, Entrez does in practice return XML-like output. Our
> parser in Bio.Entrez is able to parse this XML, but only by means of
> several hacks in the parser code.

Out of interest, have you asked the NCBI about this undocumented XML output?

> As probably few users are interested in efetch output from the Journals
> database, I suggest that we remove these hacks from Bio.Entrez altogether
> -- after all, this is for XML that is not supported by NCBI to begin with. If
> there are some users that really want to parse efetch output from the
> Journals database, we can always add a simple parser for plain-text
> efetch output.
>
> The advantage of removing these hacks is that it will allow us to validate
> all XML against the DTD, and to raise an error (if the user so requests)
> whenever the XML contains elements that don't validate against the DTD.
>
> Any objections?

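To make the optional strictness concrete, here is a rough sketch using only
the standard library. It is not the Bio.Entrez code: KNOWN_TAGS,
ValidationError and find_unknown_tags are hypothetical names, and the
hard-coded tag set merely stands in for the element names a real DTD would
declare.

```python
import xml.parsers.expat

# Hypothetical stand-in for the element names declared in a DTD.
KNOWN_TAGS = {"eSummaryResult", "DocSum", "Id", "Item"}


class ValidationError(ValueError):
    """Raised when an element is not declared in the (stand-in) DTD."""


def find_unknown_tags(xml_text, validate=True):
    """Return undeclared tag names; raise on the first one if validate is true."""
    unknown = []

    def start_element(name, attrs):
        if name not in KNOWN_TAGS:
            if validate:
                # Strict mode: stop at the first undeclared element.
                raise ValidationError("unexpected element <%s>" % name)
            # Lenient mode: remember the name and keep parsing.
            unknown.append(name)

    parser = xml.parsers.expat.ParserCreate()
    parser.StartElementHandler = start_element
    parser.Parse(xml_text, True)
    return unknown
```

With validate=False the caller gets a list of the undeclared elements
instead of an exception, which would be the lenient behaviour.
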
Is it feasible to just put deprecation warnings in for Biopython 1.56,
and then remove the hacks later?
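
Something along these lines is what I have in mind, as a minimal sketch with
the standard warnings module. The function name parse_journals_xml is made
up for illustration, and the body is a placeholder for wherever the
Journals-specific hacks actually live in the parser.

```python
import warnings


def parse_journals_xml(xml_text):
    """Hypothetical entry point for the Journals-specific code path."""
    warnings.warn(
        "Parsing efetch XML from the Journals database is deprecated "
        "and may be removed in a future release.",
        DeprecationWarning,
        stacklevel=2,  # attribute the warning to the caller's line
    )
    # The existing special-case handling would continue here unchanged.
    return xml_text
```

Existing scripts would keep working for one release cycle, and anyone who
relies on this output would get a visible hint before the hacks go away.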

Peter


