[Biopython] Entrez.parse error
Konrad Koehler
konrad.koehler at mac.com
Tue Dec 20 04:43:38 UTC 2016
Then how does one parse the output? Entrez.parse used to work, but no longer does. Apparently NCBI has made changes to their XML that have broken Entrez.parse. Entrez.read returns a complex data structure that is difficult to work with.
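Navigating what Entrez.read returns is manageable once its layout is known. Below is a minimal sketch that assumes the parsed object is either a dictionary with a "PubmedArticle" key (the layout newer Biopython releases use for PubMed efetch XML) or a plain list of article records (the older layout). The helper names `iter_pubmed_articles`, `summarize` and `fetch_and_summarize` are hypothetical and not part of Biopython.

```python
from collections.abc import Mapping


def iter_pubmed_articles(parsed):
    """Yield PubmedArticle records from the object Entrez.read returns.

    Assumption: `parsed` is either dict-like with a "PubmedArticle" key
    (newer layout) or a plain list of article records (older layout).
    """
    if isinstance(parsed, Mapping):
        yield from parsed.get("PubmedArticle", [])
    else:
        yield from parsed


def summarize(parsed):
    """Return (PMID, title) pairs for every PubmedArticle in `parsed`."""
    rows = []
    for article in iter_pubmed_articles(parsed):
        citation = article["MedlineCitation"]
        rows.append((str(citation["PMID"]), str(citation["Article"]["ArticleTitle"])))
    return rows


def fetch_and_summarize(pmids, email):
    """Fetch PubMed XML and summarize it (needs network access and Biopython)."""
    from Bio import Entrez  # imported lazily so summarize() works without Biopython

    Entrez.email = email  # NCBI asks E-utilities users to identify themselves
    handle = Entrez.efetch(db="pubmed", id=",".join(pmids), retmode="xml")
    try:
        return summarize(Entrez.read(handle))
    finally:
        handle.close()
```

For example, `fetch_and_summarize(["19304878", "14630660"], "you@example.com")` would fetch the two articles from the documentation example; the address is a placeholder.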
If one adds "['PubmedArticle']" to line 302 of /Bio/Entrez/Parser.py so that it reads:
records = self.stack[0]['PubmedArticle']
this suppresses the error message, but it mysteriously returns only the strings "PubmedArticle" and "PubmedBookArticle" and not the citation. Any ideas?
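One possible explanation for the strings, offered as a guess and not checked against Parser.py: if the top-level object built from the new XML is a dictionary keyed by element name, any code that iterates over it receives the keys rather than the records. That would yield exactly "PubmedArticle" and "PubmedBookArticle". A minimal illustration with a plain, made-up dict:

```python
# Hypothetical stand-in for the top-level object: a dict keyed by element name.
top_level = {
    "PubmedArticle": [{"MedlineCitation": {"PMID": "19304878"}}],
    "PubmedBookArticle": [],
}

# Iterating over a dict yields its keys, not the records stored under them.
print(list(top_level))  # ['PubmedArticle', 'PubmedBookArticle']

# Indexing by key first, then iterating, reaches the actual citations.
for article in top_level["PubmedArticle"]:
    print(article["MedlineCitation"]["PMID"])  # 19304878
```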
Konrad
> On 20 Dec 2016, at 05:16, Michiel de Hoon <mjldehoon at yahoo.com> wrote:
>
> Entrez.read works for me for the example shown.
>
> Best,
> -Michiel
>
>
> On Sunday, December 18, 2016 11:57 AM, Peter Cock <p.j.a.cock at googlemail.com> wrote:
>
>
> On Sun, Dec 18, 2016 at 2:50 AM, Peter Cock <p.j.a.cock at googlemail.com> wrote:
> > On Thu, Dec 15, 2016 at 7:37 PM, Konrad Koehler <konrad.koehler at mac.com> wrote:
> >> Hello everyone,
> >>
> >> I have been using Entrez.parse for years without any errors. However just
> >> in the last day or two, it stopped working. I have been able to reproduce
> >> the error using the following example from the biopython Package Entrez
> >> documentation:
> >>
> >
> > I can reproduce this. The XML looks sensible, two <PubmedArticle>
> > tags:
> >
> > <?xml version="1.0" ?>
> > <!DOCTYPE PubmedArticleSet PUBLIC "-//NLM//DTD PubMedArticle, 1st January 2017//EN" "https://dtd.nlm.nih.gov/ncbi/pubmed/out/pubmed_170101.dtd">
> > <PubmedArticleSet>
> > <PubmedArticle>
> > <MedlineCitation Status="MEDLINE" Owner="NLM">
> > <PMID Version="1">19304878</PMID>
> > ...
> > </MedlineCitation>
> > <PubmedData>
> > ...
> > </PubmedData>
> > </PubmedArticle>
> > <PubmedArticle>
> > <MedlineCitation Status="MEDLINE" Owner="NLM">
> > <PMID Version="1">14630660</PMID>
> > ...
> > </MedlineCitation>
> > <PubmedData>
> > ...
> > </PubmedData>
> > </PubmedArticle>
> > </PubmedArticleSet>
> >
> > Note however it is using a new DTD file for Jan 2017,
> >
> > https://dtd.nlm.nih.gov/ncbi/pubmed/out/pubmed_170101.dtd
> >
> >
> >> Does anyone have any suggestions on how to get Entrez.parse working again? I
> >> am also curious why this stopped working. Has the NCBI server changed?
> >>
> >
> > I would guess that the NCBI changed something subtly. Michiel?
> >
> > Peter
>
> Logged on GitHub,
>
> https://github.com/biopython/biopython/issues/1027
>
>
> Peter
>
>
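While the Biopython issue is open, the PubmedArticleSet XML quoted above can also be streamed with the standard library. Below is a minimal sketch using xml.etree.ElementTree.iterparse, which is a swapped-in technique and not Biopython. It assumes only the PubmedArticle / MedlineCitation / PMID nesting shown in the quoted XML, and it ignores PubmedBookArticle records. The function name `iter_pmids` and the trimmed sample document are hypothetical. Like Entrez.parse, it handles one record at a time and clears each element once it is done.

```python
import io
import xml.etree.ElementTree as ET


def iter_pmids(source):
    """Yield the PMID of each <PubmedArticle> in a PubmedArticleSet document.

    `source` is a filename or file-like object (e.g. an efetch handle).
    """
    for _event, elem in ET.iterparse(source, events=("end",)):
        if elem.tag == "PubmedArticle":
            yield elem.findtext("MedlineCitation/PMID")
            elem.clear()  # free memory held by records already processed


# Trimmed version of the XML quoted in the thread (DOCTYPE line omitted).
SAMPLE = b"""<?xml version="1.0" ?>
<PubmedArticleSet>
  <PubmedArticle>
    <MedlineCitation Status="MEDLINE" Owner="NLM">
      <PMID Version="1">19304878</PMID>
    </MedlineCitation>
    <PubmedData/>
  </PubmedArticle>
  <PubmedArticle>
    <MedlineCitation Status="MEDLINE" Owner="NLM">
      <PMID Version="1">14630660</PMID>
    </MedlineCitation>
    <PubmedData/>
  </PubmedArticle>
</PubmedArticleSet>
"""

print(list(iter_pmids(io.BytesIO(SAMPLE))))  # ['19304878', '14630660']
```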