[Biopython] Bio.Entrez/Medline DTD problems - missing DTD nlmmedlinecitationset_100301.dtd

Guy Eakin guyeakin at gmail.com
Thu Jul 8 11:28:17 UTC 2010


Peter,

Many thanks.

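This is the query that generated the nlmmedlinecitationset_100301.dtd error:

    from Bio import Entrez
    Entrez.email = "you@example.com"  # placeholder; NCBI asks for a contact address

    search_handle = Entrez.esearch(db="pubmed", term="glaucom*",
                                   retmax=2, usehistory="y",
                                   reldate=7, datetype="edat")
    # WebEnv and QueryKey come from the esearch result (usehistory="y")
    search_results = Entrez.read(search_handle)
    webenv = search_results["WebEnv"]
    query_key = search_results["QueryKey"]

    fetch_handle = Entrez.efetch(db="pubmed", retmode="xml", rettype="medline",
                                 webenv=webenv, query_key=query_key)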
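Because the error names one specific DTD, a quick way to find out which DTD a given efetch result depends on is to read its DOCTYPE declaration. The sketch below uses only the standard library; the function name `referenced_dtd` is hypothetical (it is not part of Biopython), and it assumes the system identifier is double-quoted.

```python
import re
from typing import Optional

# Captures the quoted system identifier (the DTD URL or file name) that
# follows the root element name in a DOCTYPE declaration, e.g.
#   <!DOCTYPE Root PUBLIC "public id" "http://host/path/file.dtd">
#   <!DOCTYPE Root SYSTEM "file.dtd">
_DOCTYPE_RE = re.compile(
    r'<!DOCTYPE\s+\S+\s+(?:PUBLIC\s+"[^"]*"\s+|SYSTEM\s+)"([^"]+)"',
    re.IGNORECASE,
)


def referenced_dtd(xml_text: str) -> Optional[str]:
    """Return the file name of the DTD named in the DOCTYPE, or None."""
    match = _DOCTYPE_RE.search(xml_text)
    if match is None:
        return None
    # Keep only the last path component of the URL, i.e. the bare file name.
    return match.group(1).rsplit("/", 1)[-1]
```

For example, after saving the efetch output to a file, passing the file's text to `referenced_dtd` should return the bare DTD file name, which can then be compared against the DTDs shipped with your Biopython installation.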


You will also want to add pubmed_100301.dtd to your repository. I do not
have the query that generated the XML that depends on it, but I got a
separate error related to its absence yesterday. Oddly, I was able to
download the "hidden" pubmed_100301.dtd, but could not replicate the error.
All subsequent errors concerned the nlmmedlinecitationset_100301.dtd file,
which I could not locate until this morning. Perhaps it was only recently
posted to the site. Either way, thanks for confirming that I was on the
right track.
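A rough way to check for, and fetch, the two missing files is sketched below. The helper names and the `dtd_dir` parameter are hypothetical: where Biopython actually keeps its DTDs depends on the version and installation, so pass whichever directory applies. It also assumes both DTDs are served from the corehtml URL Peter gives in his reply, which may no longer hold.

```python
import os
from typing import Iterable, List
from urllib.request import urlopen

# Base URL taken from Peter's reply in this thread (July 2010). NCBI may
# have moved the files since, so treat it as an assumption.
DTD_BASE_URL = "http://eutils.ncbi.nlm.nih.gov/corehtml/query/DTD/"


def missing_dtds(dtd_dir: str, names: Iterable[str]) -> List[str]:
    """Return the names from `names` that have no file in `dtd_dir`."""
    return [name for name in names
            if not os.path.isfile(os.path.join(dtd_dir, name))]


def fetch_missing_dtds(dtd_dir: str, names: Iterable[str]) -> List[str]:
    """Download each DTD absent from `dtd_dir` and return the names fetched.

    Files already present are left alone, so no network access happens
    when nothing is missing.
    """
    fetched = []
    for name in missing_dtds(dtd_dir, names):
        with urlopen(DTD_BASE_URL + name) as response:
            data = response.read()
        with open(os.path.join(dtd_dir, name), "wb") as out:
            out.write(data)
        fetched.append(name)
    return fetched
```

Splitting the check from the download keeps the first step usable offline: `missing_dtds` alone tells you which of nlmmedlinecitationset_100301.dtd and pubmed_100301.dtd still need to be copied in by hand.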

regards,
guy

On Thu, Jul 8, 2010 at 3:42 AM, Peter <biopython at maubp.freeserve.co.uk> wrote:

> On Thu, Jul 8, 2010 at 1:52 AM, Guy Eakin <guyeakin at gmail.com> wrote:
> > I am learning Biopython and seem to be having trouble parsing
> > efetch-generated XML.
> >
> > Maybe I am confused here, but I can't for the life of me get my XML to
> > parse correctly, and it seems to be coming up with a missing DTD error
> > using both Medline.parse and Entrez.parse (traceback for Medline below).
> >
> > nlmmedlinecitationset_100301.dtd and pubmed_100301.dtd seem to be missing
> > from my Biopython installation, and unavailable from the following NCBI
> > sites:
> >
> > http://www.ncbi.nlm.nih.gov/dtd/ or
> > http://eutils.ncbi.nlm.nih.gov/entrez/query/DTD/
> >
> > My apologies if this is user error; I do not see any reference to this
> > DTD issue in the archives, so I am posting the incident. Is this just bad
> > luck during my learning curve, or am I missing something conceptual here?
>
> The problem is with the NCBI "hiding" the file by not showing the raw
> contents of that folder, but just an HTML page with a partial list. You
> need this file:
>
> http://eutils.ncbi.nlm.nih.gov/corehtml/query/DTD/nlmmedlinecitationset_100301.dtd
>
> I've added this to our repository so the next version of Biopython will
> include it. Please let us know if anything else is missing - what was
> the Entrez request you used to get the XML using this DTD file?
>
> Regards,
>
> Peter
>


