[Biopython] Bio.Entrez/Medline DTD problems - missing DTD nlmmedlinecitationset_100301.dtd

Peter biopython at maubp.freeserve.co.uk
Thu Jul 8 07:42:32 UTC 2010


On Thu, Jul 8, 2010 at 1:52 AM, Guy Eakin <guyeakin at gmail.com> wrote:
> I am learning Biopython and seem to be having trouble parsing
> efetch-generated XML.
>
> Maybe I am confused here, but I can't for the life of me get my XML to parse
> correctly, and it seems to be coming up with a missing DTD error using both
> Medline.parse and Entrez.parse (traceback for Medline below).
>
> nlmmedlinecitationset_100301.dtd and pubmed_100301.dtd seem to be missing
> from my Biopython installation, and unavailable from the following NCBI sites:
>
> http://www.ncbi.nlm.nih.gov/dtd/ or
> http://eutils.ncbi.nlm.nih.gov/entrez/query/DTD/
>
> My apologies if this is user error; I do not see any reference to this DTD
> issue in the archives, so I am posting the incident. Is this just bad luck
> during my learning curve, or am I missing something conceptual here?

The problem is that the NCBI is "hiding" the file: that folder does not
show its raw contents, just an HTML page with a partial list. You need
this file:

http://eutils.ncbi.nlm.nih.gov/corehtml/query/DTD/nlmmedlinecitationset_100301.dtd
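As a rough sketch of how you might drop that file into an existing
installation by hand: the helper names below (dtd_url, local_dtd_dir,
fetch_dtd) are made up for illustration and are not part of Biopython.
The code assumes the bundled DTDs live in a "DTDs" folder next to the
Bio.Entrez package, that you have write access to it, and that the NCBI
still serves the file at the URL above.

```python
import os
import urllib.request

# Base URL taken from the link above; NCBI may move or retire it.
DTD_BASE_URL = "http://eutils.ncbi.nlm.nih.gov/corehtml/query/DTD/"


def dtd_url(filename):
    """Return the full download URL for a DTD file name."""
    return DTD_BASE_URL + filename


def local_dtd_dir():
    """Guess where the installed Biopython keeps its DTDs (an assumption)."""
    # Imported lazily so the other helpers work without Biopython installed.
    from Bio import Entrez
    return os.path.join(os.path.dirname(Entrez.__file__), "DTDs")


def fetch_dtd(filename, target_dir):
    """Download one DTD file into target_dir and return the saved path."""
    target = os.path.join(target_dir, filename)
    with urllib.request.urlopen(dtd_url(filename)) as response:
        data = response.read()
    with open(target, "wb") as handle:
        handle.write(data)
    return target
```

Calling fetch_dtd("nlmmedlinecitationset_100301.dtd", local_dtd_dir())
would then save the file alongside the other DTDs, if those assumptions
hold on your system.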

I've added this to our repository so the next version of Biopython will
include it. Please let us know if anything else is missing - what was
the Entrez request you used to get the XML using this DTD file?
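If it helps when reporting a missing file, here is a small sketch for
reading the DTD name out of the DOCTYPE line at the top of the XML you
got back. The function name dtd_filename and the sample DOCTYPE string
are hypothetical, and the regular expression only covers the common
PUBLIC and SYSTEM forms with double-quoted identifiers.

```python
import posixpath
import re
from urllib.parse import urlparse

# Matches <!DOCTYPE root PUBLIC "public-id" "system-id"> or
# <!DOCTYPE root SYSTEM "system-id"> and captures the system identifier.
DOCTYPE_RE = re.compile(
    r'<!DOCTYPE\s+\S+\s+(?:PUBLIC\s+"[^"]*"\s+|SYSTEM\s+)"([^"]+)"'
)


def dtd_filename(xml_text):
    """Return the DTD file name named in the DOCTYPE, or None if absent."""
    match = DOCTYPE_RE.search(xml_text)
    if match is None:
        return None
    # Keep only the last path component of the system identifier.
    return posixpath.basename(urlparse(match.group(1)).path)


# Illustrative input only, not copied from a real NCBI response.
sample = (
    '<?xml version="1.0"?>\n'
    '<!DOCTYPE PubmedArticleSet PUBLIC "-//EXAMPLE//DTD Sample//EN" '
    '"http://example.org/DTD/pubmed_100301.dtd">\n'
    "<PubmedArticleSet></PubmedArticleSet>\n"
)
print(dtd_filename(sample))
```

The name it prints is the file that has to exist among the local DTDs
for the parser to validate that XML.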

Regards,

Peter



