<div dir="ltr"><div><div>Hi all,<br></div>I've been trying to parse XML files returned by an efetch query to the bioproject database, but Entrez.read and Entrez.parse keep giving me an error message saying there is no DTD (and validate=False gets me no data at all). I found a post on this mailing list from 2013 in which a gentleman had the same problem; he emailed NCBI and was told the following: <br><br>
"Yes this is the "normal" but it is an oversight as a dtd was never created
for this database. I will have to open a ticket to the developers to create
this and have it included in the XML and on the DTD web page."<br><br>I've emailed NCBI about this again, but I'm guessing there still isn't one (and I can't find it on the DTD index page). However, my searching has turned up an XML schema for bioproject, and perhaps it could somehow be used to parse these XML files. How might I go about doing that?<br><br>I've been trying XML parsers such as ElementTree and Beautiful Soup but keep running into walls: how to pass an Entrez handle to a parser, and how to extract deeply nested information when the nesting differs between the XML documents I get back as I loop over them. So it would be great if I could ...stop doing that.<br><br></div><div>Thanks,<br></div><div>Anna<br></div><div>University of Washington, Seattle<br></div></div>
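One possible workaround for both walls, sketched with the standard library's xml.etree.ElementTree: it does not validate against a DTD, so the missing bioproject DTD should not stop it, and ET.parse accepts any file-like object, so an efetch handle can be passed straight in. The ".//tag" path finds a tag at any depth, which copes with records whose nesting differs. This is a minimal sketch under stated assumptions: summarize_records is a hypothetical helper, and the tag names (DocumentSummary, Title, Organism) and the sample records are illustrative only, not checked against the real bioproject schema.

```python
import io
import xml.etree.ElementTree as ET
from typing import IO, Dict, List, Optional


def summarize_records(
    handle: IO, record_tag: str, fields: List[str]
) -> List[Dict[str, Optional[str]]]:
    """Hypothetical helper: build one dict per <record_tag> element.

    For each field name, store the text of the first matching element
    found at any depth inside the record, or None if the record lacks it.
    """
    # ElementTree does not validate, so a missing DTD is not an error here.
    root = ET.parse(handle).getroot()
    rows = []
    for record in root.iter(record_tag):
        row: Dict[str, Optional[str]] = {}
        for field in fields:
            # ".//Title" searches all descendants of this record, so the tag
            # is found however deeply (or differently) it is nested.
            el = record.find(f".//{field}")
            row[field] = el.text.strip() if el is not None and el.text else None
        rows.append(row)
    return rows


# Illustrative records only (assumed tag names, not real bioproject output).
# Note that Title sits at a different depth in each record.
SAMPLE = b"""<?xml version="1.0" encoding="UTF-8"?>
<RecordSet>
  <DocumentSummary>
    <Project>
      <ProjectDescr><Title>First project</Title></ProjectDescr>
    </Project>
  </DocumentSummary>
  <DocumentSummary>
    <Project>
      <Extra><ProjectDescr><Title>Second project</Title></ProjectDescr></Extra>
    </Project>
  </DocumentSummary>
</RecordSet>"""

# With Biopython (not run here; the id value is a placeholder):
#   from Bio import Entrez
#   Entrez.email = "your.name@example.org"
#   handle = Entrez.efetch(db="bioproject", id="...")
#   rows = summarize_records(handle, "DocumentSummary", ["Title"])
#   handle.close()

if __name__ == "__main__":
    print(summarize_records(io.BytesIO(SAMPLE), "DocumentSummary", ["Title", "Organism"]))
```

The design choice here is to search by tag name rather than by a fixed path from the root, which trades some precision (a tag name reused in two places would match the first one) for robustness when records are shaped differently.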