<div dir="ltr">For your last issue, if you don&#39;t mind disentangling the data after you&#39;ve pulled it from the XML document, you can use this pattern to convert the document into an equivalent nested collection of dictionaries:<br><br><div><div><font face="monospace, monospace">from collections import defaultdict</font></div><div><font face="monospace, monospace"><br></font></div><div><font face="monospace, monospace">def recursive_dict(element):</font></div><div><font face="monospace, monospace">    # Start from the element&#39;s attributes, then fold in children and text.</font></div><div><font face="monospace, monospace">    data_dict = dict(element.attrib)</font></div><div><font face="monospace, monospace">    children = map(recursive_dict, element)</font></div><div><font face="monospace, monospace">    children_nodes = defaultdict(list)</font></div><div><font face="monospace, monospace">    clean_nodes = {}</font></div><div><font face="monospace, monospace">    for node, data in children:</font></div><div><font face="monospace, monospace">        children_nodes[node].append(data)</font></div><div><font face="monospace, monospace">    # A tag that occurs once maps to its value; repeated tags map to a list.</font></div><div><font face="monospace, monospace">    for node, data_list in children_nodes.items():</font></div><div><font face="monospace, monospace">        clean_nodes[node] = data_list[0] if len(data_list) == 1 else data_list</font></div><div><font face="monospace, monospace"><br></font></div><div><font face="monospace, monospace">    if clean_nodes:</font></div><div><font face="monospace, monospace">        data_dict.update(clean_nodes)</font></div><div><font face="monospace, monospace"><br></font></div><div><font face="monospace, monospace">    if element.text is not None and not element.text.isspace():</font></div><div><font face="monospace, monospace">        data_dict[&#39;text&#39;] = element.text</font></div><div><font face="monospace, monospace">    # An element with only text collapses to that string.</font></div><div><font face="monospace, monospace">    if len(data_dict) == 1 and &#39;text&#39; in data_dict:</font></div><div><font face="monospace, monospace">        data_dict = data_dict[&#39;text&#39;]</font></div><div><font face="monospace, monospace">    tag = element.tag</font></div><div><font face="monospace, monospace">    return tag, data_dict</font></div></div><div><font face="monospace, monospace"><br></font></div><div><font face="arial, helvetica, sans-serif">Feed it the root of the ElementTree you want to parse, and it will return the root&#39;s tag together with the complete tree in dictionary form. </font></div><div><font face="arial, helvetica, sans-serif"><br></font></div><div><font face="arial, helvetica, sans-serif">From that dictionary you can infer an ad-hoc schema, which will most likely depend on the class of organism you&#39;re looking at.</font></div><div><font face="arial, helvetica, sans-serif"><br></font></div></div>
<div class="gmail_extra"><br><div class="gmail_quote">On Sun, May 17, 2015 at 4:24 PM, Anna Simpson <span dir="ltr">&lt;<a href="mailto:acsimpson@gmail.com" target="_blank">acsimpson@gmail.com</a>&gt;</span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div dir="ltr"><div><div>Hi all,<br></div>I&#39;ve been trying to parse xml files from an efetch query to the bioproject database, and kept getting an error message about no dtd (and validation=False gets me no data at all) when using Entrez.read or Entrez.parse. I found a post on this mailing list from 2013, where a gentleman had the same problem - he emailed NCBI and was told the following: <br><br>


&quot;Yes this is the &quot;normal&quot; but it is an oversight as a dtd was never created

for this database. I will have to open a ticket to the developers to create

this and have it included in the XML and on the DTD web page.&quot;<br><br>I&#39;ve emailed NCBI about this again but I&#39;m guessing there still isn&#39;t one (and I can&#39;t find it in the DTD index page). But my various googlings have led me to find that there is a schema for bioproject, and that perhaps, somehow, it could be used to parse these xml files. How  might I go about doing that?<br><br>I&#39;ve been trying to use xml parsers like element tree and Beautiful Soup but keep running into walls (how to stick an entrez handle into a parser, how to get it to give me deeply nested information when the nesting is different for each xml document I get and I&#39;m running this through a loop) so it would be great if I could ...stop doing that.<br><br></div><div>Thanks,<br></div><div>Anna<br></div><div>University of Washington, Seattle<br></div></div>

<br>_______________________________________________<br>

Biopython mailing list  -  <a href="mailto:Biopython@mailman.open-bio.org">Biopython@mailman.open-bio.org</a><br>

<a href="http://mailman.open-bio.org/mailman/listinfo/biopython" target="_blank">http://mailman.open-bio.org/mailman/listinfo/biopython</a><br></blockquote></div><br></div>
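<div dir="ltr"><div><font face="arial, helvetica, sans-serif">On getting an Entrez handle into a parser: ElementTree&#39;s parse() accepts any file-like object, so the handle returned by Entrez.efetch should be passable to it directly, and the resulting root can then go to recursive_dict. Below is a minimal sketch of that flow, not a tested recipe against real BioProject output. A BytesIO object stands in for the efetch handle, the tag and attribute names (DocumentSummary, Project, Title, Organism, Locus) are invented for illustration, and recursive_dict is repeated from the reply so the sketch runs on its own.</font></div></div>
<pre>
```python
import io
import xml.etree.ElementTree as ET
from collections import defaultdict


def recursive_dict(element):
    """Same logic as the function in the reply, repeated so this runs alone."""
    data_dict = dict(element.attrib)
    children_nodes = defaultdict(list)
    for node, data in map(recursive_dict, element):
        children_nodes[node].append(data)
    for node, data_list in children_nodes.items():
        # Single children stay scalar; repeated tags become a list.
        data_dict[node] = data_list[0] if len(data_list) == 1 else data_list
    if element.text is not None and not element.text.isspace():
        data_dict['text'] = element.text
    if len(data_dict) == 1 and 'text' in data_dict:
        data_dict = data_dict['text']
    return element.tag, data_dict


# Hypothetical stand-in document; these tag names are NOT the real
# BioProject layout, they only exercise the different cases.
sample = ET.Element('DocumentSummary', {'uid': '1'})
project = ET.SubElement(sample, 'Project')
ET.SubElement(project, 'Title').text = 'Example project'
ET.SubElement(project, 'Organism', {'taxID': '9606'}).text = 'Homo sapiens'
ET.SubElement(project, 'Locus').text = 'A'
ET.SubElement(project, 'Locus').text = 'B'

# Assumption: the efetch handle behaves like this file-like object.
handle = io.BytesIO(ET.tostring(sample))

# ET.parse reads from the handle; getroot() gives the element to convert.
root = ET.parse(handle).getroot()
tag, tree = recursive_dict(root)
```
</pre>
<div dir="ltr"><div><font face="arial, helvetica, sans-serif">In this sketch tree[&#39;Project&#39;][&#39;Locus&#39;] comes back as a list because the tag repeats, Title collapses to a plain string, and Organism keeps both its attribute and its text in a small dictionary. That variation between scalar, list and dictionary is the disentangling you would have to handle per document.</font></div></div>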