<html><head></head><body><div style="color:#000; background-color:#fff; font-family:HelveticaNeue-Light, Helvetica Neue Light, Helvetica Neue, Helvetica, Arial, Lucida Grande, sans-serif;font-size:10px"><div dir="ltr" id="yui_3_16_0_ym19_1_1482385754706_107435">> Entrez.parse was written for a reason, to parse complex xml data so that
it is easy to extract citation data from it.</div><div dir="ltr">> Entrez.read does indeed
work, but the output contains such a complex data structure, it is a
non-trivial exercise to parse it.</div><div id="yui_3_16_0_ym19_1_1482385754706_107641" dir="ltr"><br></div><div id="yui_3_16_0_ym19_1_1482385754706_107610" dir="ltr">There is only one difference between Entrez.parse and Entrez.read: Entrez.read parses the whole data at once, while Entrez.parse iterates through the data.</div><div id="yui_3_16_0_ym19_1_1482385754706_107611" dir="ltr">There is no difference in the complexity of the data structure returned by Entrez.parse and Entrez.read: in both cases, the data structure is consistent with what NCBI specifies in the DTD referenced in the XML.</div><div id="yui_3_16_0_ym19_1_1482385754706_107747" dir="ltr">Now, Entrez.parse only makes sense if the data structure returned by NCBI corresponds to a list in Python. If it doesn't, then iterating through the XML data makes no sense, and you should use Entrez.read instead.<br></div><div id="yui_3_16_0_ym19_1_1482385754706_108072">In this particular case, NCBI has changed the data structure such that Entrez.parse is not appropriate and you should use Entrez.read instead. 
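<br><br>A minimal sketch of that switch, for illustration only. It assumes that the result of Entrez.read for a PubmedArticleSet is dictionary-like, with the journal articles stored as a list under the 'PubmedArticle' key; the helper names article_titles and fetch_titles are hypothetical and not part of Biopython.<br>

```python
def article_titles(parsed):
    """Return the article titles from an Entrez.read result.

    Assumption: the parsed PubmedArticleSet is dictionary-like and keeps
    the journal articles as a list under the 'PubmedArticle' key.
    """
    titles = []
    for article in parsed.get("PubmedArticle", []):
        titles.append(article["MedlineCitation"]["Article"]["ArticleTitle"])
    return titles


def fetch_titles(ids, email):
    """Fetch PubMed records over the network and return their titles."""
    # Imported here so that article_titles can be used without Biopython.
    from Bio import Entrez

    Entrez.email = email
    handle = Entrez.efetch(db="pubmed", id=ids, retmode="xml")
    try:
        # Entrez.read parses the whole document at once, so nothing
        # is yielded record by record as with Entrez.parse.
        parsed = Entrez.read(handle)
    finally:
        handle.close()
    return article_titles(parsed)
```

Calling fetch_titles("19304878,14630660", "Your.Name.Here@example.org") needs network access to NCBI; the loop over the records then happens on the list inside the parsed result rather than on the parser itself.<br><br>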
This does not mean that Entrez.parse is broken.</div><div id="yui_3_16_0_ym19_1_1482385754706_108045"><br></div><div id="yui_3_16_0_ym19_1_1482385754706_108073">We do need to update the Biopython documentation though.</div><div><br></div><div>Best,</div><div id="yui_3_16_0_ym19_1_1482385754706_108109">-Michiel.<br></div><div class="qtdSeparateBR"><br><br></div><div style="display: block;" class="yahoo_quoted"> <div style="font-family: HelveticaNeue-Light, Helvetica Neue Light, Helvetica Neue, Helvetica, Arial, Lucida Grande, sans-serif; font-size: 10px;"> <div style="font-family: HelveticaNeue, Helvetica Neue, Helvetica, Arial, Lucida Grande, sans-serif; font-size: 16px;"> <div dir="ltr"><font face="Arial" size="2"> On Thursday, December 22, 2016 12:47 PM, Peter Cock <p.j.a.cock@googlemail.com> wrote:<br></font></div> <br><br> <div class="y_msg_container">On Wed, Dec 21, 2016 at 7:47 AM, Michiel de Hoon <<a shape="rect" ymailto="mailto:mjldehoon@yahoo.com" href="mailto:mjldehoon@yahoo.com">mjldehoon@yahoo.com</a>> wrote:<br clear="none">> In what sense is the current result from Entrez.read more difficult to parse<br clear="none">> than the previous result from Entrez.parse?<br clear="none">> As far as I can tell, Entrez.read and Entrez.parse are both working<br clear="none">> correctly.<br clear="none">> Best,<br clear="none">> -Michiel<br clear="none"><br clear="none">In this example we expected a list-like structure with an<br clear="none">entry for each record requested (here two), allowing<br clear="none">iteration over these records with Entrez.parse as in the<br clear="none">original example:<br clear="none"><br clear="none">from Bio import Entrez<br clear="none">Entrez.email = "<a shape="rect" ymailto="mailto:Your.Name.Here@example.org" href="mailto:Your.Name.Here@example.org">Your.Name.Here@example.org</a>"<br clear="none">handle = Entrez.efetch("pubmed", id="19304878,14630660", retmode="xml")<br clear="none">records = Entrez.parse(handle)<br 
clear="none">for record in records:<br clear="none"> print(record['MedlineCitation']['Article']['ArticleTitle'])<br clear="none"><br clear="none">That no longer works - it seems the Entrez parsing code no<br clear="none">longer thinks what the NCBI returns is list-like, and so<br clear="none">Entrez.parse rejects it, saying to use Entrez.read to load<br clear="none">everything at once.<br clear="none"><br clear="none">This works perfectly with our Tests/Entrez/pubmed2.xml<br clear="none">example file (also two PubMed articles), and at first glance<br clear="none">the XML structure is the same (other than the DTD update).<br clear="none"><br clear="none">The top-level XML tag's DTD has changed slightly:<br clear="none"><br clear="none"><!ELEMENT PubmedArticleSet (PubmedArticle | PubmedBookArticle)+><br clear="none"><br clear="none">Now with pubmed_170101.dtd this can also include a deletion:<br clear="none"><br clear="none"><!ELEMENT PubmedArticleSet ((PubmedArticle | PubmedBookArticle)+,<br clear="none">DeleteCitation?) ><br clear="none"><br clear="none">I remain puzzled about what exactly has changed here.<div class="yqt9323033972" id="yqtfd85983"><br clear="none"><br clear="none">Peter</div><br><br></div> </div> </div> </div></div></body></html>