<div dir="ltr">Michiel, Peter,<div><br></div><div>Thanks for the feedback. Updating <span style="font-family:arial,sans-serif;font-size:12.7272720336914px">startNamespaceDeclHandler </span>seems to be the logical way to go. I don&#39;t have much experience with XML schemas, but I will give it a try and make a pull request if I get something decent working.</div><div><br></div><div>Ivan</div><div class="gmail_extra"><br><div class="gmail_quote">On Thu, Nov 20, 2014 at 8:20 PM, Michiel de Hoon <span dir="ltr">&lt;<a href="mailto:mjldehoon@yahoo.com" target="_blank">mjldehoon@yahoo.com</a>&gt;</span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">Hi Ivan,<br>

<br>

I am the original author of Bio.Entrez.<br>

The parser in Bio.Entrez consists of two parts: The XML parser and the DTD parser.<br>

The DTD parser is used to determine how the elements in the XML file should be represented in Python.<br>

To allow schemas, all that is needed is a parser for the schema; the XML parser itself stays unchanged.<br>

In Bio/Entrez/Parser.py, you will find the method startNamespaceDeclHandler;<br>

currently it just raises a NotImplementedError.<br>

If you try the Bio.Entrez parser on your XML file, you will see that this error gets raised.<br>

So all you would have to do is implement startNamespaceDeclHandler;<br>

it should parallel externalEntityRefHandler, which parses DTD files, though the bulk of that work is done in elementDecl.<br>
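A minimal sketch of where such a handler sits, using only Python&#39;s standard pyexpat module rather than the actual Bio.Entrez code. The class and method names are hypothetical (the handler name just mirrors the one above), and it assumes the XML points to its schema through the standard xsi:schemaLocation or xsi:noNamespaceSchemaLocation attributes. It only records the declared namespaces and schema URLs; fetching and interpreting the .xsd file, the part that would mirror externalEntityRefHandler and elementDecl, is left out.<br>

```python
import xml.parsers.expat

# Namespace of the standard XML Schema instance attributes (xsi:*).
XSI_NS = "http://www.w3.org/2001/XMLSchema-instance"


class SchemaAwareParser:
    """Hypothetical sketch, not Bio.Entrez: collect namespace declarations
    and the schema location(s) a document refers to."""

    def __init__(self):
        # With a namespace_separator, expat enables namespace processing and
        # reports qualified names as "URI localname".
        self.parser = xml.parsers.expat.ParserCreate(namespace_separator=" ")
        self.parser.StartNamespaceDeclHandler = self.startNamespaceDeclHandler
        self.parser.StartElementHandler = self.startElementHandler
        self.namespaces = {}
        self.schema_locations = []

    def startNamespaceDeclHandler(self, prefix, uri):
        # expat calls this before StartElementHandler for the element that
        # declares the namespace; prefix is None for a default namespace.
        # A real implementation would load and parse the schema at this point.
        self.namespaces[prefix] = uri

    def startElementHandler(self, name, attrs):
        # xsi:schemaLocation holds whitespace-separated (namespace, URL) pairs,
        # so every second token is a schema URL.
        pairs = attrs.get(XSI_NS + " schemaLocation", "").split()
        self.schema_locations.extend(pairs[1::2])
        # xsi:noNamespaceSchemaLocation holds a single schema URL.
        no_ns = attrs.get(XSI_NS + " noNamespaceSchemaLocation")
        if no_ns:
            self.schema_locations.append(no_ns)

    def parse(self, data):
        self.parser.Parse(data, True)
        return self
```

For example, SchemaAwareParser().parse(data).schema_locations would list the schema URL(s) declared in data, which is where a schema parser would then take over.<br>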

Please let me know if you run into any problems.<br>

<br>

Best,<br>

-Michiel.<br>

<br>


--------------------------------------------<br>

On Fri, 11/21/14, Ivan Erill &lt;<a href="mailto:ivan.erill@gmail.com" target="_blank">ivan.erill@gmail.com</a>&gt; wrote:<br>

<br>

 Subject: [Biopython] NCBI e-utils parser upgrade<br>

 To: <a href="mailto:biopython@mailman.open-bio.org" target="_blank">biopython@mailman.open-bio.org</a><br>

 Date: Friday, November 21, 2014, 2:42 AM<br>

<div><div><br>

 Hi all,<br>

 As part of my<br>

 work, I need to deal with the new WP protein records at NCBI<br>

 and, specifically, with the information on their coding<br>

 sequences. This information is returned by E-utils through<br>

 an integrated protein report view:<br>

 <a href="http://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?db=protein&amp;id=231025&amp;rettype=ipg" target="_blank">http://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?db=protein&amp;id=231025&amp;rettype=ipg</a><br>

<br>

 which does not use<br>

 a DTD for the XML, but rather a schema. Although there has<br>

 been no formal announcement, I&#39;ve been talking to NCBI<br>

 people and they tell me that they will progressively be<br>

 moving to schemas (which allow more fine-grained<br>

 validation). Specifically, all new XML exports<br>

 from NCBI will be using schemas. I don&#39;t believe that<br>

 existing DTDs are going to be replaced by schemas for<br>

 now.<br>
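For reference, the request in the link above is an ordinary efetch call with three parameters (db, id, rettype). A small sketch of assembling it with Python&#39;s standard library; efetch_url is a hypothetical helper, not part of Biopython. Within Biopython, Bio.Entrez.efetch(db="protein", id="231025", rettype="ipg") should issue the equivalent request, plus the tool and email parameters that Bio.Entrez adds.<br>

```python
from urllib.parse import urlencode

# Base URL of the E-utilities efetch endpoint, as in the link above.
EFETCH_BASE = "http://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi"


def efetch_url(db, uid, rettype):
    """Hypothetical helper: build an efetch request URL from its parameters."""
    # Dicts keep insertion order, so the query lists db, id, rettype in turn.
    query = urlencode({"db": db, "id": uid, "rettype": rettype})
    return EFETCH_BASE + "?" + query


url = efetch_url("protein", "231025", "ipg")
```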

 My original<br>

 thought was to update the current XML parser<br>

 in Biopython on a branch, but it looks like supporting schemas would require a<br>

 major overhaul of the existing code base, and it might make<br>

 more sense to develop a parallel parser. So I first wanted<br>

 to check which approach you would prefer<br>

 code-wise.<br>

 Regards,<br>

 Ivan<br>

<br>

</div></div>

<br>

 _______________________________________________<br>

 Biopython mailing list  -  <a href="mailto:Biopython@mailman.open-bio.org" target="_blank">Biopython@mailman.open-bio.org</a><br>

 <a href="http://mailman.open-bio.org/mailman/listinfo/biopython" target="_blank">http://mailman.open-bio.org/mailman/listinfo/biopython</a><br>

</blockquote></div><br></div></div>