[Biopython-dev] [Biopython - Bug #3430] Error parsing PubMedCentral XML files

redmine at redmine.open-bio.org redmine at redmine.open-bio.org
Mon Apr 29 09:54:24 UTC 2013


Issue #3430 has been updated by Michiel de Hoon.


> NotXMLError: Failed to parse the XML data (XML declaration not found). Please make sure that the input data are in XML format.

The error message is correct: the XML file does not start with the XML declaration
<pre><?xml version="1.0"?></pre>

Either the XML file returned by Entrez is broken, or something went wrong when saving the file.
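
One way to tell these two cases apart is to look at the raw bytes at the very start of the saved file. The following is only a minimal sketch: the helper name describe_xml_start and the 64-byte window are made up for illustration and are not part of Bio.Entrez, and the cases it distinguishes are guesses at what might precede the declaration, not a diagnosis of this particular file.

```python
import io

XML_DECL = b"<?xml"
UTF8_BOM = b"\xef\xbb\xbf"


def describe_xml_start(stream, size=64):
    """Report where (if anywhere) an XML declaration sits in the first bytes.

    `stream` must be opened in binary mode. Hypothetical helper for
    illustration only; it is not part of Biopython.
    """
    head = stream.read(size)
    if head.startswith(XML_DECL):
        return "declaration at byte 0"
    if head.startswith(UTF8_BOM + XML_DECL):
        return "declaration after a UTF-8 byte order mark"
    if head.lstrip().startswith(XML_DECL):
        return "declaration after leading whitespace"
    if XML_DECL in head:
        return "declaration present but not at the start"
    return "no declaration in the first %d bytes" % size


# In-memory stand-in for a file; for a real file use open(path, 'rb').
sample = io.BytesIO(b'\n<?xml version="1.0"?><article/>')
print(describe_xml_start(sample))
```

For the file from the report this would be called as describe_xml_start(open('nihms83342.nxml', 'rb')). Anything other than "declaration at byte 0" would suggest that the file was altered when it was downloaded or saved.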
----------------------------------------
Bug #3430: Error parsing PubMedCentral XML files
https://redmine.open-bio.org/issues/3430

Author: Paulo Nuin
Status: New
Priority: Normal
Assignee: Biopython Dev Mailing List
Category: Main Distribution
Target version: 
URL: 


There seems to be an error when parsing locally downloaded PubMedCentral XML files (extension .nxml). Using the code

@
from Bio import Entrez
handle = open('nihms83342.nxml')
records = Entrez.parse(handle)
for record in records:
    print record
@

the following error occurs (copied from IPython), even though the XML header contains the declaration:


---------------------------------------------------------------------------
NotXMLError                               Traceback (most recent call last)
<ipython-input-5-e78d8d3c3888> in <module>()
      2 handle = open('nihms83342.nxml')
      3 records = Entrez.parse(handle)
----> 4 for record in records:
      5     print record

/Library/Python/2.7/site-packages/Bio/Entrez/Parser.pyc in parse(self, handle)
    229                         # We did not see the initial <!xml declaration, so
    230                         # probably the input data is not in XML format.
--> 231                         raise NotXMLError("XML declaration not found")
    232                 self.parser.Parse("", True)
    233                 self.parser = None

NotXMLError: Failed to parse the XML data (XML declaration not found). Please make sure that the input data are in XML format.

The XML file in question is attached.

