[Biopython] parsing Entrez SNP XML files
Gerard Schaafsma
Gerard.Schaafsma at med.lu.se
Fri Sep 6 07:38:33 UTC 2013
Hi,
I am trying to parse XML files which I downloaded from the NCBI site
(ftp://ftp.ncbi.nih.gov/snp/organisms/human_9606/XML/) containing
records from the SNP (dbSNP) database.
When I do:
    import sys
    from Bio import Entrez

    handle = open(xmlFile)
    records = Entrez.parse(handle)
    for record in records:
        for k, v in record.items():
            print k, v
I get the following error message:
NotImplementedError: The Bio.Entrez parser cannot handle XML data that
make use of XML namespaces
I am using Biopython 1.62 on a PC with Linux 3.2.0-52-generic x86_64
GNU/Linux
Searching for this error message suggests that it might have something to
do with the DTD files from NCBI, but since I am using the newest Biopython
version, I would expect these to be up to date.
Moreover, in the first 2 lines of the XML file there is no mention of
any DTD file, just:
<?xml version="1.0" encoding="UTF-8"?>
<ExchangeSet xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xmlns="http://www.ncbi.nlm.nih.gov/SNP/docsum"
    xsi:schemaLocation="http://www.ncbi.nlm.nih.gov/SNP/docsum ftp://ftp.ncbi.nlm.nih.gov/snp/specs/docsum_3.4.xsd"
    specVersion="3.4" dbSnpBuild="138" generated="2013-08-01 17:06">
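The root element declares a default namespace (xmlns=...) and points to an
XSD schema rather than a DTD, which is presumably what triggers the error.
One possible workaround is to bypass Bio.Entrez and read the file with the
standard library's xml.etree.ElementTree. The following is only a minimal
sketch: the function name iter_rs_attributes, the element name "Rs", the
attribute names, and the inline sample data are assumptions made for
illustration and are not taken from the real files.

```python
import io
import xml.etree.ElementTree as ET

# Namespace URI copied from the xmlns attribute of the <ExchangeSet> root.
DOCSUM_NS = "http://www.ncbi.nlm.nih.gov/SNP/docsum"


def iter_rs_attributes(source, tag="Rs"):
    """Yield the attributes of each <tag> element in the docsum namespace.

    `source` is a file name or a binary file object. The element name "Rs"
    is an assumption about the record layout, not something confirmed here.
    """
    # ElementTree reports namespaced tags as "{namespace-uri}localname".
    qualified = "{%s}%s" % (DOCSUM_NS, tag)
    for _event, elem in ET.iterparse(source, events=("end",)):
        if elem.tag == qualified:
            yield dict(elem.attrib)  # copy before the element is cleared
            elem.clear()  # drop the element's children and text to save memory


# Made-up sample data, only so that the sketch runs without a download.
SAMPLE = b"""<?xml version="1.0" encoding="UTF-8"?>
<ExchangeSet xmlns="http://www.ncbi.nlm.nih.gov/SNP/docsum" specVersion="3.4">
  <Rs rsId="1" snpClass="snp"/>
  <Rs rsId="2" snpClass="in-del"/>
</ExchangeSet>"""

records = list(iter_rs_attributes(io.BytesIO(SAMPLE)))
for attrs in records:
    print(attrs)
```

For a real file, the path to the downloaded XML would be passed in place of
the BytesIO object. Parsing incrementally with iterparse avoids building the
whole tree before the first record is seen, which may matter for large files.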
Has anyone run into the same problem and found a solution?
Best regards,
Gerard
--
Gerard Schaafsma
Lund University
Department of Experimental Medical Science
Protein Structure and Bioinformatics Group
Hs 66, BMC D10
Box 117
22100 Lund
Sweden