[Biopython] PubmedCentral XML parsing
Peter Cock
p.j.a.cock at googlemail.com
Thu Apr 25 19:05:32 UTC 2013
On Thu, Apr 25, 2013 at 7:42 PM, Paulo Nuin <nuin at genedrift.org> wrote:
> Hi
>
> What would be the most direct way of parsing XML files downloaded from
> the PubMed Central FTP site using Biopython? These files use the
> archivearticle.dtd, and when parsed with non-DTD-based code they produce
> broken paragraphs in the body of the document because of the < > between
> <p> items of the body.
>
> Thanks in advance
>
> Paulo
The Bio.Entrez parser is DTD-based and might suit your needs.
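
If the paragraphs break because inline elements inside <p> cut the text
short, the sketch below shows one way to collect the whole paragraph. Note
that this is not the Bio.Entrez parser: it swaps in the standard library's
xml.etree.ElementTree, which does not read the DTD at all. The SAMPLE
document and the paragraph_texts helper are made up for illustration, and
the sketch assumes the file uses only numeric character references, since
named entities defined in the DTD would not be resolved.

```python
import xml.etree.ElementTree as ET

# Hypothetical stand-in for the body of a PMC article; a real file would be
# loaded with ET.parse(path).getroot() instead.
SAMPLE = """<article>
  <body>
    <sec>
      <p>Expression of <italic>BRCA1</italic> was reduced
         <xref ref-type="bibr" rid="B1">1</xref> in all samples.</p>
      <p>A second paragraph with <bold>inline</bold> markup.</p>
    </sec>
  </body>
</article>"""


def paragraph_texts(root):
    """Return the full text of every <p> under <body>, inline markup included.

    Hypothetical helper: element.text stops at the first child element, so
    the text is gathered with itertext(), which also walks the children and
    their tails.
    """
    body = root.find("body")
    paragraphs = []
    for p in body.iter("p"):
        text = "".join(p.itertext())
        # Collapse line breaks and indentation left over from the XML layout.
        paragraphs.append(" ".join(text.split()))
    return paragraphs


root = ET.fromstring(SAMPLE)
paragraphs = paragraph_texts(root)
for para in paragraphs:
    print(para)
```

Reading only p.text on the first paragraph would give "Expression of " and
drop everything after the <italic> element, which may be the breakage
described above. A <p> nested inside another <p> would be returned twice by
this sketch, once on its own and once as part of its parent.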
Peter