[Biopython-dev] Tracking DTD files in Bio.Entrez

Michiel de Hoon mjldehoon at yahoo.com
Sat Oct 23 09:19:26 UTC 2010


Hi everybody,

As you may know, the Bio.Entrez parser for XML data generated by NCBI uses DTD files (from NCBI) to interpret the XML correctly. Most (if not all) DTD files are included in the Biopython distribution in Bio/Entrez/DTDs, but a required DTD file may be missing, particularly after NCBI updates its DTD files. I have now modified the parser so that it tracks the URL of each DTD file and can retrieve a DTD over the internet if it is not available locally.

Still, parsing a local DTD file is much faster than retrieving a remote one, so when a DTD file is missing the parser will show a warning that names the missing DTD, the URL where it can be found, and the directory in which it should be saved (typically something like /usr/local/lib/python2.7/site-packages/Bio/Entrez/DTDs).
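To make the warning behaviour concrete, here is a minimal sketch of the kind of check I mean. It is not the actual code in Bio.Entrez: the function name find_local_dtd and its arguments are made up for illustration, and the sketch is written for Python 3.

```python
import os
import warnings
from urllib.parse import urlparse


def find_local_dtd(url, dtd_dir):
    """Return the path of a locally stored DTD, or None if it is missing.

    Hypothetical helper for illustration; not part of Bio.Entrez.
    """
    # The file name is taken from the last component of the URL path.
    filename = os.path.basename(urlparse(url).path)
    path = os.path.join(dtd_dir, filename)
    if os.path.exists(path):
        return path
    # Tell the user which DTD is missing, where to get it, and where to put it.
    warnings.warn(
        "DTD file %s not found locally. It can be downloaded from %s "
        "and saved in %s to speed up parsing." % (filename, url, dtd_dir)
    )
    return None
```

When the function returns None, the caller would fall back to reading the DTD from the URL.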

For users who do not have write permission to this directory, it may be good to also allow storing these files in the user's home directory, for example in ~/.biopython/Bio/Entrez/DTDs. If we start using such a directory, we could also consider retrieving DTD files automatically and saving them in that directory, instead of asking the user to do that manually.
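As a rough sketch of how such a lookup could work (again, not the actual implementation; load_dtd, search_dirs, cache_dir and fetch are hypothetical names, and the code assumes Python 3): look in each local directory in turn, fall back to the URL, and save the downloaded copy only if a cache directory was given.

```python
import os
import urllib.request


def load_dtd(filename, url, search_dirs, cache_dir=None,
             fetch=urllib.request.urlopen):
    """Return the DTD contents as bytes, preferring local copies.

    Hypothetical helper for illustration; not part of Bio.Entrez.
    """
    # Try each local directory first, since this is much faster.
    for directory in search_dirs:
        path = os.path.join(directory, filename)
        if os.path.exists(path):
            with open(path, "rb") as handle:
                return handle.read()
    # Not available locally: retrieve the DTD from its URL.
    with fetch(url) as response:
        data = response.read()
    # Save a copy only if the caller asked for automatic caching.
    if cache_dir is not None:
        os.makedirs(cache_dir, exist_ok=True)
        with open(os.path.join(cache_dir, filename), "wb") as handle:
            handle.write(data)
    return data
```

Here search_dirs could hold the Bio/Entrez/DTDs directory of the installation followed by os.path.expanduser("~/.biopython/Bio/Entrez/DTDs"). Leaving cache_dir as None means nothing is written to the home directory, while passing the user directory gives the automatic behaviour.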

I guess it's a trade-off between convenience for the user (if we download and save DTDs automatically) and transparency (we would be saving files in the user's home directory without the user being aware of it).
Any opinions? Is this a good idea?

-Michiel.


      


