[Biopython-dev] Fwd: [Fwd: missing NCBI DTDs]

Peter Cock p.j.a.cock at googlemail.com
Wed Mar 26 15:04:28 UTC 2014


On Wed, Mar 26, 2014 at 2:55 PM, Michiel de Hoon <mjldehoon at yahoo.com> wrote:
> Hi Peter,
>
> On Wed, 3/26/14, Peter Cock <p.j.a.cock at googlemail.com> wrote:
>> Long term, not bundling the DTD files seems a good idea.
>> To be cautious, we could bundle them for the next release,
>> see how the download mechanism works in the wild, and
>> drop the DTD files for the release after that?
>
> I don't think we need to be so cautious.

OK.

We could then get rid of the DTDs folder under Bio/Entrez
and tweak the Entrez XML parsing tests to ensure they
are only run if the internet is available.
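
As a rough illustration of that kind of guard, a test case could check
connectivity in setUp and skip itself when offline. This is only a sketch
using the standard library: the helper name internet_available, the default
host, and the test class are assumptions made for the example, not the
existing Biopython test code.

```python
import socket
import unittest


def internet_available(host="www.ncbi.nlm.nih.gov", port=80, timeout=5.0):
    """Return True if a TCP connection to host:port can be opened.

    The default host is an assumption for this sketch; any server the
    tests depend on would do.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers DNS failures, refused connections and timeouts.
        return False


class EntrezOnlineParserTests(unittest.TestCase):
    """Hypothetical test case; the name and body are placeholders."""

    def setUp(self):
        # Skip (rather than fail) every test in this class when offline.
        if not internet_available():
            self.skipTest("internet not available")

    def test_parse_with_remote_dtd(self):
        # A real test would parse an XML file whose DTD is not on disk.
        pass
```

Skipping in setUp keeps the offline case visible in the test report as a
skip instead of a failure.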

>> This would mean all the Entrez parser tests would require
>> internet access (even if using an old XML file on disk),
>
> But only the first time. After a DTD is downloaded, it is stored
> locally, and internet access won't be needed the next time the XML
> (or other XML files relying on the same DTD) is parsed.

Yes, but for many test environments, it is always the first time ;)
e.g. TravisCI uses a clean VM for each test run.

> In my experience, using local DTDs is much, much faster than
> accessing them through the internet for each XML file, so I
> would not advocate an internet-only solution.

Yes (I didn't mean to imply that - sorry for any confusion).

> As an alternative to local storage, we could consider downloading
> all DTDs for each Biopython session, but keeping the results of
> parsing the DTD in memory (so we won't have to download each
> DTD over and over again if we're parsing many XML files).
> This can be almost as fast as using local storage, but will require
> internet access, and also Bio.Entrez would have to be changed.

A local cache (as implemented) seems fine to me.
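
For readers following the thread, the general idea of such a cache can be
sketched as below: look for the DTD on disk first, and download and store it
only on a miss. This is an illustration under stated assumptions, not the
actual Bio.Entrez code; the function name load_dtd, the fetch parameter, the
cache directory, and the URL are all hypothetical.

```python
import os
import tempfile
from urllib.request import urlopen


def _download(url):
    """Fetch the raw bytes at url (needs internet access)."""
    with urlopen(url) as handle:
        return handle.read()


def load_dtd(url, cache_dir, fetch=_download):
    """Return the DTD as bytes, downloading only on a cache miss."""
    path = os.path.join(cache_dir, os.path.basename(url))
    if os.path.exists(path):
        # Cache hit: no network access needed.
        with open(path, "rb") as handle:
            return handle.read()
    # Cache miss: download once, then store for later sessions.
    data = fetch(url)
    os.makedirs(cache_dir, exist_ok=True)
    with open(path, "wb") as handle:
        handle.write(data)
    return data


# Demo with a stand-in fetcher so the sketch runs offline.
calls = []


def fake_fetch(url):
    calls.append(url)
    return b"<!ELEMENT demo (#PCDATA)>"


cache_dir = tempfile.mkdtemp()
url = "http://example.invalid/dtds/demo.dtd"  # hypothetical URL
first = load_dtd(url, cache_dir, fetch=fake_fetch)   # miss: calls fake_fetch
second = load_dtd(url, cache_dir, fetch=fake_fetch)  # hit: reads from disk
```

On a clean VM the cache directory starts empty, so the first parse of each
DTD still needs the network, which is the TravisCI concern raised above.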

Peter


