[Biopython-dev] NCBI DTD File

Wed Sep 9 15:36:06 UTC 2015

Thanks - if it happens again and you can record the error to a file,
that would be great - something like this if it starts breaking again:

...
handle = Entrez.efetch(db='protein', id=gi, retmode='xml')
with open("test_case.xml", "w") as out_handle:
    out_handle.write(handle.read())
handle.close()
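
If it helps, the same idea can be wrapped up so the raw XML is only written
out when parsing actually fails. This is just a rough sketch, not anything
shipped with Biopython: fetch_and_parse and its fetch/parse/dump_path
arguments are made-up names for illustration, and I am assuming the handle
may return either str or bytes depending on the Biopython/Python version.

```python
import io


def fetch_and_parse(fetch, parse, dump_path="test_case.xml"):
    """Fetch raw XML, parse it, and save the raw XML if parsing fails.

    fetch: hypothetical zero-argument callable returning an open handle,
           e.g. lambda: Entrez.efetch(db="protein", id=gi, retmode="xml")
    parse: hypothetical one-argument callable taking a binary handle,
           e.g. lambda h: Entrez.read(h, validate=True)
    """
    handle = fetch()
    try:
        raw = handle.read()
    finally:
        handle.close()

    # Assumption: older setups may hand back text, newer ones bytes.
    if isinstance(raw, str):
        raw = raw.encode("utf-8")

    try:
        return parse(io.BytesIO(raw))
    except Exception:
        # Keep exactly what the server sent so it can go in a bug report,
        # then re-raise the original error unchanged.
        with open(dump_path, "wb") as out_handle:
            out_handle.write(raw)
        raise
```

The fetch and parse steps are passed in as callables so that the saving
logic can be tried out without touching the NCBI servers.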

See also discussion here:
https://github.com/biopython/biopython/issues/515

Peter

On Wed, Sep 9, 2015 at 4:28 PM, Lev Tsypin <ltsypin at uchicago.edu> wrote:
> Hi Peter,
>
> It seems that it was indeed a temporary error. Thanks for your help!
>
> Best,
> Lev
>
> On Wed, Sep 9, 2015 at 4:50 AM, Peter Cock <p.j.a.cock at googlemail.com>
> wrote:
>>
>> Hi Lev,
>>
>> Which version of Biopython do you have, and which GI number(s) fail?
>>
>> The very fact the problem tag was "Error" suggests it was actually
>> an error message, not a sequence record - perhaps a temporary error?
>>
>> This worked for me:
>>
>> from Bio import Entrez
>> Entrez.email = "..."
>> handle = Entrez.efetch(db="protein", id="12345678", retmode="xml")
>> record = Entrez.read(handle, validate=True)
>> handle.close()
>> print(record)
>>
>> Using some id values like "1" could give an "empty" XML record,
>> which to me looks like an NCBI bug:
>>
>> <?xml version="1.0"?>
>>  <!DOCTYPE GBSet PUBLIC "-//NCBI//NCBI GBSeq/EN"
>> "http://www.ncbi.nlm.nih.gov/dtd/NCBI_GBSeq.dtd">
>>  <GBSet>
>>
>> </GBSet>
>>
>> This is parsed as [] which is reasonable (empty list).
>>
>> Other values like "0" and "-1" give an HTTP Error 400: Bad Request
>> (which is good - a nice clear and explicit error).
>>
>> See also:
>>
>> Peter
>>
>>
>> On Fri, Sep 4, 2015 at 8:16 PM, Lev Tsypin <ltsypin at uchicago.edu> wrote:
>> > Hi Peter,
>> >
>> > This is me trying to get protein sequences from the protein database.
>> > I have a gi code in the variable 'gi' that I pass into the
>> > Entrez.efetch function. Specifically, I use:
>> >
>> >         handle = Entrez.efetch(db='protein', id=gi, retmode='xml')
>> >         record = Entrez.read(handle)
>> >
>> > Best,
>> > Lev
>> >
>> > On Fri, Sep 4, 2015 at 11:12 AM, Peter Cock <p.j.a.cock at googlemail.com>
>> > wrote:
>> >>
>> >> Hi Lev,
>> >>
>> >> Which database was this with? Each has somewhat different XML
>> >> behaviour.
>> >>
>> >> The NCBI have been quite good about versioning the DTD files -
>> >> normally they add new files rather than edit an existing DTD file. So
>> >> unless you've had a warning from Biopython there should be no reason
>> >> to download a new DTD file.
>> >>
>> >> Peter
>> >>
>> >> On Fri, Sep 4, 2015 at 3:44 PM, Lev Tsypin <ltsypin at uchicago.edu>
>> >> wrote:
>> >> > Hi all,
>> >> >
>> >> > I am encountering this error when using Bio.Entrez:
>> >> >
>> >> > Bio.Entrez.Parser.ValidationError: Failed to find tag 'Error' in the
>> >> > DTD. To skip all tags that are not represented in the DTD, please
>> >> > call Bio.Entrez.read or Bio.Entrez.parse with validate=False.
>> >> >
>> >> > I've found a discussion of the same issue from about a year ago, so I
>> >> > figure the NCBI updated their DTD file in a strange way. I found
>> >> > several solutions: would you recommend that I download the new DTD
>> >> > file into my local copy of Biopython or run Entrez.read with
>> >> > validate=False?
>> >> >
>> >> > Best regards,
>> >> > Lev Tsypin
>> >> >
>> >> > _______________________________________________
>> >> > Biopython-dev mailing list
>> >> > Biopython-dev at mailman.open-bio.org
>> >> > http://mailman.open-bio.org/mailman/listinfo/biopython-dev