[Biopython-dev] Bio.File

Michiel de Hoon mjldehoon at yahoo.com
Thu Sep 8 14:49:09 UTC 2011



--- On Wed, 9/7/11, Peter Cock <p.j.a.cock at googlemail.com> wrote:

> UndoHandle used to be used in Bio.Entrez for spotting
> error conditions, but now we trust the NCBI to set an
> HTTP return code:
> 
> https://github.com/biopython/biopython/commit/2c4d8b99fc1b2dffa726e7d9956d766f7013164d

No, we shouldn't rely on an HTTP return code. The idea is that only the parser can know whether the output returned by NCBI is valid, as in:

from Bio import Entrez

handle = Entrez.efetch(...something...)
try:
    record = Entrez.read(handle)
except Exception:
    # NCBI returned something invalid, or at least
    # something that we don't know how to parse
    raise
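A minimal offline sketch of the same idea, assuming the response is XML. It uses the standard library's `xml.etree.ElementTree` as a stand-in for `Entrez.read` so it runs without network access, and the helper name `read_response` is hypothetical. Truncated or non-XML output is caught by the parser itself, without inspecting any particular error message:

```python
import io
import xml.etree.ElementTree as ET


def read_response(handle):
    """Parse an XML response and let the parser decide whether it is valid.

    Hypothetical helper: ElementTree stands in for Entrez.read here.
    """
    try:
        return ET.parse(handle).getroot()
    except ET.ParseError as err:
        # The server returned something invalid, or at least
        # something that we don't know how to parse.
        raise ValueError("Could not parse server response: %s" % err) from err


good = io.StringIO("<Result><Id>123</Id></Result>")
print(read_response(good).tag)  # Result

bad = io.StringIO("<Result><Id>123</Id>")  # truncated output
try:
    read_response(bad)
except ValueError as err:
    print("rejected:", err)
```

The caller never needs to know what an error page looks like; anything the parser cannot turn into a record is rejected.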


> If the server could be relied on to always give an
> HTTP error code this wouldn't be needed:
> 
> https://github.com/peterjc/biopython/blob/togows/Bio/TogoWS/__init__.py
> 

I don't like this approach much, as it depends on exactly what the error message looks like, and misses any other problems, such as incomplete output. There will be a certain false positive rate, with return values that pass the check on the first 10 lines but are still unusable. Even worse, the false positive rate can suddenly go up if the server maintainers decide to change anything in their error messages. This kind of checking should be done by the parser, which can tell you exactly whether the data are valid and, if not, what is wrong with them.
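To make the false-positive point concrete, here is a hypothetical comparison between a sniffing check in the spirit of the TogoWS code (look for known error markers in the first 10 lines) and simply parsing the data. The marker strings, the sample responses, and the helper names are all assumptions for illustration, with XML parsing standing in for a real record parser:

```python
import xml.etree.ElementTree as ET

# Error markers a sniffing check might look for (illustrative only).
ERROR_MARKERS = ("Error: ", "We're sorry, but something went wrong")


def sniff_looks_ok(text, n_lines=10):
    """Hypothetical pre-check: inspect only the first n_lines for known errors."""
    head = "".join(text.splitlines(keepends=True)[:n_lines])
    if head.strip() == "":
        return False
    return not any(marker in head for marker in ERROR_MARKERS)


def parser_says_ok(text):
    """Let the XML parser decide whether the whole document is usable."""
    try:
        ET.fromstring(text)
    except ET.ParseError:
        return False
    return True


complete = "<records>\n<record id='1'/>\n<record id='2'/>\n</records>\n"
truncated = "<records>\n<record id='1'/>\n<rec"  # connection dropped mid-stream
reworded = "Oops: the server failed\n"  # error text the sniffer doesn't know

for name, text in [("complete", complete), ("truncated", truncated),
                   ("reworded error", reworded)]:
    print(name, sniff_looks_ok(text), parser_says_ok(text))
# complete True True
# truncated True False
# reworded error True False
```

Both the truncated record and the reworded error message slip past the sniffer; only the parser rejects them.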

Best,
--Michiel.

[copied from Bio/TogoWS/__init__.py]:

    # Wrap the handle inside an UndoHandle.
    uhandle = File.UndoHandle(handle)

    # Check for errors in the first 10 lines.
    # This is kind of ugly.
    lines = []
    for i in range(10):
        lines.append(uhandle.readline())
    for line in reversed(lines):
        uhandle.saveline(line)
    data = ''.join(lines)

    if data == '':
        # ValueError? This can occur with invalid formats or fields,
        # e.g. http://togows.dbcls.jp/entry/pubmed/16381885.au
        # which is an invalid file format; I meant to try this
        # instead: http://togows.dbcls.jp/entry/pubmed/16381885/au
        raise IOError("TogoWS replied with no data:\n%s" % url)
    if data == ' ':
        # I've seen this on things which should work, e.g.
        # http://togows.dbcls.jp/entry/genome/X52960.fasta
        raise IOError("TogoWS replied with just a single space:\n%s" % url)
    if data.startswith("Error: "):
        # TODO - Should this be a ValueError (in some cases)?
        raise IOError("TogoWS replied with an error message:\n\n%s\n\n%s"
                      % (data, url))
    if "<title>We're sorry, but something went wrong</title>" in data:
        # ValueError? This can occur with invalid formats or fields
        raise IOError("TogoWS replied: We're sorry, but something went wrong:\n%s"
                      % url)
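The peek-and-push-back trick above relies on the undo buffer returning the most recently saved line first, which is why the loop pushes `lines` back starting from the last one. A minimal stdlib-only sketch of that mechanism follows; the `PushbackReader` class is hypothetical and is not the `Bio.File.UndoHandle` implementation. It shows that reading ten lines and saving them back in reverse leaves the stream unchanged, and that empty strings returned at end of file are simply dropped:

```python
import io


class PushbackReader:
    """Hypothetical minimal line reader with an undo stack (not Bio.File)."""

    def __init__(self, handle):
        self._handle = handle
        self._saved = []  # stack of pushed-back lines; last saved is read first

    def readline(self):
        if self._saved:
            return self._saved.pop()
        return self._handle.readline()

    def saveline(self, line):
        if line:  # ignore the '' that readline() returns at end of file
            self._saved.append(line)

    def read(self):
        # Return any pushed-back lines (in reading order), then the rest.
        saved = "".join(reversed(self._saved))
        self._saved = []
        return saved + self._handle.read()


original = "".join("line %d\n" % i for i in range(12))
reader = PushbackReader(io.StringIO(original))

peeked = [reader.readline() for _ in range(10)]
for line in reversed(peeked):  # push back last line first
    reader.saveline(line)

print(reader.read() == original)  # True
```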
