[BioPython] 1.00a2 bug report (+ kludge fix)
Jan T. Kim
kim@inb.mu-luebeck.de
Wed, 1 Aug 2001 17:27:09 +0200
Hi all,
earlier this week, I installed biopython-1.00a2... and as a first result,
a student reported that the Bio.GenBank.NCBIDictionary has developed
a habit of eating the first five lines off each record. This bug is
quite severe with respect to BioPython's usability / functionality,
but fortunately, it should be rather simple to fix it: Investigating
this matter revealed the following:
* Bio.GenBank.NCBIDictionary.__getitem__() uses Bio.WWW.NCBI.query()
* Bio.WWW.NCBI.query() uses Bio.WWW.NCBI._open()
* Bio.WWW.NCBI._open() obtains a handle for an HTTP response, turns
that into an UndoHandle and gets the first 5 lines of the response
to check for an error (Bio/WWW/NCBI.py, line 131). The 5 lines
read are fed back into the UndoHandle using the saveline() method.
(See bottom of this message for a remark on this.)
* The saveline() method adds those line to the private member list
_saved of the UndoHandle instance. However, the read() method
just ignores them, so they are not retrieved even though they have
been put back.
The problem is with the UndoHandle::read() method:
def read(self, size=-1):
saved = ''
while size > 0 and self._saved:
if len(self._saved[0]) <= size:
size = size - len(self._saved[0])
saved = saved + self._saved.pop(0)
else:
saved = saved + self._saved[0][:size]
self._saved[0] = self._saved[0][size:]
size = 0
return saved + self._handle.read(size)
The body of the while loop is never executed (unless the optional second
argument is specified, which is not the case in the context outlined
above). The problem appears to be that a size of -1 is supposed to mean
"infinity", but that does not actually come out in the implementation
of the read() method.
As a kludge, I modified the method as follows:
def read(self, size=-1):
saved = ''
if size == -1 :
saved_size = 2147483647
else :
saved_size = size
while saved_size > 0 and self._saved:
if len(self._saved[0]) <= saved_size:
saved_size = saved_size - len(self._saved[0])
saved = saved + self._saved.pop(0)
else:
saved = saved + self._saved[0][:saved_size]
self._saved[0] = self._saved[0][saved_size:]
saved_size = 0
# print 'saved:', saved
return saved + self._handle.read(size)
This hack repairs the script which was broken by upgrading to 1.00a2, but
I hope it's needless to say that this is not a proper fix. I just post
this as a proof of my bug analysis.
As a remark regarding distinction of error messages versus GenBank records
in the HTTP response, it might be a better method to look at the MIME type
of the response. This is text/plain in case of a successful record retrieval,
but text/html if errors are reported. This might serve as as a basis for
additional / improved methods for error detection.
Greetinx, Jan
--
+- Jan T. Kim -------------------------------------------------------+
| *NEW* --> email: kim@inb.mu-luebeck.de |
| *NEW* --> WWW: http://www.inb.mu-luebeck.de/staff/kim.html |
*-----=< hierarchical systems are for files, not for humans >=-----*