[Biopython-dev] Performance of Bio.File.UndoHandle

Michael Hoffman hoffman at ebi.ac.uk
Wed Oct 15 10:51:21 EDT 2003


On Mon, 13 Oct 2003, Jeffrey Chang wrote:

> The UndoHandle creates overhead on readline due to its extra if checks 
> and function calls.
>
> [...] 
>
> The best way to speed this up might be to recode the class in C as a 
> type.  This would help because the if statement would be evaluated in 
> C, and also you can cache the self._handle.readline for a faster 
> function lookup.
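
Jeffrey's second point, caching the bound `readline` method, also helps in pure Python. Here is a minimal sketch of a pushback wrapper. `PushbackHandle` is a hypothetical name and this is not the actual Bio.File.UndoHandle code. The wrapper looks up `handle.readline` once in `__init__`, so each `readline()` call costs the `if self._saved` check plus a single call, rather than also resolving `self._handle.readline` every time.

```python
import io


class PushbackHandle(object):
    """Hypothetical minimal undo-able handle (not Bio.File.UndoHandle)."""

    def __init__(self, handle):
        self._saved = []  # lines pushed back; last saved is returned first
        # Cache the bound method once, saving an attribute lookup per call.
        self._raw_readline = handle.readline

    def readline(self):
        # The extra check that costs time on every call: serve saved lines first.
        if self._saved:
            return self._saved.pop()
        return self._raw_readline()

    def saveline(self, line):
        # Push a line back so the next readline() returns it again.
        if line:
            self._saved.append(line)


if __name__ == "__main__":
    h = PushbackHandle(io.StringIO(">seq1\nACGT\n"))
    header = h.readline()
    h.saveline(header)  # peek-and-undo pattern used by record parsers
    assert h.readline() == header
```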

Actually, I was thinking along the lines of recoding the class that
calls UndoHandle instead (see below). This new implementation does not
seem to be significantly faster than Bio.Fasta.Iterator when the
latter is used without a parser. However, you get the parsing done for
free with this implementation: it seems to be about twice as fast as
using Bio.Fasta.Iterator with Bio.Fasta.RecordParser, and it provides
the same functionality in a more lightweight package, a (defline, data)
tuple instead of a Bio.Record object. What do you think?

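class LightIterator(object):
    """Iterate over a FASTA handle, yielding (defline, data) tuples."""

    def __init__(self, handle):
        self._handle = handle
        # Defline of the next record, read ahead by the previous call.
        self._defline = None

    def __iter__(self):
        return self

    def __next__(self):
        lines = []
        defline_old = self._defline

        while True:
            line = self._handle.readline()
            if not line:
                # End of file: return the pending record, if there is one.
                self._defline = None
                if defline_old is None and not lines:
                    raise StopIteration
                break
            elif line[0] == '>':
                self._defline = line[1:].rstrip()
                if defline_old is not None or lines:
                    # This header starts the next record; return this one.
                    break
                # First header in the file: it starts the current record.
                defline_old = self._defline
            else:
                lines.append(line.rstrip())

        return defline_old, ''.join(lines)

    next = __next__  # Python 2 name for the iterator method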
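
For comparison, the same (defline, data) tuples can be produced by a generator function, a swapped-in technique rather than anything proposed above. This is a sketch assuming the same input conventions as LightIterator: a line starting with '>' begins a record, trailing whitespace is stripped, and data lines before the first header come back with a defline of None. `light_fasta` is a hypothetical name, not a Biopython function.

```python
import io


def light_fasta(handle):
    """Yield (defline, data) tuples from a FASTA handle (hypothetical sketch)."""
    defline = None
    lines = []
    for line in handle:
        if line.startswith('>'):
            # A new header ends the previous record, if there is one.
            if defline is not None or lines:
                yield defline, ''.join(lines)
            defline = line[1:].rstrip()
            lines = []
        else:
            lines.append(line.rstrip())
    # End of input: emit the final record, if any.
    if defline is not None or lines:
        yield defline, ''.join(lines)


if __name__ == "__main__":
    fasta = io.StringIO(">seq1 first\nACGT\nTTGA\n>seq2\nGG\n")
    for defline, data in light_fasta(fasta):
        print(defline, data)
```

Because the read-ahead header lives in a local variable, the generator needs no `self._defline` bookkeeping and no explicit StopIteration.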
-- 
Michael Hoffman <hoffman at ebi.ac.uk>
European Bioinformatics Institute