[Biopython-dev] Performance of Bio.File.UndoHandle
    Michael Hoffman 
    hoffman at ebi.ac.uk
       
    Wed Oct 15 10:51:21 EDT 2003
    
    
  
On Mon, 13 Oct 2003, Jeffrey Chang wrote:
> The UndoHandle creates overhead on readline due to its extra if checks 
> and function calls.
>
> [...] 
>
> The best way to speed this up might be to recode the class in C as a 
> type.  This would help because the if statement would be evaluated in 
> C, and also you can cache the self._handle.readline for a faster 
> function lookup.
Actually, I was thinking along the lines of recoding the class that
calls UndoHandle instead (see below). This new implementation does not
seem to be significantly faster than Bio.Fasta.Iterator when the
latter is used without a parser. However, you get the parsing for
free with this implementation! It seems to be about twice as fast as
using Bio.Fasta.Iterator with Bio.Fasta.RecordParser, and it provides
the same functionality in a more lightweight package: a tuple of
(defline, data) instead of a Bio.Record object. What do you think?
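
class LightIterator(object):
    """Iterate over a FASTA handle, yielding (defline, data) tuples."""

    def __init__(self, handle):
        self._handle = handle
        # Defline of the next record, read ahead while finishing the
        # previous one. None means no record is pending.
        self._defline = None

    def __iter__(self):
        return self

    def __next__(self):
        lines = []
        defline_old = self._defline
        while True:
            line = self._handle.readline()
            if not line:
                # End of file: stop only if nothing is left to return.
                # Checking "is None" keeps records with an empty defline
                # and avoids looping forever on data without a defline.
                if defline_old is None and not lines:
                    raise StopIteration
                self._defline = None
                break
            elif line[0] == '>':
                self._defline = line[1:].rstrip()
                if defline_old is not None or lines:
                    # A new record starts; return the one collected so far.
                    break
                # First defline in the file: collect its sequence lines.
                defline_old = self._defline
            else:
                lines.append(line.rstrip())
        return defline_old, ''.join(lines)

    next = __next__  # Python 2 name for the iterator method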
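
For comparison, the same read-ahead logic can also be written as a generator function, so the pending defline lives in a local variable instead of on an instance. `iter_fasta_tuples` is a hypothetical name and this is a sketch under that assumption, not part of Biopython. It iterates over the handle directly instead of calling readline() in a loop; whether that is measurably faster depends on the Python version and the handle type.

```python
import io


def iter_fasta_tuples(handle):
    """Yield (defline, data) tuples from a FASTA handle.

    Hypothetical generator variant of the LightIterator logic.
    Data appearing before the first '>' line is yielded with defline None.
    """
    defline = None
    lines = []
    for line in handle:
        if line.startswith('>'):
            # A new record starts; emit the previous one, if any.
            if defline is not None or lines:
                yield defline, ''.join(lines)
            defline = line[1:].rstrip()
            lines = []
        else:
            lines.append(line.rstrip())
    # Emit the final record at end of file.
    if defline is not None or lines:
        yield defline, ''.join(lines)


if __name__ == "__main__":
    handle = io.StringIO(">a desc\nAC\nGT\n>b\nGG\n")
    print(list(iter_fasta_tuples(handle)))
    # → [('a desc', 'ACGT'), ('b', 'GG')]
```
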
-- 
Michael Hoffman <hoffman at ebi.ac.uk>
European Bioinformatics Institute
    
    