[Biopython-dev] [Bug 2454] Iterators can't use file-like objects

bugzilla-daemon at portal.open-bio.org
Wed Jun 18 15:36:48 UTC 2008


http://bugzilla.open-bio.org/show_bug.cgi?id=2454





------- Comment #16 from mdehoon at ims.u-tokyo.ac.jp  2008-06-18 11:36 EST -------
(In reply to comment #15)
> I've removed the strict file-like test in:
> 
> Bio/Sequencing/Ace.py revision: 1.12
> Bio/Sequencing/Phd.py revision: 1.6
> 
> In these cases, the handle is immediately turned into an UndoHandle which will
> be able to check for a sufficiently file like object.
> 
> Hopefully that's what you meant, Michiel

Actually, I think we should avoid using an UndoHandle altogether, now that
Python has generator functions.

> - we could go further and introduce a
> parse() function and deprecate the Iterator objects in these modules.
> 
That would make things a lot easier. An Iterator class was useful in older
versions of Python, but generator functions provide a cleaner alternative.

In Ace.py, we'd need three functions:

1) read(handle), which returns one record (a Contig) read from the handle, or
None if the handle contains no contig;

2) parse(handle), a generator function returning an iterator over the records;

3) a private helper function _process_line(line, record), which updates the
record with the contents of one line.

read() and parse() then look like this:

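def read(handle):
    """Return the first Contig in handle, or None if there is none."""
    # Use one iterator for both loops, so that the second loop continues
    # where the first one stopped even if handle is a list of lines.
    handle = iter(handle)
    # Skip ahead to the first contig header line ("CO ...").
    for line in handle:
        if line.startswith("CO"):
            break
    else:
        return None
    record = Contig()
    _process_line(line, record)  # the CO line itself carries contig data
    for line in handle:
        if line.startswith("CO"):
            break  # start of the next contig: this record is complete
        _process_line(line, record)
    return record  # also reached when the file ends after the last contig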
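
Both loops in read() pull from the same handle, so the second loop must resume
where the first one stopped. A file object behaves that way, but a plain list
starts again from its first element every time it is looped over. The small
check below uses only plain Python; lines_after_co() is a hypothetical helper
that mimics the two loops of read(). It shows why calling iter() on the handle
first is worthwhile if lists of lines are to be accepted:

```python
def lines_after_co(handle):
    """Mimic the two loops in read(): skip to 'CO', then collect the rest."""
    for line in handle:
        if line.startswith("CO"):
            break
    return [line for line in handle]  # second pass over the same handle


lines = ["header\n", "CO contig1\n", "RD read1\n"]

# A list restarts from the beginning in the second loop ...
print(lines_after_co(lines))        # ['header\n', 'CO contig1\n', 'RD read1\n']
# ... while an iterator (like a file object) resumes after the CO line.
print(lines_after_co(iter(lines)))  # ['RD read1\n']
```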

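def parse(handle):
    """Generate the Contig records in handle, one at a time."""
    record = None
    for line in handle:
        if line.startswith("CO"):
            # A new contig starts, so the previous one is complete.
            if record is not None:
                yield record
            record = Contig()
        if record is not None:  # skip header lines before the first contig
            _process_line(line, record)
    if record is not None:
        yield record  # a generator must yield, not return, the last contig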
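
To see parse() at work without Biopython, here is a self-contained version with
a hypothetical stand-in Contig class and a _process_line() that simply stores
each line. The input lines only imitate the layout of an Ace file. Note that the
last contig has to be yielded after the loop, since no further CO line
announces its end:

```python
class Contig:
    """Hypothetical stand-in for Bio.Sequencing.Ace.Contig."""

    def __init__(self):
        self.lines = []


def _process_line(line, record):
    # Hypothetical: just remember the line; a real parser would
    # interpret it according to its tag.
    record.lines.append(line)


def parse(handle):
    """Yield one Contig per block that starts with a 'CO' line."""
    record = None
    for line in handle:
        if line.startswith("CO"):
            if record is not None:
                yield record
            record = Contig()
        if record is not None:  # skip lines before the first contig
            _process_line(line, record)
    if record is not None:
        yield record  # the last contig has no following CO line


ace_lines = ["AS 2 2\n", "CO c1 4 1 1 U\n", "ACGT\n", "CO c2 3 1 1 U\n", "GGC\n"]
contigs = list(parse(ace_lines))
print(len(contigs))      # 2
print(contigs[1].lines)  # ['CO c2 3 1 1 U\n', 'GGC\n']
```

Because parse() is a generator, each contig is built only when the caller asks
for it, so a large file is never held in memory as a whole.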

The actual work is done in _process_line.
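
One way _process_line could be organised is a table that maps each line's tag
to a small handler function. The sketch below is hypothetical: the Contig class
is a minimal stand-in, only the CO and RD header lines are handled, and a real
parser would also need to remember which section the following sequence and
quality lines belong to:

```python
class Contig:
    """Hypothetical minimal record; not Biopython's Ace.Contig class."""

    def __init__(self):
        self.name = None
        self.nbases = None
        self.reads = []


def _handle_co(fields, record):
    # Contig header: CO <contig name> <number of bases> ...
    record.name = fields[1]
    record.nbases = int(fields[2])


def _handle_rd(fields, record):
    # Read header: RD <read name> ...
    record.reads.append(fields[1])


# Map each line tag to the function that handles it (hypothetical subset).
_HANDLERS = {"CO": _handle_co, "RD": _handle_rd}


def _process_line(line, record):
    """Update record from a single line, dispatching on its leading tag."""
    fields = line.split()
    if fields and fields[0] in _HANDLERS:
        _HANDLERS[fields[0]](fields, record)
    # Blank lines and all other lines are ignored in this sketch.


record = Contig()
for line in ["CO Contig1 4 1 1 U\n", "ACGT\n", "\n", "RD read1 4 0 0\n"]:
    _process_line(line, record)
print(record.name, record.nbases, record.reads)  # Contig1 4 ['read1']
```

A dispatch table keeps the logic for each tag in its own function, so
supporting another tag means adding one handler and one dictionary entry.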

So we don't need to store the lines already read ourselves; the generator
function keeps track of where it is. Hence, we don't need to convert the handle
to an UndoHandle. In addition, handle can now also be a list of lines instead of
a file handle, as long as each function loops over a single iterator
(iter(handle)), so that successive loops continue where the previous one
stopped. In this respect, I think Zachary was right in comment #11:

> Maybe it's a good idea for any parsers/iterators to just
> use the iterator-like ability of file handles?

In other words, as long as we can pull lines from the handle, we can parse it.
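
To illustrate that point, the hypothetical generator parse_blocks() below groups
lines into blocks that start with a given tag. It accepts an in-memory file
(io.StringIO), a list of lines, or a generator expression alike, because all it
ever does with its argument is loop over it:

```python
import io


def parse_blocks(handle, start="CO"):
    """Hypothetical parser: group lines into blocks beginning with `start`."""
    block = None
    for line in handle:
        if line.startswith(start):
            if block is not None:
                yield block
            block = []
        if block is not None:
            block.append(line.rstrip("\n"))
    if block is not None:
        yield block


text = "CO a\nx\nCO b\ny\n"
from_file = list(parse_blocks(io.StringIO(text)))
from_list = list(parse_blocks(text.splitlines(keepends=True)))
from_gen = list(parse_blocks(line + "\n" for line in text.split("\n") if line))
print(from_file)                          # [['CO a', 'x'], ['CO b', 'y']]
print(from_file == from_list == from_gen)  # True
```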

In Phd.py, it's even simpler. Here, we only need the read() and parse()
functions:

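def read(handle):
    """Return the next Record in handle, or None if there is none."""
    record = None
    for line in handle:
        if line.startswith("BEGIN_SEQUENCE"):
            record = Record()
        elif line.startswith("END_SEQUENCE"):
            return record
        elif record is not None:
            pass  # do the actual processing of the other lines here
    return None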

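def parse(handle):
    """Generate the Records in handle, one at a time."""
    # read() is called repeatedly, so each call must resume where the
    # previous one stopped; iter() makes that true for a list of lines too.
    handle = iter(handle)
    while True:
        record = read(handle)
        if record is None:
            return
        yield record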
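
A self-contained run of the Phd.py pair, again with a hypothetical Record
stand-in that stores the sequence name and the raw lines between BEGIN_SEQUENCE
and END_SEQUENCE (the input only imitates a PHD file). Because parse() calls
read() repeatedly on one iterator, a list of lines works too; without the iter()
call, every read() would restart at the first record and parse() would never
finish:

```python
class Record:
    """Hypothetical stand-in for Bio.Sequencing.Phd.Record."""

    def __init__(self, name):
        self.name = name
        self.lines = []


def read(handle):
    """Return the next record from handle, or None when none is left."""
    record = None
    for line in handle:
        if line.startswith("BEGIN_SEQUENCE"):
            record = Record(line.split()[1])
        elif line.startswith("END_SEQUENCE"):
            return record
        elif record is not None:
            record.lines.append(line.strip())
    return None


def parse(handle):
    handle = iter(handle)  # one shared iterator, so each read() resumes
    while True:
        record = read(handle)
        if record is None:
            return
        yield record


phd_lines = [
    "BEGIN_SEQUENCE read1\n", "a 10 6\n", "END_SEQUENCE\n",
    "BEGIN_SEQUENCE read2\n", "c 20 14\n", "END_SEQUENCE\n",
]
print([r.name for r in parse(phd_lines)])  # ['read1', 'read2']
```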

Again, we can process each line just as it comes along. No UndoHandle, no
Parser, no Consumer, no Scanner needed.




