[Biopython-dev] Bio.IntelliGenetics

Michiel de Hoon mjldehoon at yahoo.com
Wed Jul 2 13:30:06 UTC 2008


Bio.IntelliGenetics contains a parser for sequence data in the IntelliGenetics format.
In this format, each sequence has a name and comments, and the file as a whole can also carry an overall comment.

Currently the parser in Bio.IntelliGenetics stores this information in Bio.IntelliGenetics.Record.Record objects (one record per sequence; the overall comment is inadvertently added to the first sequence in the file). I think it makes more sense to use a SeqRecord for that, and to deprecate Bio.IntelliGenetics.Record.Record.

In that case, Bio.SeqIO looks like a more suitable place for this parser.
The user would see something like this:
>>> from Bio import SeqIO
>>> handle = open("mydatafile.txt")
>>> records = SeqIO.parse(handle, "ig")
>>> records.comment
'This is the overall comment'
>>> for record in records:
...     pass  # record is a SeqRecord
...
>>> handle.close()

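Because of the overall comment, SeqIO.parse cannot simply return a generator here: a generator object has nowhere to store a comment attribute. It would have to return an instance of a full-fledged class that carries the comment and can also be iterated over.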
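
A minimal sketch of what such a class could look like is given below. It is only an illustration, not the Bio.SeqIO implementation: the names CommentedRecords and SimpleRecord are hypothetical, SimpleRecord stands in for SeqRecord so that the example has no dependencies, and the file layout is a simplifying assumption (file-level comment lines start with ";;", per-sequence comment lines start with ";", then comes a title line, then sequence lines ending in "1" or "2").

```python
import io
from collections import namedtuple

# Hypothetical stand-in for Bio.SeqRecord.SeqRecord (keeps the sketch dependency-free).
SimpleRecord = namedtuple("SimpleRecord", ["name", "comment", "seq"])


class CommentedRecords:
    """Iterable over records that also exposes a file-level comment."""

    def __init__(self, handle):
        self._handle = handle
        header = []
        # Read the header eagerly so that .comment is usable before iteration.
        line = handle.readline()
        while line.startswith(";;"):  # assumed marker for the overall comment
            header.append(line[2:].strip())
            line = handle.readline()
        self.comment = "\n".join(header)
        self._pending = line  # first line after the header ("" if none)

    def __iter__(self):
        # Single pass: the handle is consumed as records are yielded.
        line = self._pending
        self._pending = None
        while line:
            comments = []
            while line.startswith(";"):  # per-sequence comment lines
                comments.append(line[1:].strip())
                line = self._handle.readline()
            if not line:
                break  # file ended inside a comment block
            name = line.strip()
            chunks = []
            line = self._handle.readline()
            while line and not line.startswith(";"):
                chunks.append(line.strip())
                line = self._handle.readline()
            seq = "".join(chunks)
            if seq.endswith(("1", "2")):  # assumed end-of-sequence marker
                seq = seq[:-1]
            yield SimpleRecord(name, "\n".join(comments), seq)


data = (
    ";; This is the overall comment\n"
    "; first\nSEQ1\nACGT\nACGT1\n"
    "; second\nSEQ2\nTTTT1\n"
)
records = CommentedRecords(io.StringIO(data))
print(records.comment)
for record in records:
    print(record.name, record.seq)
```

The design point is that the header is read in the constructor, so the comment attribute is available before the first record is requested, while the records themselves are still produced lazily by a generator inside __iter__.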

Any objections, anybody?

--Michiel



      


