Michiel de Hoon
mdehoon at c2b2.columbia.edu
Mon Feb 26 01:28:06 UTC 2007
> SequenceIterator(handle, format)
> SequencesToDict(sequences, key_function=None)
> SequencesToAlignment(sequences, ...)
> WriteSequences(sequences, handle, format)
> Does anyone want to suggest different names for these functions?
Instead of
>>> from Bio.SeqIO import SequenceIterator, WriteSequences
>>> SequenceIterator(handle, format)
>>> WriteSequences(sequences, handle, format)
I would prefer
>>> from Bio import SeqIO
>>> SeqIO.read(handle, format)
>>> SeqIO.write(sequences, handle, format)
for the following reasons:
1) Similar functions in the Python standard library use a short verb
that describes what the function does, not what the function returns.
>>> myfile = open("myfile.txt") # Note: this returns an iterator
>>> pickle.dump(object, handle)
>>> xml.sax.parse(source, handler)
2) The lack of symmetry between SequenceIterator and WriteSequences
makes them harder to remember. Each time I use Bio.SeqIO, I wonder
whether it is SequenceIterator or ReadSequences.
3) SequenceIterator is not factually correct: the function iterates over
SeqRecord objects, so the accurate name would be SeqRecordIterator. But
that is even harder to remember, and involves even more typing.
4) The "Sequence" in SequenceIterator and WriteSequences is redundant.
As these functions are in the SeqIO module, we already know they handle
sequences. In addition, new users will probably not know what an
iterator is.
5) Bio.SeqIO being a new module allows us to correct some design errors
from the past. One thing that always bothered me in Biopython is that it
is hard to guess its usage; I always need to look up in the manual how
to use a particular parser.
Now, "read" and "write" are generic names that can be used by similar
functions in other Biopython modules. For example, the new Blast XML
parser tentatively uses NCBIXML.parse. This function returns an
iterator, with a Blast record for each Blast query, resembling how
"read" works in Bio.SeqIO. Renaming the NCBIXML parser function to
NCBIXML.read would give us some internal consistency in Biopython and
enable us to guess the function name without having to look it up in the
manual each time.
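To make the proposal concrete, here is a minimal sketch of what a module with generic "read" and "write" entry points could look like. It is not the Bio.SeqIO implementation: the helper names (_read_fasta, _write_fasta, _READERS, _WRITERS) are hypothetical, only a toy FASTA format is handled, and records are plain (title, sequence) tuples rather than SeqRecord objects. The point is only that the format string selects the parser, so the user has two short verbs to remember.

```python
import io


def _read_fasta(handle):
    """Yield (title, sequence) tuples from a FASTA-formatted handle."""
    title, chunks = None, []
    for line in handle:
        line = line.rstrip("\n")
        if line.startswith(">"):
            # A new header line closes the previous record, if any.
            if title is not None:
                yield title, "".join(chunks)
            title, chunks = line[1:], []
        elif title is not None:
            chunks.append(line.strip())
    if title is not None:
        yield title, "".join(chunks)


def _write_fasta(records, handle):
    """Write (title, sequence) tuples as FASTA; return the record count."""
    count = 0
    for title, sequence in records:
        handle.write(">%s\n%s\n" % (title, sequence))
        count += 1
    return count


# Hypothetical registries mapping a format name to its implementation.
_READERS = {"fasta": _read_fasta}
_WRITERS = {"fasta": _write_fasta}


def read(handle, format):
    """Return an iterator over the records in handle."""
    try:
        reader = _READERS[format]
    except KeyError:
        raise ValueError("Unknown format %r" % format) from None
    return reader(handle)


def write(records, handle, format):
    """Write records to handle; return how many were written."""
    try:
        writer = _WRITERS[format]
    except KeyError:
        raise ValueError("Unknown format %r" % format) from None
    return writer(records, handle)


if __name__ == "__main__":
    source = io.StringIO(">seq1 example\nACGT\nTTAA\n>seq2\nGGCC\n")
    target = io.StringIO()
    written = write(read(source, "fasta"), target, "fasta")
    print(written, "records written")
    print(target.getvalue(), end="")
```

Supporting another format in this sketch means adding one entry to each registry; the names the user calls stay the same, which is the symmetry argued for in point 2.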