[Biopython-dev] New Bio.SeqIO code

Peter biopython-dev at maubp.freeserve.co.uk
Wed Nov 1 10:09:59 UTC 2006


> The point I was trying to make is that for a File2SequenceDict
> function to be useful, it would end up being too complex.

Of course I'm going to be biased here, but I do find the simple current
dictionary construction useful as it is.  Clearly we have slightly
different uses in mind (which is good - the design should try and cater
to most people).

> In the answer above, a user could also do answer[key].seq to get the
> part she wants, so maybe a record2value argument is not essential in
> practice.
> 
> Part of my opposition against the File2SequenceDict function is that
> it requires the parser to be called File2SequenceIterator (which I
> don't like as a name, but more about that some other time), which
> then leads to a File2SequenceList function, which is software bloat.
> 
> So, how about making the functionality of File2SequenceDict available
> as a todict() method to the iterator object returned by 
> File2SequenceIterator, or, as a iterator2dict function?

I do like your first suggestion - the idea of adding a todict() method 
to the iterator objects.  However, that would require that all the 
parsers be written as (sub)classes, and right now several of them are 
written as generator functions.

I've found generator functions to be very simple and easy to
understand.  They seem like a good choice for simple file formats.  But
given a good enough reason, I could turn them into classes.
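
To make the trade-off concrete, here is a minimal sketch of the two
styles.  All names in it (Record, fasta_records, FastaIterator) are
made up for illustration and are not the actual Bio.SeqIO code; Record
is only a stand-in for SeqRecord, and the parser handles just a toy
subset of FASTA.

```python
from collections import namedtuple
from io import StringIO

# Hypothetical stand-in for SeqRecord, used only in this sketch.
Record = namedtuple("Record", ["id", "seq"])


def fasta_records(handle):
    """Generator-function style: yield one Record per FASTA entry."""
    rec_id, chunks = None, []
    for line in handle:
        line = line.strip()
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if rec_id is not None:
                yield Record(rec_id, "".join(chunks))
            words = line[1:].split()
            rec_id, chunks = (words[0] if words else ""), []
        elif line and rec_id is not None:
            chunks.append(line)
    if rec_id is not None:
        yield Record(rec_id, "".join(chunks))


class FastaIterator:
    """Class style: same records, but there is room for a todict() method."""

    def __init__(self, handle):
        self._records = fasta_records(handle)

    def __iter__(self):
        return self

    def __next__(self):
        return next(self._records)

    def todict(self, record2key=lambda rec: rec.id):
        # Consumes whatever is left of the iterator.
        return {record2key(rec): rec for rec in self}


handle = StringIO(">alpha first\nACGT\nAC\n>beta\nTTTT\n")
by_id = FastaIterator(handle).todict()
```

The generator function is shorter, but only the class gives us
somewhere to hang todict().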

                         ----

Right now I am making both "file to dict" and "iterator to dict"
functions available:

File2SequenceDict(..., record2key) is implemented as
SequenceIter2Dict(File2SequenceIterator(...), record2key)

Also:
File2Alignment(...) is implemented as
Iter2Alignment(File2SequenceIterator(...))

And:
File2SequenceList(...) is implemented as list(File2SequenceIterator(...))
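
The layering above can be sketched roughly as follows.  This is an
illustration only, not the code in CVS: the function names, the Record
stand-in for SeqRecord, the default key function and the duplicate-key
check are all assumptions made for the example.

```python
from collections import namedtuple

# Hypothetical stand-in for SeqRecord, used only in this sketch.
Record = namedtuple("Record", ["id", "seq"])


def sequence_iter_to_dict(records, record2key=lambda rec: rec.id):
    """Turn any iterator of records into a dict keyed by record2key(rec)."""
    result = {}
    for rec in records:
        key = record2key(rec)
        # Assumption: duplicate keys are an error rather than silently lost.
        if key in result:
            raise ValueError("Duplicate key %r" % (key,))
        result[key] = rec
    return result


def file_to_sequence_dict(handle, parser, record2key=lambda rec: rec.id):
    """The "file to dict" wrapper is one line on top of "iterator to dict"."""
    return sequence_iter_to_dict(parser(handle), record2key)


# Any record-yielding iterator will do; a plain list iterator is used here.
records = iter([Record("alpha", "ACGT"), Record("beta", "TTTT")])
by_id = sequence_iter_to_dict(records)
```

Since the file-based wrapper adds so little, a caller who has the
iterator version loses nothing if the wrapper goes away.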

Leaving aside the names (which I notice are not currently consistent), I
would be fine with removing File2SequenceList, File2SequenceDict, and
File2Alignment, but retaining the two functions which convert a
SeqRecord-returning iterator into a dict or an alignment.

How does that sound, Michiel (subject to agreeing on names)?

Peter



