[Biopython-dev] Handles and/or filenames in Bio.SeqIO etc?

Peter biopython at maubp.freeserve.co.uk
Tue Jul 28 16:34:48 UTC 2009

Hi all,

Eric just reopened an old debate - should Bio.SeqIO (and similar)
support filenames as well as handles?

In fact, this is something we originally discussed when planning
SeqIO back in Nov 2006. Michiel and I were at the time generally
in favour of allowing filename/handles, but Iddo Friedberg (who at that
time was basically in charge) and Chris Lasher didn't like this. It would
have broken with the existing Biopython parsers which were all handle
only. After a little debate, we opted to support just handles, knowing we
could, if need be, later allow filenames as well.

[Other things which, with hindsight, I am very glad Michiel, Iddo, Chris
etc talked me out of were "guessing" the file format based on the
filename or its contents.]

A couple of months ago I wrote up a draft email to raise this issue
(which I can't find right now). It went over some of the downsides,
other than complicating what is currently a nice clean API. I never
sent it because, after thinking about it, I was happy with handles
only. I guess I'll have to retype my objections as they come back to
me.

On the thread about a possible Bio.SeqIO.convert function, Eric wrote:

> But the main reason I piped up was that some time ago, we observed that
> some popular Python libraries have functions that can accept either an
> open file handle or a file name, and do the right thing. The xml.etree
> module in the standard lib does this by checking if the 'file' argument
> has a 'read' method, and if not, trying to open it. I didn't see any reason
> for Bio.TreeIO to be any fussier than the standard library, so...
> http://github.com/etal/biopython/blob/phyloxml/Bio/TreeIO/NexusIO.py

First of all, I would argue Bio.TreeIO should be consistent with Bio.SeqIO
and Bio.AlignIO with respect to handles vs filenames.

If we do agree to support filenames or handles, then I would keep all the
Bio.ModuleIO.SubModule code using handles only, and put the boilerplate
(repeated) handle/filename code in the Bio.ModuleIO functions only.
This is (a) less work, and (b) less code duplication. After all, the code in
the modules under Bio.SeqIO (and similar) is rarely used directly.
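
A rough sketch of that arrangement, in modern Python 3 rather than the
Python of 2009. The names parse() and _fasta_titles() are hypothetical
stand-ins and not existing Biopython functions. The format-specific code
only ever sees a handle, and the single top-level function applies the
"does it have a read method?" test that Eric describes for xml.etree:

```python
import io


def _fasta_titles(handle):
    """Hypothetical handle-only sub-module parser: yields FASTA title lines."""
    for line in handle:
        if line.startswith(">"):
            yield line[1:].rstrip()


def parse(source, format_parser=_fasta_titles):
    """Hypothetical top-level function accepting a filename or a handle."""
    if hasattr(source, "read"):
        # Already a handle: the caller owns it, so it is left open.
        yield from format_parser(source)
    else:
        # Assume a filename: open it here; the with block closes it once
        # the caller has finished iterating.
        with open(source) as handle:
            yield from format_parser(handle)


# Usage with an in-memory handle; a filename string would work the same way.
handle = io.StringIO(">alpha\nACGT\n>beta\nGGCC\n")
titles = list(parse(handle))
```

Only parse() knows about filenames here, so each format parser stays as
simple as it is today.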

Other top level parsers, like Bio.Entrez.read() might then also deserve
the filename/handle treatment. As a bonus, Bio.Nexus would cease to
be an oddity as it does this already.

> Implementing this for SeqIO.convert() (or ideally, read/parse/write on all
> the *IO modules) would make it very nice for files other than stdin and
> stdout -- otherwise, the user needs to open and maybe close two file handles
> before calling convert().
> What do you think?

From an end user point of view, especially when working directly at the
Python prompt interactively, being able to give filenames would be nicer.

This will also make lots of the examples in the tutorial shorter and simpler,
because we don't have to do things like closing output handles (because
the SeqIO.write() function would do it for us). There is a minor downside
that Python beginners won't necessarily get to grips with handles so quickly.

There is a cost, in that lots of parser code will need to check if it has a
filename and if so open it. For output code this is a little more complex,
as the writer function must also close the file afterwards.
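
To illustrate the extra bookkeeping on the output side, here is a minimal
sketch, again in Python 3 and with a hypothetical write() function that
is not the Bio.SeqIO.write() API. It assumes the function should close
only a file it opened itself, and leave a handle passed in by the caller
open:

```python
import io


def write(lines, destination):
    """Hypothetical writer taking a filename or a handle; returns line count."""
    # Anything without a write method is assumed to be a filename.
    opened_here = not hasattr(destination, "write")
    handle = open(destination, "w") if opened_here else destination
    count = 0
    try:
        for line in lines:
            handle.write(line + "\n")
            count += 1
    finally:
        # Close only what we opened; the caller's handle stays usable.
        if opened_here:
            handle.close()
    return count


# Usage with a caller-owned handle, which remains open afterwards.
buffer = io.StringIO()
written = write([">alpha", "ACGT"], buffer)
```

The try/finally is the part that reading code does not need in the same
way: without it, an exception part way through would leave a file the
function opened unclosed.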

