[Bioperl-l] Re: standard IO & URL handling

Ewan Birney birney@ebi.ac.uk
Tue, 26 Sep 2000 11:42:20 +0100 (GMT)

On Tue, 26 Sep 2000, Hilmar Lapp wrote:

> Ewan Birney wrote:
> > 
> > I am not sure how IOManager is set out, but I would like to see in this
> > case a true base class (? perhaps IOManager) which IO-oriented modules
> > in bioperl would inherit from. This would
> > 
> >         (a) save people from having to type get/sets for fh/filename all
> > the time and
> > 
> >         (b) make bioperl more consistent.
> > 
> > For network-oriented modules, something similar might occur, or the IO
> > system might be good enough.
> > 
> > Does this sound sane Hilmar? Would you like to propose the system we
> > should try to stick to?
> > 
> I don't feel that I have enough of an overview of the code and the
> 'standard' modules available, but I can offer a starter, hoping that
> people out there will iron out the flaws.
> The wish-list/requirements from my point of view are the following:
> 1) Basic FileIO functionality supplied ready-to-use by a core module,
> implemented through inheritance or delegation. This functionality should
> at least comprise
> 	o -file and -fh parameters in new()/_initialize() being dealt with
> 	o method fh() (or _filehandle(), whatever you prefer)
> 	o support for keeping track of the filename if one was supplied
> 	o method _pushback()
> 	o method _readline()
> 	o method close()
> 	o support for the capability of tying a filehandle to the object
> 	o ability to deal with any sort of IO::* handles

I agree with most of these except _pushback and _readline, which seem too
implementation-specific to me. But perhaps these methods occur in enough
parsers to justify just sticking them in here.

The class should also implement a DESTROY to close the filehandle.

I prefer the fh() method.
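To make the wish-list concrete, here is a minimal sketch of what such a base
class could look like. The name Bio::Root::StreamIO follows Hilmar's proposal;
the hash-based object layout and every implementation detail here are my own
guesswork, not an agreed design:

```perl
package Bio::Root::StreamIO;    # name as proposed; sketch only
use strict;

# Accepts -file or -fh, keeps track of the filename, and supplies
# fh()/_readline()/_pushback()/close() plus a DESTROY that closes up.
sub new {
    my ($class, %args) = @_;
    my $self = bless { _pushback => [] }, $class;
    if (defined $args{-fh}) {
        $self->{_fh} = $args{-fh};
    } elsif (defined $args{-file}) {
        $self->{_file} = $args{-file};
        open(my $fh, '<', $args{-file})
            or die "Could not open $args{-file}: $!";
        $self->{_fh} = $fh;
    }
    return $self;
}

sub fh       { $_[0]->{_fh} }       # the underlying (IO::*) handle
sub filename { $_[0]->{_file} }     # filename, if one was supplied

# _readline honours lines a parser has pushed back
sub _readline {
    my $self = shift;
    return pop @{ $self->{_pushback} } if @{ $self->{_pushback} };
    my $fh = $self->{_fh} or return;
    return scalar <$fh>;
}

# _pushback returns a line to the stream for re-reading
sub _pushback {
    my ($self, $line) = @_;
    push @{ $self->{_pushback} }, $line;
}

sub close {
    my $self = shift;
    CORE::close($self->{_fh}) if $self->{_fh};
    $self->{_fh} = undef;
}

# close the filehandle when the object goes away, as suggested above
sub DESTROY { $_[0]->close }

1;
```

A parser inheriting from this would then just call $self->_readline in its
next_seq()-style loop and never touch open() itself.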

> The two latter obviously refer to some comments by others. I have to
> admit that so far I have never been at a point where the tying
> possibility saved me lots of hassle, but since it is there in Perl it
> is certainly something very useful in certain places.
> While I think Matthew's point is basically right, first, most BioPerl
> modules using file IO use only one file at a time, and second, probably
> at least half of BioPerl does use file IO. So, utilizing a central file
> IO facility should be as easy and straightforward as possible.
> A proposal for implementation is then:
> 	o a base class implementing the requirements, like Bio::Root::StreamIO
> 	o a module in the need of stream IO inherits from this base class
>           -- or see below
> 	o a module that needs multiple streams creates multiple (or, in the 
>           case of inheritance, additional) instances of this class
> The downside of simple inheritance is that the implementing class is
> hard-coded and also cannot be changed (set) at run-time. An alternative
> circumventing this could be to have something like the following in
> Bio::Root::Object (or a descendant):
> 	o method stream() which gets/sets the StreamIO-implementing object
>           (note that this can be smart in not creating anything until it
>           is accessed, and in accepting named parameters, too)

I like the idea of inheriting from Bio::Root::StreamIO, which in my view
should inherit from Bio::Root::RootI to streamline the code.
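For the delegation alternative, the lazy stream() accessor could look roughly
like this. The Demo::* package names are stand-ins for the real Bio::Root
classes, and the whole thing is only a sketch of the get/set-with-lazy-creation
idea, not a committed interface:

```perl
use strict;

package Demo::StreamIO;        # stand-in for the proposed Bio::Root::StreamIO
sub new { my ($class, %args) = @_; bless { %args }, $class }

package Demo::Object;          # stand-in for Bio::Root::Object
sub new { bless {}, shift }

# stream() gets/sets the StreamIO-implementing object. Nothing is
# created until first accessed, and named parameters are passed
# through, so the implementing class can also be swapped at run-time
# by passing in a ready-made object.
sub stream {
    my ($self, @args) = @_;
    if (@args) {
        $self->{_stream} = ref($args[0]) ? $args[0]             # object given
                                         : Demo::StreamIO->new(@args);
    }
    $self->{_stream} ||= Demo::StreamIO->new();  # lazy default creation
    return $self->{_stream};
}

1;
```

A module needing several streams would simply hold several such objects
instead of relying on the single inherited one.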

> This may sound complicated, or like a lot of code, but if Perl is indeed
> so rich in modules supporting all of this, it should in fact be very
> straightforward to implement. And a programmer implementing a new
> BioPerl parser does not have to worry about how to code portable and
> consistent IO if he/she just sticks to what the core supplies.

Indeed. Consistency is good (assuming we all agree to it). I am happy
with this framework.

> Concerning URL/HTTP stuff, what I'd like to have is what I described with
> 'delegate the guts'. So, usually you have a URL you want to GET from or
> POST to, and you have a table of key/value pairs (yes, of course, keys
> may have multiple values), and you don't want to bother about how HTTP
> works, and how to get through a firewall. You even may not want to bother
> which particular protocol your URL refers to (ftp, file, etc). So, a core
> module for supporting consistent net IO should from my point of view
> enable something like
> 	o $stream = $netio->openURL($url, 'GET', \%query);

Hmmm. I wonder if LWP does supply this. Someone needs to investigate...
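For what it's worth, LWP does seem to cover most of this: LWP::UserAgent plus
HTTP::Request::Common can build GET/POST requests from a key/value table, and
env_proxy() picks up firewall/proxy settings from the environment. A rough
sketch of what the proposed openURL() could look like on top of them (the
openURL/_buildRequest names and argument convention are the hypothetical
interface from above, not anything LWP itself provides):

```perl
use strict;
use URI;
use LWP::UserAgent;
use HTTP::Request;
use HTTP::Request::Common qw(POST);

# Turn a URL, a method and a table of key/value pairs into a request,
# without the caller having to know how HTTP encodes either form.
sub _buildRequest {
    my ($url, $method, $query) = @_;
    if (uc $method eq 'POST') {
        return POST($url, $query);       # form-encoded request body
    }
    my $uri = URI->new($url);
    $uri->query_form(%$query);           # key/value pairs into the URL
    return HTTP::Request->new(GET => $uri);
}

# Hypothetical openURL() as proposed above: GET from or POST to a URL
# and hand back the content; LWP deals with the protocol details.
sub openURL {
    my ($url, $method, $query) = @_;
    my $ua = LWP::UserAgent->new;
    $ua->env_proxy;                      # respect http_proxy etc. (firewalls)
    my $response = $ua->request(_buildRequest($url, $method, $query));
    die 'Request failed: ' . $response->status_line
        unless $response->is_success;
    return $response->content;           # or wrap this in a stream object
}
```

Because LWP::Protocol handles ftp:, file: and friends behind the same
interface, the caller indeed need not care which protocol the URL names.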

> If this is already out there (probably it is), that's fine. It should
> then be very straightforward to implement a core module for this, and
> complaints about inconsistent behaviour across BioPerl modules (one is
> firewall-safe, another one is not) should become history, and a BioPerl
> programmer needing URL queries will very quickly find his way.
> One remark concerning LWP: it is indeed already on our list of
> dependencies, but only as a very optional one. E.g., I haven't installed
> it, and presently the remote BLAST module is not functional anyway
> (because of the NCBI BLAST server changes). The long list of
> dependencies on non-core packages that LWP has makes it not really
> attractive from an industrial-environment point of view, I have to
> admit. Anyway, as I have no overview of what packages are available for
> this purpose and how likely they are to become a Perl standard, the vote
> should be cast by those who know better.
> Does any of the things proposed make sense to people out there?

Sounds very sensible to me ;)

> 	Hilmar
> -- 
> -----------------------------------------------------------------
> Hilmar Lapp                                email: hlapp@gmx.net
> NFI Vienna, IFD/Bioinformatics             phone: +43 1 86634 631
> A-1235 Vienna                                fax: +43 1 86634 727
> -----------------------------------------------------------------

Ewan Birney. Mobile: +44 (0)7970 151230, Work: +44 1223 494420