[Bioperl-l] Bio::Root::IO reads URLs from -file

Allen Day allenday at ucla.edu
Tue Aug 10 15:57:22 EDT 2004


On Tue, 10 Aug 2004, Jason Stajich wrote:

> On Tue, 10 Aug 2004, Allen Day wrote:
> 
> > On Tue, 10 Aug 2004, Peter van Heusden wrote:
> >
> > > Hilmar Lapp wrote:
> > >
> > > > I lean with Ewan to -url as I like explicit commands better than
> > > > possibly dubious magic behind the scenes ... imagine someone stores
> > > > files by names that match their url ...
> > > >
> > > > There's one thing though that's important IMO that Jason brings up: I
> > > > don't know how you implemented this but I think Bio::Root::IO must not
> > > > be dependent on LWP or any such beast that doesn't come with perl.
> > > >
> > > I'm with the majority in that 'magic' creates possible confusion and
> > > more room for error. As to Hilmar's idea of not depending on LWP, I
> > > think this is also a good idea, and maybe the URL code can be a kind of
> > > 'mixin' - i.e. implement it in another module and then have
> > > Bio::Root::IO optionally add it as a plugin. What do you intend to do
> > > with this capability? Is there going to be another module that depends
> > > on the -url ability?
> >
> > Bio::Root::IO::_initialize_io() now accepts a '-url' argument.
> >
> > If present, and if LWP is loadable, _initialize_io() attempts to use
> > LWP::Simple::getstore() to download the url to a local tempfile, and
> > assigns that tempfile to the equivalent of _initialize_io()'s '-file'
> > argument.  This works for HTTP, HTTPS, FTP, and all other protocols
> > supported by LWP.  If a file request fails, there is a retry loop in place
> > to retry a few times to fetch the file.
> 
> The tempfile gets cleaned up by LWP?  We do this sort of thing in
> Bio::Tools::Run::RemoteBlast and within Bio::DB::NCBIHelper, et al.
> Perhaps we can localize some of that code to a -url param where it is
> a GET request...

no, i use Bio::Root::IO::tempfile() to generate the tempfile, and use LWP
to write into that.  LWP doesn't know how to clean up after itself, as far
as i can tell.
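
to make the tempfile-plus-retry idea concrete, here is a rough sketch.  it
is not the code in Bio::Root::IO: fetch_to_tempfile() and lwp_fetcher() are
made-up names, the retry count of 3 is an assumption, and it uses File::Temp
directly rather than Bio::Root::IO::tempfile().  the fetcher is passed in as
a code ref so the retry loop does not care whether LWP is available.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Hypothetical helper, NOT the Bio::Root::IO implementation.
# $fetch is a code ref called as $fetch->($url, $path); it must
# return true once the URL has been written to $path.
sub fetch_to_tempfile {
    my ($url, $fetch, $max_tries) = @_;
    $max_tries ||= 3;    # assumed default retry count

    # UNLINK => 1 asks File::Temp to delete the file at program exit,
    # so the downloader itself never has to clean up.
    my ($fh, $path) = tempfile(UNLINK => 1);
    close $fh;

    for my $try (1 .. $max_tries) {
        return $path if $fetch->($url, $path);
    }
    die "could not retrieve $url after $max_tries attempts\n";
}

# Build a fetcher around LWP::Simple::getstore() if LWP can be loaded.
# Returns undef otherwise, so the caller can fall back to something else.
sub lwp_fetcher {
    return unless eval { require LWP::Simple; 1 };
    return sub {
        my ($url, $path) = @_;
        # getstore() returns the HTTP status code of the response.
        return LWP::Simple::is_success(LWP::Simple::getstore($url, $path));
    };
}
```

a caller would do something like
my $fetch = lwp_fetcher() or die "no LWP"; my $file = fetch_to_tempfile($url, $fetch);
and then hand $file to whatever expects a '-file' argument.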

> 
> >
> > If LWP is not loadable, _initialize_io() uses Bio::Root::HTTPget to open a
> > socket to the file's host and sets '-fh' to read from this socket.  This
> > only works for HTTP.  There is no retry loop in place here, as
> > Bio::Root::HTTPget throws an error if it can't open the socket.  It's
> > possible to modify Bio::Root::HTTPget to do retries, but I didn't feel
> > like poking around in there.
> >
> > Still remaining to be done:
> >
> >   [1] add -url to the documentation
> >   [2] check for the existence of clashing '-file' or '-fh' arguments
> >   [3] add additional tests to t/RootIO.t for testing https and ftp
> >       retrievals.
> >
> > Regarding another module depending on this, yes, there will be one, that's
> > the only reason I added this :).  I have a new FeatureIO subsystem.  One
> > format it can parse is GFF v3.  Valid GFF v3 requires features to be typed
> > according to the Sequence Ontology or an extension thereof.  As part of
> > the parse it downloads the Sequence Ontology DAG-Edit files, parses them
> > into a Bio::Ontology, and returns Bio::SeqFeatureI objects with
> > Annotation::OntologyTerms attached.
> >
> > I will commit the FeatureIO code soon.
> >
> 
> Cool! Will it also support some sort of caching of SO too?  Maybe we can
> change Tools::GFF to delegate to FeatureIO for GFF3 files instead of
> having 2 modules doing the same thing.

well, it just caches to the tempfile right now and deletes on program
termination, but if you want to add functionality for -url to store the
file somewhere (perhaps in a filename in $TEMPDIR that is the md5 sum of
the URL?), be my guest.
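
the md5 filename idea could look roughly like this.  it is only a sketch:
cache_path_for_url() is a made-up name, nothing like it exists in
Bio::Root::IO, and the URL below is a placeholder.  it uses only core
modules (Digest::MD5, File::Spec).

```perl
use strict;
use warnings;
use Digest::MD5 qw(md5_hex);
use File::Spec;

# Hypothetical helper: map a URL to a stable path in a cache directory.
# The file name is the hex MD5 digest of the URL, so the same URL always
# maps to the same file and odd characters in the URL never reach the
# file system.
sub cache_path_for_url {
    my ($url, $dir) = @_;
    $dir = File::Spec->tmpdir unless defined $dir;   # system temp directory
    return File::Spec->catfile($dir, md5_hex($url));
}

my $url  = 'http://example.org/so.ontology';   # placeholder URL
my $path = cache_path_for_url($url);
if (-e $path) {
    print "already cached at $path\n";
} else {
    print "would download $url to $path\n";
}
```

a real cache would also need some expiry rule, since the ontology files
change over time.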

my idea is to do away with Bio::Tools::GFF entirely.

> Also, have you worked on the alignment <-> GFF3 at all either?  It is an
> almost-doable thing with HSP->cigar_line but I am not sure we have a
> cigar2HSP factory yet.

nope, haven't looked at this.  i don't really do alignments so i don't
need this functionality.

-allen


> 
> 
> -jason
> 
> 
> 
> > -Allen
> >
> >
> > >
> > > Peter
> > >
> > > _______________________________________________
> > > Bioperl-l mailing list
> > > Bioperl-l at portal.open-bio.org
> > > http://portal.open-bio.org/mailman/listinfo/bioperl-l
> > >
> >
> 
> --
> Jason Stajich
> Duke University
> jason at cgt.mc.duke.edu
> 

