[Bioperl-l] 1.01

Jason Stajich jason@cgt.mc.duke.edu
Mon, 13 May 2002 16:53:38 -0400 (EDT)


On Mon, 13 May 2002, Catherine Letondal wrote:

>
> Jason Stajich wrote:
> >Some projects on the table that one might hope would be part of 1.2:
> >[...]
> > * Design the interface based on the Bioperl/PISE to describe
> >   remote analysis queues and add those classes to the main trunk.  Use
> >   this interface for local execution as well as remote.
>
> Hi,
>
> Is it time to start discussions? I don't know exactly what questions are to be discussed
> yet? Anyway, these are my questions and suggestions ...
>

Yes, it is time if people are willing to code it. I have to admit I'd
like to push back from the table wrt designing this and let someone
else take the lead.

I think you've already done a great deal of the work of attaching to the
PISE system, so I'd like to adopt as much of it as possible. I think all the
great stuff you outlined below is what we need.  Can we define some
interfaces that meet these specs?  I also hope that Martin would be
interested in making sure this will sync up with the openBSA/Novella
interface.

Shawn and Jerm have been working on interfaces to Phylip apps, so it would
be a cool test to build the interface to these as local and remote and see
if we can just toggle a remote/local flag and get the script to work in both
cases.

> 1) creating the factory and running:
>
>   # a) analysis queue (returns a Bio::Factory::Pise)
>   $factory = new Bio::Factory::EMBOSS;
>   # or:
>   $factory = new Bio::Factory::Pise;
>
>   # b) analysis application object (returns a Bio::Tools::Run::PiseApplication or
>   # Bio::Tools::Run::EMBOSSApplication)
>   $mfold = $factory->program('mfold');
>
>   # c) analysis results
>   $result = $mfold->run();
>
>
>   ... is that OK for EMBOSS and openBSA?
>
Sure - openBSA is probably richer about what it allows you to do in terms
of checking the status of a job.  That is not important here, other than
the ability to either send a job and block till a result is returned, or
queue up a set of jobs and then poll the server at a later point for the
status of each job id.  The RemoteBlast object could be assimilated
into this behavior as well.
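
A minimal sketch of the two styles (block until done vs. queue and poll
later), in plain Perl.  MockJob and run_blocking are hypothetical names made
up for illustration; they are not bioperl, PISE or openBSA classes, and the
"server" here is just a counter.

```perl
use strict;
use warnings;

# Hypothetical mock job: pretends the server needs a few polls to finish.
package MockJob {
    sub new {
        my ($class, %args) = @_;
        return bless { id => $args{-id}, polls_left => $args{-polls} // 2 }, $class;
    }
    sub id { $_[0]{id} }
    # One status check; each call brings the fake job closer to completion.
    sub is_done {
        my $self = shift;
        return 1 if $self->{polls_left} <= 0;
        $self->{polls_left}--;
        return 0;
    }
    sub result { my $self = shift; return "result for job $self->{id}" }
}

# Blocking style: submit and wait until the result is available.
sub run_blocking {
    my ($job) = @_;
    until ($job->is_done) { }    # a real client would sleep between polls
    return $job->result;
}

# Queued style: submit several jobs, then poll each job id later.
my %queue = map { $_ => MockJob->new(-id => $_, -polls => $_) } 1 .. 3;
my %done;
while (keys %queue) {
    for my $id (sort keys %queue) {
        next unless $queue{$id}->is_done;
        my $job = delete $queue{$id};
        $done{$id} = $job->result;
    }
}

print run_blocking(MockJob->new(-id => 'A')), "\n";
print "$_: $done{$_}\n" for sort keys %done;
```

Both styles only need a status check and a result fetch on the job object,
so a single interface could plausibly serve either.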

>
> 2) general execution parameters:
>
>  a) local or remote execution
> 	- default could be local for EMBOSS and remote for Pise?
> 	- in Pise, the default remote server could be different for different programs (I
> 	mean, not only at Pasteur...:-) )
>
>     - so one should be able to choose between local/remote execution and, if remote, to
>    choose a non-default server location; these choices could happen either at
>    factory creation, or at application creation, or at run step:
> 	# a) at factory creation
> 	$factory = new Bio::Factory::Pise(-remote => 'http://somewhere/cgi-bin/Pise');
>
> 	# b) at application creation - take the default remote server
> 	$needle = $factory->program('needle', -remote => 1);
>
> 	# c) at run time
> 	$result = $mfold->run(-remote => 'http://bioweb.pasteur.fr/cgi-bin/seqanal/mfold.pl');
>
Exactly!
We may want to fold this into the new Bio::Root::HTTPget code so that
people behind firewalls can use proxies.

>
>  b) email could be specified once at factory creation (for Pise)
>
Sure.
>
> 3) parameters specification
>
>    a) when?
> 	# at factory creation?
> 	$water = $factory->program('water', sequencea => $seqa,  seqall => $seqb);
> 	$result1 = $water->run();
>
> 	# before running?
> 	$water->sequencea($seqc);
> 	$result2 = $water->run();
>
> 	# when running?
> 	$result3 = $water->run(sequencea => $seqd);
>
As part of running, I guess, but one might want to be able to set some
parameters in the factory object, like
$factory->db('est');
foreach my $seq ( @seqs ) {
	my $result = $water->run(-sequencea => $seq);
}
So setting it in the factory would provide a default for later
calls? Or is this making it too complicated?
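
One way the "factory default, per-run override" idea could work is a plain
hash merge.  This is only a sketch: SketchFactory, SketchApp and
effective_params are hypothetical names, and the parameter names are
illustrative rather than a checked list of options for any program.

```perl
use strict;
use warnings;

# Hypothetical stand-ins for the factory and application objects.
package SketchFactory {
    sub new {
        my ($class, %defaults) = @_;
        return bless { defaults => {%defaults} }, $class;
    }
    # Applications start from a copy of the factory defaults.
    sub program {
        my ($self, $name, %params) = @_;
        return SketchApp->new($name, %{ $self->{defaults} }, %params);
    }
}

package SketchApp {
    sub new {
        my ($class, $name, %params) = @_;
        return bless { name => $name, params => {%params} }, $class;
    }
    # Per-call arguments override stored ones for this run only.
    sub effective_params {
        my ($self, %override) = @_;
        return +{ %{ $self->{params} }, %override };
    }
}

my $factory = SketchFactory->new(-db => 'est');
my $water   = $factory->program('water', -gapopen => 10);
for my $seq (qw(seq1 seq2)) {
    my $p = $water->effective_params(-sequencea => $seq);
    print join(', ', map { "$_ => $p->{$_}" } sort keys %$p), "\n";
}
```

Later keys win in a Perl hash merge, so the precedence is run-time
arguments, then application arguments, then factory defaults.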

>    b) how?  -name or name

I prefer -name, and this is how EMBOSS command-line options look, so that's
my vote, but I'm happy to be swayed by a good argument against.

>
>
> 4) analysis results: what is it, a string, an object, ...?
>
>   $result = $fasta->run();
>
> 	- in Pise/bioperl $result is an instance of PiseJob, i.e a kind of "handle" from
>         which you can fetch results (image files, treefile, ...)
> 	print $result->content("treefile");
> 	print $result->stdout;
> 	$result->save("blast2.txt");
> 	etc...
>
> 	- in Bio::Tools::Run::EMBOSSApplication, it's a string (the actual result): don't
> 	you think it's more general to have an object?
>

I guess a string is best initially - we can always try and define an
appropriate object later.  For EMBOSS apps there is supposed to be a
finite set of report formats, so we could probably code up an
EMBOSSReportReader, but I'm not sure how useful that will be to people.

> 5) use of analysis result:
>
>    - it's convenient to be able to build a handle from a result, in order
>    to feed it to bioperl parsers or to other programs
>
>     $aln = Bio::AlignIO->newFh (-fh => $needle_result->fh("outfile.align"),
>                                 -format => "fasta");
>     $neighbor = $factory->program('neighbor', infile => $protdist_job->fh('outfile'));
>
>    - construct an analysis result from an ID:
>     $neighbor = $factory->result('http://bioweb.pasteur.fr/seqanal/tmp/blast2/A12465102130064/')
>
Yes, definitely. If we can wrap HTTPget as a filehandle then it can be
passed directly to the parser.  Otherwise, using LWP, you have to download
the data and either write it to a tempfile or wrap the data string with
IO::String to make it behave like a filehandle (see DB::WebDBSeqI).
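
A small sketch of the "string as filehandle" idea.  Instead of IO::String it
uses core Perl's in-memory filehandle (open on a scalar reference), which
serves the same purpose without an extra module.  The FASTA-like data is
made up, and the loop stands in for a real parser.

```perl
use strict;
use warnings;

# Stand-in for a result body already fetched from the server.
my $data = ">seq1\nACGT\n>seq2\nTTGA\n";

# Core Perl: open a read-only filehandle onto the string.
open(my $fh, '<', \$data) or die "cannot open in-memory handle: $!";

# Any code that reads from a filehandle now works unchanged.
my @ids;
while (my $line = <$fh>) {
    push @ids, $1 if $line =~ /^>(\S+)/;
}
close $fh;
print "ids: @ids\n";
```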

> 6) misc:
>
>  - It should be possible to issue an asynchronous run request (to enable parallel
>    execution for long jobs)
>
>
> How is all that compatible with OpenBSA?
>

I think all of this is exactly what I was thinking; we just need to try
and code up the prototype/port from your PISE code into bioperl (I assume
it is okay if we slurp this in?).  Some generalization may be needed for the
server communication, as the Novella connection will be CORBA, not HTTP?
Or is this just handled in the client code, based on the URL that is
passed with the -remote flag?
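
If it were handled in the client code, one possible shape is a dispatch on
the URL scheme.  This is purely a sketch: pick_transport, the handler table
and the "corba" scheme are all hypothetical, and the handlers only return a
description instead of talking to a server.

```perl
use strict;
use warnings;

# Hypothetical transport table keyed by URL scheme.
my %transport_for = (
    http  => sub { "HTTP request to $_[0]" },
    corba => sub { "CORBA call to $_[0]" },
);

# No -remote value means local execution; otherwise pick by scheme.
sub pick_transport {
    my ($remote) = @_;
    return 'local' unless defined $remote;
    my ($scheme) = $remote =~ m{^([a-z][a-z0-9+.-]*)://}i
        or die "no scheme in '$remote'";
    my $handler = $transport_for{ lc $scheme }
        or die "unsupported scheme '$scheme'";
    return $handler->($remote);
}

print pick_transport(), "\n";
print pick_transport('http://somewhere/cgi-bin/Pise'), "\n";
```

With this shape the calling script does not change between local, HTTP and
CORBA runs; only the value given to -remote differs.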

> --
> Catherine Letondal -- Pasteur Institute Computing Center
>

-- 
Jason Stajich
Duke University
jason at cgt.mc.duke.edu