[Bioperl-l] Pise/bioperl
Catherine Letondal
gensoft@pasteur.fr
Fri, 30 Nov 2001 15:59:53 +0100
Dear bioperlers,
I have developed bioperl modules for Pise (Pise is an interface generator,
http://www-alt.pasteur.fr/~letondal/Pise). The idea is to use the
information about ~300 programs (phylogeny, pattern
discovery, database searches, alignment, protein/DNA/RNA analysis,
structure prediction, EMBOSS...) that is already available in the Pise XML
definitions to generate bioperl classes for running jobs on a Pise
server and getting results:
    my $tacg = Pise::tacg->new($cgi, $email,
                               sequence => $seq);
    my $job = $tacg->run;
    print $job->stdout;
To use these modules, it is not necessary to install Pise (i.e., a Pise
server), since they only provide a client.
There are 3 main classes:

  Pise           factory used to set parameters and actually submit
                 and launch the jobs
  PiseJob        actual job (methods to submit the HTTP request via
                 LWP::UserAgent, get the job state and fetch results)
  PiseJobParser  (Perl/SAX) parser for the XHTML server output

+------+
| Pise |
+------+---------------------------------------+
| toppred fasta hmmbuild nnssp blast2 clustalw |
| genscan tacg dnapars wublast2 ... |
| |
+----------------------------------------------+
The specific programs' classes (e.g. Pise::toppred.pm, Pise::fasta.pm,
Pise::hmmbuild.pm, Pise::nnssp.pm, ...), which are automatically
generated, are subclasses of the Pise class and provide information
about the program parameters.
PiseJob instances are created either by the Pise subclass, or directly
by the user from the URL of an already running job (a long job, for
instance). When submitting a job, you can either wait for completion
($program->run) or manage the wait loop yourself ($program->submit and
$program->terminated), thus enabling parallel jobs (see
Examples/parallel.pl).
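Such a wait loop might be sketched as follows; this is only an
illustration, assuming two already-configured factories ($factory1 and
$factory2 are placeholder names, and the poll interval is arbitrary):

    # Submit both jobs without blocking, then poll until all of
    # them report completion via $job->terminated.
    my @jobs    = ($factory1->submit, $factory2->submit);
    my @pending = @jobs;
    while (@pending) {
        @pending = grep { ! $_->terminated } @pending;
        sleep 10 if @pending;     # arbitrary poll interval
    }
    foreach my $job (@jobs) {
        print $job->stdout;       # both jobs ran in parallel
    }
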
The following example shows the main features of Pise/bioperl:
- running remote jobs and fetching results through an object
- interface with bioperl: Bio::Seq objects as input parameters, and
job output given as a filehandle to bioperl parsers (sequence
and data input can also be provided as a file, a string, or a
filehandle - which makes it possible to pipe programs)
- the idea of a factory (similar to Bio::Tools::Run::StandAloneBlast)
- distributed analyses: since Pise is installed on several
servers (and since the maintainers gave me permission to
give this example - thanks! :-) )
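To illustrate the input forms listed above, a sequence parameter can be
set in several ways (a sketch only; the setter-method style follows the
parameter setting shown in the example below, and the exact forms
accepted by any given program should be checked in its generated class):

    $tacg->sequence($seq);                    # a Bio::Seq object
    $tacg->sequence("my.fasta");              # a file name
    $tacg->sequence(">s1\nATGCATGC\n");       # a raw string
    $tacg->sequence($other_job->fh('out'));   # a filehandle (piping)
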
##########################################################
# Example
use Pise::genscan;
use Pise::coils2;
use Pise::saps;
use Pise::iep;
use Bio::DB::GenBank;
use Bio::Tools::Genscan;
my %job;
my %factory;
#############################
# general pise configuration
my %cgi = (
    'genscan' => 'http://bioweb.pasteur.fr/cgi-bin/seqanal/genscan.pl',
    'coils2'  => 'http://tofu.tamu.edu/cgi-bin/seqanal/coils2.pl',
    'saps'    => 'http://bioweb.pasteur.fr/cgi-bin/seqanal/saps.pl',
    'iep'     => 'http://ubigcg.mdh4.mdc-berlin.de:8080/cgi-bin/Pise/iep.pl'
);
my $email = "";   #------- Your email --------#
#############################
# fetch sequence
my $gb = new Bio::DB::GenBank;
my $seq = $gb->get_Seq_by_acc($ARGV[0]); # try with AF042345 for example
#############################
# run genscan
$factory{genscan} = Pise::genscan->new($cgi{genscan}, $email,
                                       parameter_file => "HumanIso.smat",
                                       seq            => $seq);
$job{genscan} = $factory{genscan}->run;
if ($job{genscan}->error) {
    print $job{genscan}->error_message;
    exit;
}
#############################
# parse genscan result
my $parser = Bio::Tools::Genscan->new(
    -fh => $job{genscan}->fh('genscan.out')
);
my @prots;
while (my $gene = $parser->next_prediction()) {
    push @prots, $gene->predicted_protein;
}
#############################
# analyses of predicted protein(s)
$factory{coils2} = Pise::coils2->new($cgi{coils2}, $email);
$factory{saps} = Pise::saps->new($cgi{saps}, $email);
$factory{iep} = Pise::iep->new($cgi{iep}, $email);
foreach my $prot (@prots) {
    # running 3 analyses
    # coils2
    $factory{coils2}->query($prot);   # parameter setting
    $job{coils2,$prot->id} = $factory{coils2}->run;
    # saps
    $factory{saps}->seq($prot);
    $job{saps,$prot->id} = $factory{saps}->run;
    # iep
    $factory{iep}->sequencea($prot);
    $job{iep,$prot->id} = $factory{iep}->run;
}
#############################
# print results
foreach my $jobname (keys %job) {
    print STDERR "$jobname: ", $job{$jobname}->jobid, "\n";
    print "$jobname:\n", $job{$jobname}->stdout, "\n";
}
##########################################################
Other examples (in the Examples sub-directory) show how to pipe
programs (dnadist.pl), run a distributed computation for very long
phylogeny jobs (parallel.pl), use EMBOSS programs (water.pl), play
with program parameters (play_with_params.pl), get a running
job (getjob.pl), etc.
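The piping idea (in the style of dnadist.pl) can be sketched as
follows; the program names and the 'infile' parameter name are
assumptions for illustration, and should be checked against the
generated classes:

    # Hypothetical sketch: feed one job's output filehandle
    # directly into the input parameter of the next program.
    my $dnadist  = Pise::dnadist->new($cgi, $email, seq => $seq);
    my $distjob  = $dnadist->run;
    my $neighbor = Pise::neighbor->new($cgi, $email,
                                       infile => $distjob->fh('outfile'));
    my $treejob  = $neighbor->run;
    print $treejob->stdout;
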
I would now be very pleased to get feedback from the bioperl community
and to know whether these classes could be useful (as these classes
deal with job running and output, they could perhaps fit in
Bio::Tools::Run::Pise...? For now, they are named outside any
bioperl namespace).
The Pise/bioperl files are available at:
ftp://ftp.pasteur.fr/pub/GenSoft/unix/misc/Pise/PiseBioperl.tar.gz
I would also like to thank Leonardo Marino, David Bauer, Bela Tiwari, Bob
Friedman and "momo" for their encouraging feedback!
--
Catherine Letondal -- Pasteur Institute Computing Center