[Bioperl-l] Pise/bioperl
Catherine Letondal
gensoft@pasteur.fr
Fri, 30 Nov 2001 15:59:53 +0100
Dear bioperlers,
I have developed bioperl modules for Pise (Pise is an interface generator,
http://www-alt.pasteur.fr/~letondal/Pise). The idea is to use the
information about ~300 programs (phylogeny, pattern
discovery, database searches, alignment, protein/DNA/RNA analysis,
structure prediction, EMBOSS...) that is already available in the Pise XML
definitions to generate bioperl classes for running jobs on a Pise
server and getting results:
    my $tacg = Pise::tacg->new($cgi, $email,
                               sequence => $seq);
    my $job = $tacg->run;
    print $job->stdout;
To use these modules, it is not necessary to install Pise (i.e., a Pise
server), since they only provide a client.
There are 3 main classes:

  Pise           factory used to set parameters and actually submit
                 and launch the jobs
  PiseJob        actual job (methods to submit the HTTP request via
                 LWP::UserAgent, get the job state and fetch results)
  PiseJobParser  (Perl/SAX) parser for the XHTML server output

+------+
| Pise |
+------+---------------------------------------+
| toppred fasta hmmbuild nnssp blast2 clustalw |
| genscan tacg dnapars wublast2 ... |
| |
+----------------------------------------------+
The specific programs' classes (e.g. Pise::toppred.pm, Pise::fasta.pm,
Pise::hmmbuild.pm, Pise::nnssp.pm, ...), which are automatically
generated, are subclasses of the Pise class and provide information
about the program parameters.
PiseJob instances are created either by the Pise subclass, or directly
by the user from the URL of an already running job (a long job, for
instance). When submitting a job, you can either wait for completion
($program->run) or manage the wait loop yourself ($program->submit and
$program->terminated), thus enabling parallel jobs (see
Examples/parallel.pl).
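Such a wait loop might be sketched as follows; this is only an
illustration, assuming two already-configured factories ($factory1 and
$factory2 are placeholder names, and the poll interval is arbitrary):

    # Submit both jobs without blocking, then poll until all of
    # them report completion via $job->terminated.
    my @jobs    = ($factory1->submit, $factory2->submit);
    my @pending = @jobs;
    while (@pending) {
        @pending = grep { ! $_->terminated } @pending;
        sleep 10 if @pending;     # arbitrary poll interval
    }
    foreach my $job (@jobs) {
        print $job->stdout;       # both jobs ran in parallel
    }
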
The following example shows the main features of Pise/bioperl:
- running remote jobs and fetching results through an object
- interface with bioperl: Bio::Seq objects as input parameters, and
job output given as a filehandle to bioperl parsers (sequence
and data input can also be provided as a file, a string, or a
filehandle - which makes it possible to pipe programs)
- the idea of a factory (similar to Bio::Tools::Run::StandAloneBlast)
- distributed analyses: since Pise is installed on several
servers (and since the maintainers gave me permission to
give this example - thanks! :-) )
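To illustrate the input forms listed above, a sequence parameter can be
set in several ways (a sketch only; the setter-method style follows the
parameter setting shown in the example below, and the exact forms
accepted by any given program should be checked in its generated class):

    $tacg->sequence($seq);                    # a Bio::Seq object
    $tacg->sequence("my.fasta");              # a file name
    $tacg->sequence(">s1\nATGCATGC\n");       # a raw string
    $tacg->sequence($other_job->fh('out'));   # a filehandle (piping)
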
##########################################################
# Example
use Pise::genscan;
use Pise::coils2;
use Pise::saps;
use Pise::iep;
use Bio::DB::GenBank;
use Bio::Tools::Genscan;
my %job;
my %factory;
#############################
# general pise configuration
my %cgi = (
    'genscan' => 'http://bioweb.pasteur.fr/cgi-bin/seqanal/genscan.pl',
    'coils2'  => 'http://tofu.tamu.edu/cgi-bin/seqanal/coils2.pl',
    'saps'    => 'http://bioweb.pasteur.fr/cgi-bin/seqanal/saps.pl',
    'iep'     => 'http://ubigcg.mdh4.mdc-berlin.de:8080/cgi-bin/Pise/iep.pl'
);
my $email = "";   #------- Your email --------#
#############################
# fetch sequence
my $gb = new Bio::DB::GenBank;
my $seq = $gb->get_Seq_by_acc($ARGV[0]); # try with AF042345 for example
#############################
# run genscan
$factory{genscan} = Pise::genscan->new($cgi{genscan}, $email,
                                       parameter_file => "HumanIso.smat",
                                       seq            => $seq);
$job{genscan} = $factory{genscan}->run;
if ($job{genscan}->error) {
    print $job{genscan}->error_message;
    exit;
}
#############################
# parse genscan result
my $parser = Bio::Tools::Genscan->new(
    -fh => $job{genscan}->fh('genscan.out')
);
my @prots;
while (my $gene = $parser->next_prediction()) {
    push @prots, $gene->predicted_protein;
}
#############################
# analyses of predicted protein(s)
$factory{coils2} = Pise::coils2->new($cgi{coils2}, $email);
$factory{saps} = Pise::saps->new($cgi{saps}, $email);
$factory{iep} = Pise::iep->new($cgi{iep}, $email);
foreach my $prot (@prots) {
    # running 3 analyses
    # coils2
    $factory{coils2}->query($prot);   # parameter setting
    $job{coils2,$prot->id} = $factory{coils2}->run;
    # saps
    $factory{saps}->seq($prot);
    $job{saps,$prot->id} = $factory{saps}->run;
    # iep
    $factory{iep}->sequencea($prot);
    $job{iep,$prot->id} = $factory{iep}->run;
}
#############################
# print results
foreach my $jobname (keys %job) {
    print STDERR "$jobname: ", $job{$jobname}->jobid, "\n";
    print "$jobname:\n", $job{$jobname}->stdout, "\n";
}
##########################################################
Other examples (in the Examples sub-directory) show how to pipe
programs (dnadist.pl), run a distributed computation for very long
phylogeny jobs (parallel.pl), use EMBOSS programs (water.pl), play
with program parameters (play_with_params.pl), get a running
job (getjob.pl), etc.
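The piping idea (in the style of dnadist.pl) can be sketched as
follows; the program names and the 'infile' parameter name are
assumptions for illustration, and should be checked against the
generated classes:

    # Hypothetical sketch: feed one job's output filehandle
    # directly into the input parameter of the next program.
    my $dnadist  = Pise::dnadist->new($cgi, $email, seq => $seq);
    my $distjob  = $dnadist->run;
    my $neighbor = Pise::neighbor->new($cgi, $email,
                                       infile => $distjob->fh('outfile'));
    my $treejob  = $neighbor->run;
    print $treejob->stdout;
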
I would now be very pleased to get feedback from the bioperl community
and to know whether these classes could be useful (as these classes
deal with job running and output, they could perhaps fit in
Bio::Tools::Run::Pise...? For now, they are named outside any
bioperl namespace).
The Pise/bioperl files are available at:
ftp://ftp.pasteur.fr/pub/GenSoft/unix/misc/Pise/PiseBioperl.tar.gz
I would also like to thank Leonardo Marino, David Bauer, Bela Tiwari, Bob
Friedman and "momo" for their encouraging feedback!
--
Catherine Letondal -- Pasteur Institute Computing Center