[DAS] Ensembl via SOAP
Tony Cox
avc@sanger.ac.uk
Wed, 5 Jun 2002 12:33:42 +0100 (BST)
I've been playing over the weekend with SOAP access to Ensembl objects. I have a
test server running that can handle queries.
Nutshell:
=========
use SOAP::Lite +autodispatch =>
    uri   => 'Bio::EnsEMBL::Remote::Object',
    proxy => 'http://services.ensembl.org:7070/cgi-bin/ensembl_rpcrouter';
my $trans =
    Bio::EnsEMBL::Remote::Object->new('type'=>'transcript','id'=>'ENST00000225283');
print $trans->seq(), "\n";
print $trans->translate(), "\n";
It is a pretty naive implementation but allows fairly reasonable access to
Ensembl objects via RPC. A remote "proxy" class takes care of creating and
manipulating ensembl objects on the server side and allows you to make direct
calls locally. General id/start/end type calls all work. Ensembl objects are all
intimately tied to DB connections, which goes horribly wrong over SOAP, so the
proxy object takes care of creating these connections as necessary on the
server. Calls that return an object have been changed to return an ID, which
can be used to create a remote object. I've only done the main stuff: features,
SNPs etc. are missing.
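The ID-for-object convention can be pictured with a toy, purely local sketch.
Nothing here talks to the test server or uses SOAP: the Toy::* packages and the
%STORE table are hypothetical stand-ins invented for illustration, and the real
proxy class may well be organised differently.

```perl
use strict;
use warnings;

# Toy "server side": a lookup table standing in for the database.
# Every name in this sketch is hypothetical, not real Ensembl code.
package Toy::Server;

my %STORE = (
    gene       => { G1 => { id => 'G1', transcripts => [ 'T1', 'T2' ] } },
    transcript => {
        T1 => { id => 'T1', seq => 'ATGGCC' },
        T2 => { id => 'T2', seq => 'ATGTTTAAA' },
    },
);

# Answer one request with plain data only; no objects cross the "wire".
sub call {
    my ($class, $type, $id, $method) = @_;
    my $rec = $STORE{$type}{$id} or return;
    return $rec->{id}  if $method eq 'id';
    return $rec->{seq} if $method eq 'seq';
    # A call that would naturally return objects returns their IDs instead.
    return @{ $rec->{transcripts} || [] } if $method eq 'transcripts';
    return;
}

# Toy "client side": holds only a type and an ID, never the object itself.
package Toy::Remote::Object;

sub new {
    my ($class, %args) = @_;
    return bless { type => $args{type}, id => $args{id} }, $class;
}

sub _ask {
    my ($self, $method) = @_;
    return Toy::Server->call($self->{type}, $self->{id}, $method);
}

sub id          { $_[0]->_ask('id') }
sub seq         { $_[0]->_ask('seq') }
sub transcripts { $_[0]->_ask('transcripts') }

package main;

my $gene = Toy::Remote::Object->new(type => 'gene', id => 'G1');
for my $tid ($gene->transcripts) {
    # Each returned ID is turned back into a remote object by hand.
    my $t = Toy::Remote::Object->new(type => 'transcript', id => $tid);
    print $t->id, "\t", $t->seq, "\n";
}
```

The client never holds a database handle; it only ever passes type/ID pairs
back, which is why an extra new() call is needed for every returned ID.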
The good thing is that you don't need any local databases or even ensembl code,
just a working copy of SOAP::Lite from CPAN. The bad thing is that it is pretty
__slow__ at the moment. The server is not running under mod_perl so most of the
response time is taken up in module compilation and XML transport. I'll try to
get it running under mod_perl and with transport compression enabled.
I don't see this as an interface of choice for the bioinformatician! - it is too
slow and anyway they will have the "real" ensembl code to turn to. This is much
more of a lightweight interface for conveniently fetching sequences, genes etc
where speed is not a critical issue, and the convenience of a simple programming
interface is the important factor.
I'd be very interested to see interoperability tested. I did write a very small
Java client to make requests but rapidly got out of my depth when having to
write a deserializer for the remote object. After looking into the Omnigene code
I see how these work but I'm rather hoping that somebody on the Omnigene team
might have a go at doing this.
Following is a simple script that provides examples of manipulating remote
objects. You "get" a remote object on the server by creating a new
Bio::EnsEMBL::Remote::Object and giving it a type and ID. At the moment you
can only fetch virtualcontigs, genes, transcripts, exons, clones, contigs and
translations (peptides). By the magic of "autodispatch", if you get a "thingy"
back, you can just treat it as a normal object and make calls on it. Perl's
autoloader will try and satisfy calls that are not overloaded in the remote
object (I know this sucks). If they are simple get/property calls they will
probably work - if the call returns an object/objects, bad things will probably
happen. Trying to write to the object may work (I haven't tried it) but is likely
not to be a useful thing to do! Remember this is a transaction-type system where
all the responses need to be marshalled before transport takes place so it will
not "stream" data to you as if it were a socket-style connection.
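The autoloader fallback mentioned above can be sketched locally, without any
SOAP involved. This is only an illustration of the general Perl technique (an
AUTOLOAD method that forwards unknown calls by name); Toy::Exon and Toy::Proxy
are hypothetical names, and the real remote object may do this differently.

```perl
use strict;
use warnings;

# Hypothetical object standing in for something held on the server.
package Toy::Exon;
sub new    { my ($class, %a) = @_; return bless {%a}, $class }
sub start  { $_[0]{start} }
sub end    { $_[0]{end} }
sub strand { $_[0]{strand} }

# Hypothetical proxy: a few calls are defined explicitly, the rest fall
# through to AUTOLOAD and are forwarded to the wrapped object.
package Toy::Proxy;
our $AUTOLOAD;

sub new {
    my ($class, $target) = @_;
    return bless { target => $target }, $class;
}

# Explicitly defined call: answered by the proxy itself.
sub length {
    my $self = shift;
    return $self->{target}->end - $self->{target}->start + 1;
}

# Any method not defined above lands here.
sub AUTOLOAD {
    my $self = shift;
    (my $name = $AUTOLOAD) =~ s/.*:://;    # strip the package prefix
    return if $name eq 'DESTROY';          # never forward object destruction
    my $code = $self->{target}->can($name)
        or die "No such method '$name' on " . ref($self->{target}) . "\n";
    return $self->{target}->$code(@_);
}

package main;

my $exon = Toy::Proxy->new(
    Toy::Exon->new(start => 101, end => 250, strand => -1));
print "strand: ", $exon->strand, "\n";    # forwarded by AUTOLOAD
print "length: ", $exon->length, "\n";    # defined on the proxy
```

Simple getters forward cleanly this way; a forwarded call that returns a
complex object is exactly the case that would not survive serialisation.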
In the event of an error, you usually end up with undef (the code is pretty raw
at the moment). If you really want to track down errors, use the following
block:
if(SOAP::Lite->self->call->fault) {
    print "Fault code: ", SOAP::Lite->self->call->faultcode, "\n";
    print "Fault string: ", SOAP::Lite->self->call->faultstring, "\n";
    print "Fault detail: ", SOAP::Lite->self->call->faultdetail, "\n";
    print "Fault actor: ", SOAP::Lite->self->call->faultactor, "\n";
    exit;
}
comments and suggestions welcome,
cheers
Tony
To try the server out, enable one or more of the following blocks by changing
if(0) to if(1):
#!/usr/local/bin/perl
package MySoapClient;
use strict;
use SOAP::Lite +autodispatch =>
    uri   => 'Bio::EnsEMBL::Remote::Object',
    proxy => 'http://services.ensembl.org:7070/cgi-bin/ensembl_rpcrouter';
if(1){
    my @g = (qw(ENSG00000131591 BRCA1));
    foreach my $g (@g){
        print "Getting gene: $g...\n";
        $g = Bio::EnsEMBL::Remote::Object->new('type'=>'gene','id'=>$g);
        print "\tGene ID: ", $g->id(), "\n";
        foreach my $t ($g->transcripts()){
            my $t =
                Bio::EnsEMBL::Remote::Object->new('type'=>'transcript','id'=>$t);
            print "\t\tTranscript ID: ", $t->id(), "\n";
            print "\t\tTranscript length: ", $t->length(), "\n";
            #print "\t\tTranscript seq: ", $t->seq(), "\n";
            print "\t\tTranscript protein: ", $t->translate(), "\n";
        }
    }
}
if(0){
    print "Getting remote clone AP000869...\n";
    my $cl =
        Bio::EnsEMBL::Remote::Object->new('type'=>'clone','id'=>'AP000869');
    print "Clone: ", $cl->embl_id(), "\n";
    print "Version: ", $cl->version(), "\n";
    foreach my $c ($cl->contigs()){
        my $c = Bio::EnsEMBL::Remote::Object->new('type'=>'contig','id'=>$c);
        my $id = $c->id();
        if($c->is_static_golden()){
            print "\tContig ID: $id (golden)\n";
            print "\tContig length: ", $c->length(), "\n";
            print "\tContig is golden?: yes\n";
            print "\t\tContig global start: ", $c->static_golden_start(), "\n";
            print "\t\tContig global end: ", $c->static_golden_end(), "\n";
            print "\t\tContig global ori: ", $c->static_golden_ori(), "\n";
            #print "\tContig seq: ", $c->seq(), "\n";
        } else {
            print "\tContig ID: $id (non-golden)\n";
        }
    }
}
if(0){
    my $chr = 1;
    my $start = 100000;
    my $end = 200000;
    print "Getting remote virtualcontig for $chr, $start-$end...\n";
    my $v =
        Bio::EnsEMBL::Remote::Object->new('type'=>'virtualcontig','chr'=>$chr,
                                          'start'=>$start, 'end'=>$end);
    print "Virtual contig ID: ", $v->id(), "\n";
    print "Virtual contig length: ", $v->length(), "\n";
    print "Virtual contig chromosome: ", $v->_chr_name(), "\n";
    print "Virtual contig chromosome length: ", $v->fetch_chromosome_length(),
        "\n";
    foreach my $g ($v->genes()){
        $g = Bio::EnsEMBL::Remote::Object->new('type'=>'gene','id'=>$g);
        print "\tGene ID: ", $g->id(), "\n";
        foreach my $t ($g->transcripts()){
            my $t =
                Bio::EnsEMBL::Remote::Object->new('type'=>'transcript','id'=>$t);
            print "\t\tTranscript ID: ", $t->id(), "\n";
            print "\t\tTranscript length: ", $t->length(), "\n";
            #print "\t\tTranscript seq: ", $t->seq(), "\n";
            #print "\t\tTranscript protein: ", $t->translate(), "\n";
            foreach my $e ($t->exons()){
                my $e =
                    Bio::EnsEMBL::Remote::Object->new('type'=>'exon','id'=>$e);
                print "\t\t\tExon ID: ", $e->id(), "\n";
                print "\t\t\tExon start: ", $e->ori_start(), "\n";
                print "\t\t\tExon end: ", $e->ori_end(), "\n";
                print "\t\t\tExon strand: ", $e->strand(), "\n";
                print "\t\t\tExon seq: ", $e->seq(), "\n";
            }
        }
    }
}
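if(0){
    # Keep the ID and the remote object in separate variables so that $p is
    # not declared twice in the same scope.
    my $id = "ENSP00000223439";
    print "Getting remote peptide $id...\n";
    my $p = Bio::EnsEMBL::Remote::Object->new('type'=>'translation','id'=>$id);
    print $p->seq();
}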
******************************************************
Tony Cox Email:avc@sanger.ac.uk
Sanger Institute WWW:www.sanger.ac.uk
Wellcome Trust Genome Campus Webmaster
Hinxton Tel: +44 1223 834244
Cambs. CB10 1SA Fax: +44 1223 494919
******************************************************