[DAS] Ensembl via SOAP

Tony Cox avc@sanger.ac.uk
Wed, 5 Jun 2002 13:26:25 +0100 (BST)


On Wed, 5 Jun 2002, Brian Gilman wrote:

+>This is great!!
+>
+>	Do you need help with a java implementation?? I'd be willing to
+>help you out in a week or so...

Hi Brian,

Help would be welcome. As I mentioned below, I got a Java client working but
came unstuck on having to write an object deserializer. Hopefully you could pick
it up from there?

Tony


+>
+>			-B
+>
+>-----------------------
+>Brian Gilman <gilmanb@genome.wi.mit.edu>
+>Group Leader Medical & Population Genetics Dept.
+>MIT/Whitehead Inst. Center for Genome Research
+>One Kendall Square, Bldg. 300 / Cambridge, MA 02139-1561 USA
+>phone +1 617  252 1069 / fax +1 617 252 1902
+>
+>
+>On Wed, 5 Jun 2002, Tony Cox wrote:
+>
+>> 
+>> I've been playing over the weekend with SOAP access to Ensembl objects. I have a
+>> test server running that can handle queries.
+>> 
+>> 
+>> Nutshell:
+>> =========
+>> 
+>> 
+>> use SOAP::Lite +autodispatch =>
+>>    uri      =>   'Bio::EnsEMBL::Remote::Object',
+>>    proxy    =>   'http://services.ensembl.org:7070/cgi-bin/ensembl_rpcrouter';
+>> 
+>>    my $trans = Bio::EnsEMBL::Remote::Object->new('type'=>'transcript',
+>>        'id'=>'ENST00000225283');
+>>    print $trans->seq(), "\n";
+>>    print $trans->translate(), "\n";
+>> 
+>> 
+>> 
+>> It is a pretty naive implementation but allows fairly reasonable access to
+>> Ensembl objects via RPC. A remote "proxy" class takes care of creating and
+>> manipulating Ensembl objects on the server side and allows you to make direct
+>> calls locally. General id/start/end type calls all work. Since Ensembl objects
+>> are all intimately tied to DB connections, which goes horribly wrong over
+>> SOAP, the proxy object takes care of creating these connections as necessary on
+>> the server. Calls that return an object have been changed to return an ID, which
+>> can be used to create a remote object. I've only done the main stuff: features,
+>> SNPs etc. are missing.
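
The "return an ID instead of an object" idea described above can be sketched
locally without any SOAP transport. This is only an illustration under stated
assumptions: the Toy::* packages, the %STORE lookup table (standing in for the
database) and the register() helper are all hypothetical and are not the real
Ensembl or Bio::EnsEMBL::Remote::Object code.

```perl
#!/usr/bin/perl
# Hypothetical sketch: every package and method here is invented for
# illustration; none of it is the actual server-side proxy.
use strict;
use warnings;

package Toy::Gene;
sub new         { my ($class, %args) = @_; return bless {%args}, $class }
sub id          { $_[0]{id} }
sub transcripts { @{ $_[0]{transcripts} } }    # returns real objects

package Toy::Transcript;
sub new { my ($class, %args) = @_; return bless {%args}, $class }
sub id  { $_[0]{id} }
sub seq { $_[0]{seq} }

package Toy::Proxy;
# Stand-in for the database: objects keyed by type and then by ID.
my %STORE;
sub register { my ($class, $type, $obj) = @_; $STORE{$type}{ $obj->id } = $obj }

# Look the object up by type and ID, as the remote constructor is described.
sub new {
    my ($class, %args) = @_;
    my $obj = $STORE{ $args{type} }{ $args{id} } or return undef;
    return bless { obj => $obj }, $class;
}
sub id  { $_[0]{obj}->id }
sub seq { $_[0]{obj}->seq }

# Object-returning call rewritten to return plain IDs, which serialise easily.
sub transcripts { map { $_->id } $_[0]{obj}->transcripts }

package main;
my $t1 = Toy::Transcript->new(id => 'T1', seq => 'ATGAAATAG');
my $t2 = Toy::Transcript->new(id => 'T2', seq => 'ATGCCCTAA');
Toy::Proxy->register(transcript => $_) for $t1, $t2;
Toy::Proxy->register(gene => Toy::Gene->new(id => 'G1', transcripts => [$t1, $t2]));

# The caller receives IDs and turns each one back into a proxy object.
my $gene = Toy::Proxy->new(type => 'gene', id => 'G1');
for my $tid ($gene->transcripts) {
    my $t = Toy::Proxy->new(type => 'transcript', id => $tid);
    print $t->id, "\t", $t->seq, "\n";
}
```

The caller pays one extra constructor call per returned ID, which is the same
pattern the example script uses for transcripts, contigs and exons.
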
+>> 
+>> The good thing is that you don't need any local databases or even ensembl code,
+>> just a working copy of SOAP::Lite from CPAN. The bad thing is that it is pretty
+>> __slow__ at the moment. The server is not running under mod_perl so most of the
+>> response time is taken up in module compilation and XML transport. I'll try to
+>> get it running under mod_perl and with transport compression enabled. 
+>> 
+>> I don't see this as the interface of choice for the bioinformatician: it is too
+>> slow, and anyway they will have the "real" ensembl code to turn to. This is much
+>> more of a lightweight interface for conveniently fetching sequences, genes etc.
+>> where speed is not a critical issue and the convenience of a simple programming
+>> interface is the important factor.
+>> 
+>> I'd be very interested to see interoperability tested. I did write a very small
+>> Java client to make requests but rapidly got out of my depth when having to
+>> write a deserializer for the remote object. After looking into the Omnigene code
+>> I see how these work but I'm rather hoping that somebody on the Omnigene team
+>> might have a go at doing this.
+>> 
+>> Following is a simple script that provides examples of manipulating remote
+>> objects. You "get" a remote object on the server by creating a new
+>> Bio::EnsEMBL::Remote::Object and giving it a type and ID. At the moment you
+>> can only fetch virtualcontigs, genes, transcripts, exons, clones, contigs and
+>> translations (peptides). By the magic of "autodispatch", if you get a "thingy"
+>> back, you can just treat it as a normal object and make calls on it. Perl's
+>> autoloader will try to satisfy calls that are not overloaded in the remote
+>> object (I know this sucks). If they are simple get/property calls they will
+>> probably work; if the call returns an object or objects, bad things will
+>> probably happen. Trying to write to the object may work (I haven't tried it) but
+>> is unlikely to be a useful thing to do! Remember this is a transaction-type
+>> system where all the responses need to be marshalled before transport takes
+>> place, so it will not "stream" data to you as if it were a socket-style
+>> connection.
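
The autoloader fallback mentioned above can be illustrated with a small local
sketch. It assumes nothing about the real remote object: Toy::Exon and
Toy::AutoProxy are hypothetical names, and the forwarding shown here is plain
Perl AUTOLOAD rather than the SOAP::Lite machinery.

```perl
#!/usr/bin/perl
# Hypothetical sketch: Toy::Exon and Toy::AutoProxy are invented for
# illustration and are not part of Ensembl or SOAP::Lite.
use strict;
use warnings;

package Toy::Exon;
sub new    { my ($class, %args) = @_; return bless {%args}, $class }
sub id     { $_[0]{id} }
sub strand { $_[0]{strand} }

package Toy::AutoProxy;
our $AUTOLOAD;

sub new { my ($class, $obj) = @_; return bless { obj => $obj }, $class }

# An explicitly overloaded call: handled here, so AUTOLOAD is never consulted.
sub id { 'proxy:' . $_[0]{obj}->id }

# Any other call is forwarded to the wrapped object if it supports it.
sub AUTOLOAD {
    my ($self, @args) = @_;
    (my $name = $AUTOLOAD) =~ s/.*:://;    # strip the package qualifier
    return if $name eq 'DESTROY';          # never forward object destruction
    my $code = $self->{obj}->can($name)
        or die "No method '$name' on wrapped object\n";
    return $self->{obj}->$code(@args);
}

package main;
my $exon = Toy::AutoProxy->new(Toy::Exon->new(id => 'E1', strand => -1));
print $exon->id, "\n";        # overloaded in the proxy
print $exon->strand, "\n";    # satisfied by AUTOLOAD
my $ok = eval { $exon->no_such_call; 1 };
print "unknown call failed: $@" unless $ok;
```

Simple getters pass through unchanged, which matches the "simple get/property
calls will probably work" behaviour; a forwarded call that returned an object
would hand back something the transport cannot usefully serialise.
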
+>> 
+>> In the event of an error, you usually end up with undef (the code is pretty raw
+>> at the moment). If you really want to track down errors, use the following
+>> block:
+>> 
+>>    if(SOAP::Lite->self->call->fault) {
+>>         print "Fault code: ", SOAP::Lite->self->call->faultcode, "\n";
+>>         print "Fault string: ", SOAP::Lite->self->call->faultstring, "\n";
+>>         print "Fault detail: ", SOAP::Lite->self->call->faultdetail, "\n";
+>>         print "Fault actor: ", SOAP::Lite->self->call->faultactor, "\n";
+>>         exit;
+>>    }
+>> 
+>> 
+>> comments and suggestions welcome,
+>> 
+>> cheers
+>> 
+>> Tony
+>> 
+>> 
+>> 
+>> 
+>> 
+>> 
+>> To try the server out, enable one or more of the following blocks (change
+>> if(0) to if(1)):
+>> 
+>> 
+>> #!/usr/local/bin/perl
+>> 
+>> package MySoapClient;
+>> 
+>> use strict;
+>> use SOAP::Lite +autodispatch =>
+>>    uri      =>   'Bio::EnsEMBL::Remote::Object',
+>>    proxy    =>   'http://services.ensembl.org:7070/cgi-bin/ensembl_rpcrouter';
+>>    
+>> if(1){
+>>     my @g = (qw(ENSG00000131591 BRCA1));
+>>     foreach my $g (@g){
+>>         print "Getting gene: $g...\n";
+>>         $g = Bio::EnsEMBL::Remote::Object->new('type'=>'gene','id'=>$g);
+>>         print "\tGene ID: ", $g->id(), "\n";
+>>         foreach my $t ($g->transcripts()){
+>>             my $t = Bio::EnsEMBL::Remote::Object->new('type'=>'transcript',
+>>                 'id'=>$t);
+>>             print "\t\tTranscript ID: ", $t->id(), "\n";
+>>             print "\t\tTranscript length: ", $t->length(), "\n";
+>>             #print "\t\tTranscript seq: ", $t->seq(), "\n"; 
+>>             print "\t\tTranscript protein: ", $t->translate(), "\n";
+>>         }
+>>     }
+>> }
+>> 
+>> if(0){
+>>     print "Getting remote clone AP000869...\n"; 
+>>     my $cl = Bio::EnsEMBL::Remote::Object->new('type'=>'clone',
+>>         'id'=>'AP000869');
+>>     print "Clone: ", $cl->embl_id(), "\n";
+>>     print "Version: ", $cl->version(), "\n";
+>>     
+>>     foreach my $c ($cl->contigs()){
+>>         my $c = Bio::EnsEMBL::Remote::Object->new('type'=>'contig','id'=>$c);
+>>         my $id = $c->id();
+>>         if($c->is_static_golden()){
+>>             print "\tContig ID: $id (golden)\n";
+>>             print "\tContig length: ", $c->length(), "\n";
+>>             print "\tContig is golden?: yes\n";
+>>             print "\t\tContig global start: ", $c->static_golden_start(), "\n";
+>>             print "\t\tContig global end: ",   $c->static_golden_end(), "\n";
+>>             print "\t\tContig global ori: ",   $c->static_golden_ori(), "\n";
+>>             #print "\tContig seq: ", $c->seq(), "\n"; 
+>>         } else {
+>>             print "\tContig ID: $id (non-golden)\n";
+>>         }
+>>     }
+>> }
+>> 
+>> 
+>> if(0){
+>>     my $chr = 1;
+>>     my $start = 100000;
+>>     my $end = 200000;
+>>     print "Getting remote virtualcontig for $chr, $start-$end...\n"; 
+>>     my $v = Bio::EnsEMBL::Remote::Object->new('type'=>'virtualcontig',
+>>         'chr'=>$chr, 'start'=>$start, 'end'=>$end);
+>>     print "Virtual contig ID: ", $v->id(), "\n"; 
+>>     print "Virtual contig length: ", $v->length(), "\n"; 
+>>     print "Virtual contig chromosome: ", $v->_chr_name(), "\n"; 
+>>     print "Virtual contig chromosome length: ",
+>>         $v->fetch_chromosome_length(), "\n"; 
+>> 
+>>     foreach my $g ($v->genes()){
+>>         $g = Bio::EnsEMBL::Remote::Object->new('type'=>'gene','id'=>$g);
+>>         print "\tGene ID: ", $g->id(), "\n";
+>>         foreach my $t ($g->transcripts()){
+>>             my $t = Bio::EnsEMBL::Remote::Object->new('type'=>'transcript',
+>>                 'id'=>$t);
+>>             print "\t\tTranscript ID: ", $t->id(), "\n";
+>>             print "\t\tTranscript length: ", $t->length(), "\n";
+>>             #print "\t\tTranscript seq: ", $t->seq(), "\n"; 
+>>             #print "\t\tTranscript protein: ", $t->translate(), "\n";
+>>             foreach my $e ($t->exons()){
+>>                 my $e = Bio::EnsEMBL::Remote::Object->new('type'=>'exon',
+>>                     'id'=>$e);
+>>                 print "\t\t\tExon ID: ", $e->id(), "\n";
+>>                 print "\t\t\tExon start: ", $e->ori_start(), "\n";
+>>                 print "\t\t\tExon end: ", $e->ori_end(), "\n";
+>>                 print "\t\t\tExon strand: ", $e->strand(), "\n";
+>>                 print "\t\t\tExon seq: ", $e->seq(), "\n";
+>>            }
+>>         }
+>>     
+>>     }
+>> }
+>> 
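+>> if(0){
+>>     # Keep the ID and the remote object in separate variables rather than
+>>     # declaring $p twice in the same scope.
+>>     my $pid = "ENSP00000223439";
+>>     print "Getting remote peptide $pid...\n"; 
+>>     my $p = Bio::EnsEMBL::Remote::Object->new('type'=>'translation','id'=>$pid);
+>>     print $p->seq(), "\n";
+>> }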
+>> 
+>> 
+>> 
+>> 
+>> ******************************************************
+>> Tony Cox			Email:avc@sanger.ac.uk
+>> Sanger Institute		WWW:www.sanger.ac.uk
+>> Wellcome Trust Genome Campus	Webmaster
+>> Hinxton				Tel: +44 1223 834244
+>> Cambs. CB10 1SA			Fax: +44 1223 494919
+>> ******************************************************
+>> 
+>> _______________________________________________
+>> DAS mailing list
+>> DAS@biodas.org
+>> http://biodas.org/mailman/listinfo/das
+>> 
+>

******************************************************
Tony Cox			Email:avc@sanger.ac.uk
Sanger Institute		WWW:www.sanger.ac.uk
Wellcome Trust Genome Campus	Webmaster
Hinxton				Tel: +44 1223 834244
Cambs. CB10 1SA			Fax: +44 1223 494919
******************************************************