[DAS] Ensembl via SOAP

Tony Cox avc@sanger.ac.uk
Wed, 5 Jun 2002 17:55:51 +0100 (BST)


On Wed, 5 Jun 2002, Lincoln Stein wrote:

I have just got this running under mod_perl and with gzip compression on the
wire for requests/responses > 10kb. It is _much_ faster and located at:

proxy    =>   'http://services.ensembl.org:7070/soap/ensembl_soaprouter',

Tony
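
To illustrate what "compression only above a size threshold" amounts to, here is a minimal local sketch in Perl. It is not the Ensembl server code. The 10 kB cutoff is taken from the message above; the names $THRESHOLD, maybe_compress and decode_body are hypothetical, and the core IO::Compress modules stand in for whatever the SOAP transport does internally.

```perl
use strict;
use warnings;
use IO::Compress::Gzip qw(gzip $GzipError);
use IO::Uncompress::Gunzip qw(gunzip $GunzipError);

# Assumed cutoff: only bodies larger than 10 kB are compressed.
my $THRESHOLD = 10 * 1024;

# Returns (encoding, body); encoding is 'gzip' or 'identity'.
sub maybe_compress {
    my ($body) = @_;
    return ('identity', $body) if length($body) <= $THRESHOLD;
    my $packed;
    gzip(\$body => \$packed) or die "gzip failed: $GzipError";
    return ('gzip', $packed);
}

# Reverses maybe_compress on the receiving side.
sub decode_body {
    my ($encoding, $body) = @_;
    return $body if $encoding eq 'identity';
    my $plain;
    gunzip(\$body => \$plain) or die "gunzip failed: $GunzipError";
    return $plain;
}

# A short and a long made-up XML payload.
my $small = '<seq>ACGT</seq>';
my $large = '<seq>' . ('ACGT' x 5000) . '</seq>';

for my $xml ($small, $large) {
    my ($enc, $wire) = maybe_compress($xml);
    printf "%6d bytes -> %6d bytes on the wire (%s)\n",
        length($xml), length($wire), $enc;
    die "round trip failed" unless decode_body($enc, $wire) eq $xml;
}
```

Small replies skip the gzip step entirely, so only large sequence payloads pay the compression cost. SOAP::Lite documents a compress_threshold transport option for this purpose; whether the server above uses it is not stated in this thread.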


+>My experience with SOAP::Lite has been that it is non-trivial to work 
+>backwards from the Perl object model to the XSL.  I ended up writing an 
+>axis-equivalent in Perl so that I could watch the objects on the wire and 
+>reverse-engineering it from there.
+>
+>Lincoln
+>
+>On Wednesday 05 June 2002 08:34, Brian wrote:
+>> Yeah,
+>>
+>> 	We have lots of experience writing serializers and
+>> deserializers...If you need examples you can look at OmniGene's omnitide
+>> package. There are about 20 (d/s)erializers checked in.
+>>
+>> 	As an aside, we had the crazy idea of writing serializers for
+>> biojava objects but looked at the amount of work involved and thought that
+>> we'd need more support to do this.
+>>
+>> 	Is anyone else interested in helping write sers/desers for
+>> biojava/bioperl objects??
+>>
+>> 	I think we'd need to talk about the object model and then write
+>> the XSD's. From there we could use castor or jaxb or axis (I'd rather do
+>> axis) to get the objects flowing back and forth over the wire...Let me
+>> know...We are very interested in doing this with a partner.
+>>
+>> 				Best,
+>>
+>> 					-B
+>>
+>>  On Wed, 5 Jun 2002, Tony Cox wrote:
+>> > On Wed, 5 Jun 2002, Brian Gilman wrote:
+>> >
+>> > +>This is great!!
+>> > +>
+>> > +>	Do you need help with a java implementation?? I'd be willing to
+>> > +>help you out in a week or so...
+>> >
+>> > Hi Brian,
+>> >
+>> > Help would be welcome - as I mentioned below I got a java client working
+>> > but I fell over on having to write an object deserializer. Hopefully you
+>> > could pick it up there...?
+>> >
+>> > Tony
+>> >
+>> >
+>> > +>
+>> > +>			-B
+>> > +>
+>> > +>-----------------------
+>> > +>Brian Gilman <gilmanb@genome.wi.mit.edu>
+>> > +>Group Leader Medical & Population Genetics Dept.
+>> > +>MIT/Whitehead Inst. Center for Genome Research
+>> > +>One Kendall Square, Bldg. 300 / Cambridge, MA 02139-1561 USA
+>> > +>phone +1 617  252 1069 / fax +1 617 252 1902
+>> > +>
+>> > +>
+>> > +>On Wed, 5 Jun 2002, Tony Cox wrote:
+>> > +>
+>> > +>>
+>> > +>> I've been playing over the weekend with SOAP access to Ensembl objects.
+>> > +>> I have a test server running that can handle queries.
+>> > +>>
+>> > +>>
+>> > +>> Nutshell:
+>> > +>> =========
+>> > +>>
+>> > +>>
+>> > +>> use SOAP::Lite +autodispatch =>
+>> > +>>    uri      =>   'Bio::EnsEMBL::Remote::Object',
+>> > +>>    proxy    =>   'http://services.ensembl.org:7070/cgi-bin/ensembl_rpcrouter';
+>> > +>>
+>> > +>>    my $trans = Bio::EnsEMBL::Remote::Object->new('type'=>'transcript','id'=>'ENST00000225283');
+>> > +>>    print $trans->seq(), "\n";
+>> > +>>    print $trans->translate(), "\n";
+>> > +>>
+>> > +>>
+>> > +>>
+>> > +>> It is a pretty naive implementation but allows fairly reasonable access
+>> > +>> to Ensembl objects via RPC. A remote "proxy" class takes care of
+>> > +>> creating and manipulating ensembl objects on the server side and allows
+>> > +>> you to make direct calls locally. General id/start/end type calls all
+>> > +>> work. Since ensembl objects are all intimately tied to DB connections,
+>> > +>> which all goes horribly wrong over SOAP, the proxy object takes care of
+>> > +>> creating these connections as necessary on the server. Calls that
+>> > +>> return an object have been changed to return an ID, which can be used
+>> > +>> to create a remote object. I've only done the main stuff - features,
+>> > +>> SNPs etc. are missing.
+>> > +>>
+>> > +>> The good thing is that you don't need any local databases or even
+>> > +>> ensembl code, just a working copy of SOAP::Lite from CPAN. The bad
+>> > +>> thing is that it is pretty __slow__ at the moment. The server is not
+>> > +>> running under mod_perl so most of the response time is taken up in
+>> > +>> module compilation and XML transport. I'll try to get it running under
+>> > +>> mod_perl and with transport compression enabled.
+>> > +>>
+>> > +>> I don't see this as an interface of choice for the bioinformatician! -
+>> > +>> it is too slow and anyway they will have the "real" ensembl code to
+>> > +>> turn to. This is much more of a lightweight interface for conveniently
+>> > +>> fetching sequences, genes etc where speed is not a critical issue, and
+>> > +>> the convenience of a simple programming interface is the important
+>> > +>> factor.
+>> > +>>
+>> > +>> I'd be very interested to see interoperability tested. I did write a
+>> > +>> very small java client to make requests but rapidly got out of my depth
+>> > +>> when having to write a deserializer for the remote object. After
+>> > +>> looking into the Omnigene code I see how these work but I'm rather
+>> > +>> hoping that somebody on the omnigene team might have a go at doing this.
+>> > +>>
+>> > +>> Following is a simple script that provides examples of manipulating
+>> > +>> remote objects. You "get" a remote object on the server by creating a
+>> > +>> new Bio::EnsEMBL::Remote::Object and giving it a type and ID. At the
+>> > +>> moment you can only fetch virtualcontigs, genes, transcripts, exons,
+>> > +>> clones, contigs and translations (peptides). By the magic of
+>> > +>> "autodispatch", if you get a "thingy" back, you can just treat it as a
+>> > +>> normal object and make calls on it. Perl's autoloader will try and
+>> > +>> satisfy calls that are not overloaded in the remote object (I know this
+>> > +>> sucks). If they are simple get/property calls they will probably work -
+>> > +>> if the call returns an object/objects, bad things will probably happen.
+>> > +>> Trying to write to the object may work (I haven't tried it) but is
+>> > +>> likely not to be a useful thing to do! Remember this is a
+>> > +>> transaction-type system where all the responses need to be marshalled
+>> > +>> before transport takes place so it will not "stream" data to you as if
+>> > +>> it were a socket-style connection.
+>> > +>>
+>> > +>> In the event of an error, you usually end up with undef (the code is
+>> > +>> pretty raw at the moment). If you really want to track down errors,
+>> > +>> use the following block:
+>> > +>>
+>> > +>>    if(SOAP::Lite->self->call->fault) {
+>> > +>>         print "Fault code: ", SOAP::Lite->self->call->faultcode, "\n";
+>> > +>>         print "Fault string: ", SOAP::Lite->self->call->faultstring, "\n";
+>> > +>>         print "Fault detail: ", SOAP::Lite->self->call->faultdetail, "\n";
+>> > +>>         print "Fault actor: ", SOAP::Lite->self->call->faultactor, "\n";
+>> > +>>         exit;
+>> > +>>    }
+>> > +>>
+>> > +>>
+>> > +>> comments and suggestions welcome,
+>> > +>>
+>> > +>> cheers
+>> > +>>
+>> > +>> Tony
+>> > +>>
+>> > +>>
+>> > +>>
+>> > +>>
+>> > +>>
+>> > +>>
+>> > +>> To try the server out enable one or more of the following blocks:
+>> > +>>
+>> > +>>
+>> > +>> #!/usr/local/bin/perl
+>> > +>>
+>> > +>> package MySoapClient;
+>> > +>>
+>> > +>> use strict;
+>> > +>> use SOAP::Lite +autodispatch =>
+>> > +>>    uri      =>   'Bio::EnsEMBL::Remote::Object',
+>> > +>>    proxy    =>   'http://services.ensembl.org:7070/cgi-bin/ensembl_rpcrouter';
+>> > +>>
+>> > +>> if(1){
+>> > +>>     my @g = (qw(ENSG00000131591 BRCA1));
+>> > +>>     foreach my $g (@g){
+>> > +>>         print "Getting gene: $g...\n";
+>> > +>>         $g = Bio::EnsEMBL::Remote::Object->new('type'=>'gene','id'=>$g);
+>> > +>>         print "\tGene ID: ", $g->id(), "\n";
+>> > +>>         foreach my $t ($g->transcripts()){
+>> > +>>             my $t =
+>> > +>> Bio::EnsEMBL::Remote::Object->new('type'=>'transcript','id'=>$t);
+>> > +>>             print "\t\tTranscript ID: ", $t->id(), "\n";
+>> > +>>             print "\t\tTranscript length: ", $t->length(), "\n";
+>> > +>>             #print "\t\tTranscript seq: ", $t->seq(), "\n";
+>> > +>>             print "\t\tTranscript protein: ", $t->translate(), "\n";
+>> > +>>         }
+>> > +>>     }
+>> > +>> }
+>> > +>>
+>> > +>> if(0){
+>> > +>>     print "Getting remote clone AP000869...\n";
+>> > +>>     my $cl =
+>> > +>> Bio::EnsEMBL::Remote::Object->new('type'=>'clone','id'=>'AP000869');
+>> > +>>     print "Clone: ", $cl->embl_id(), "\n";
+>> > +>>     print "Version: ", $cl->version(), "\n";
+>> > +>>
+>> > +>>     foreach my $c ($cl->contigs()){
+>> > +>>         my $c = Bio::EnsEMBL::Remote::Object->new('type'=>'contig','id'=>$c);
+>> > +>>         my $id = $c->id();
+>> > +>>         if($c->is_static_golden()){
+>> > +>>             print "\tContig ID: $id (golden)\n";
+>> > +>>             print "\tContig length: ", $c->length(), "\n";
+>> > +>>             print "\tContig is golden?: yes\n";
+>> > +>>             print "\t\tContig global start: ", $c->static_golden_start(), "\n";
+>> > +>>             print "\t\tContig global end: ",   $c->static_golden_end(), "\n";
+>> > +>>             print "\t\tContig global ori: ",   $c->static_golden_ori(), "\n";
+>> > +>>             #print "\tContig seq: ", $c->seq(), "\n";
+>> > +>>         } else {
+>> > +>>             print "\tContig ID: $id (non-golden)\n";
+>> > +>>         }
+>> > +>>     }
+>> > +>> }
+>> > +>>
+>> > +>>
+>> > +>> if(0){
+>> > +>>     my $chr = 1;
+>> > +>>     my $start = 100000;
+>> > +>>     my $end = 200000;
+>> > +>>     print "Getting remote virtualcontig for $chr, $start-$end...\n";
+>> > +>>     my $v = Bio::EnsEMBL::Remote::Object->new('type'=>'virtualcontig','chr'=>$chr,
+>> > +>>         'start'=>$start, 'end'=>$end);
+>> > +>>     print "Virtual contig ID: ", $v->id(), "\n";
+>> > +>>     print "Virtual contig length: ", $v->length(), "\n";
+>> > +>>     print "Virtual contig chromosome: ", $v->_chr_name(), "\n";
+>> > +>>     print "Virtual contig chromosome length: ", $v->fetch_chromosome_length(), "\n";
+>> > +>>
+>> > +>>     foreach my $g ($v->genes()){
+>> > +>>         $g = Bio::EnsEMBL::Remote::Object->new('type'=>'gene','id'=>$g);
+>> > +>>         print "\tGene ID: ", $g->id(), "\n";
+>> > +>>         foreach my $t ($g->transcripts()){
+>> > +>>             my $t =
+>> > +>> Bio::EnsEMBL::Remote::Object->new('type'=>'transcript','id'=>$t);
+>> > +>>             print "\t\tTranscript ID: ", $t->id(), "\n";
+>> > +>>             print "\t\tTranscript length: ", $t->length(), "\n";
+>> > +>>             #print "\t\tTranscript seq: ", $t->seq(), "\n";
+>> > +>>             #print "\t\tTranscript protein: ", $t->translate(), "\n";
+>> > +>>             foreach my $e ($t->exons()){
+>> > +>>                 my $e =
+>> > +>> Bio::EnsEMBL::Remote::Object->new('type'=>'exon','id'=>$e);
+>> > +>>                 print "\t\t\tExon ID: ", $e->id(), "\n";
+>> > +>>                 print "\t\t\tExon start: ", $e->ori_start(), "\n";
+>> > +>>                 print "\t\t\tExon end: ", $e->ori_end(), "\n";
+>> > +>>                 print "\t\t\tExon strand: ", $e->strand(), "\n";
+>> > +>>                 print "\t\t\tExon seq: ", $e->seq(), "\n";
+>> > +>>            }
+>> > +>>         }
+>> > +>>
+>> > +>>     }
+>> > +>> }
+>> > +>>
+>> > +>> if(0){
+>> > +>>     my $p = "ENSP00000223439";
+>> > +>>     print "Getting remote peptide $p...\n";
+>> > +>>     my $p = Bio::EnsEMBL::Remote::Object->new('type'=>'translation','id'=>$p);
+>> > +>>     print $p->seq();
+>> > +>> }
+>> > +>>
+>> > +>>
+>> > +>>
+>> > +>>
+>> > +>> ******************************************************
+>> > +>> Tony Cox			Email:avc@sanger.ac.uk
+>> > +>> Sanger Institute		WWW:www.sanger.ac.uk
+>> > +>> Wellcome Trust Genome Campus	Webmaster
+>> > +>> Hinxton				Tel: +44 1223 834244
+>> > +>> Cambs. CB10 1SA			Fax: +44 1223 494919
+>> > +>> ******************************************************
+>> > +>>
+>> > +>> _______________________________________________
+>> > +>> DAS mailing list
+>> > +>> DAS@biodas.org
+>> > +>> http://biodas.org/mailman/listinfo/das
+>> > +>>
+>> > +>
+>> >
+>
+>-- 
+>========================================================================
+>Lincoln D. Stein                           Cold Spring Harbor Laboratory
+>lstein@cshl.org			                  Cold Spring Harbor, NY
+>========================================================================
+>

******************************************************
Tony Cox			Email:avc@sanger.ac.uk
Sanger Institute		WWW:www.sanger.ac.uk
Wellcome Trust Genome Campus	Webmaster
Hinxton				Tel: +44 1223 834244
Cambs. CB10 1SA			Fax: +44 1223 494919
******************************************************
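
The remote proxy design described in the quoted announcement (calls that would return an object return an ID instead, and the client wraps that ID in a new remote object) can be sketched locally without any network access. This is only an illustration of the pattern, not the Bio::EnsEMBL::Remote::Object implementation: FakeServer, RemoteObject, %STORE and the IDs G1, T1 and T2 are all hypothetical, and an in-memory hash stands in for the SOAP round trip.

```perl
use strict;
use warnings;

# Hypothetical in-memory stand-in for the objects held on the server.
my %STORE = (
    'gene:G1'       => { id => 'G1', transcripts => [ 'T1', 'T2' ] },
    'transcript:T1' => { id => 'T1', length => 1200 },
    'transcript:T2' => { id => 'T2', length => 860 },
);

package FakeServer;

# One "remote call": look up a property of a typed object by ID.
# Only plain values or lists of IDs are returned, never objects.
sub call {
    my ($class, $type, $id, $method) = @_;
    my $obj = $STORE{"$type:$id"} or return;
    my $value = $obj->{$method};
    return ref($value) eq 'ARRAY' ? @$value : $value;
}

package RemoteObject;

our $AUTOLOAD;

sub new {
    my ($class, %args) = @_;
    return bless { type => $args{type}, id => $args{id} }, $class;
}

# Any method not defined here is forwarded to the server by name.
sub AUTOLOAD {
    my ($self) = @_;
    (my $method = $AUTOLOAD) =~ s/.*:://;
    return FakeServer->call($self->{type}, $self->{id}, $method);
}

sub DESTROY { }    # keep object destruction out of AUTOLOAD

package main;

# Fetch a gene, then wrap each returned transcript ID in a new proxy.
my $gene = RemoteObject->new(type => 'gene', id => 'G1');
for my $tid ($gene->transcripts) {
    my $t = RemoteObject->new(type => 'transcript', id => $tid);
    printf "%s length %d\n", $t->id, $t->length;
}
```

Because the client only ever holds a type and an ID, nothing tied to a database connection has to cross the wire. The cost is one extra round trip per wrapped ID, which fits the remark in the announcement that the interface favours convenience over speed.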