[DAS] Ensembl via SOAP
Lincoln Stein
lstein@cshl.org
Wed, 5 Jun 2002 13:37:50 -0400
I'm pretty happy with the performance of SOAP::Lite under mod_soap, although
it is still an order of magnitude slower than just fetching objects directly.
I have a Perl-based server and a C-based client talking to each other quite
happily.
Lincoln
On Wednesday 05 June 2002 12:55, Tony Cox wrote:
> On Wed, 5 Jun 2002, Lincoln Stein wrote:
>
> I have just got this running under mod_perl and with gzip compression on
> the wire for requests/responses > 10kb. It is _much_ faster and located at:
>
> proxy => 'http://services.ensembl.org:7070/soap/ensembl_soaprouter',
>
> Tony
>
>
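Tony's note says compression is applied only to requests/responses above about 10kb. The following is a minimal sketch of that size-threshold idea using Perl's core IO::Compress::Gzip and IO::Uncompress::Gunzip modules. It is not the SOAP::Lite or mod_perl code the Ensembl server actually ran; the helper names `maybe_gzip`/`maybe_gunzip` and the 10_000-byte threshold are assumptions for illustration.

```perl
use strict;
use warnings;
use IO::Compress::Gzip     qw(gzip $GzipError);
use IO::Uncompress::Gunzip qw(gunzip $GunzipError);

# Assumed threshold, mirroring "compression ... for requests/responses > 10kb".
my $COMPRESS_THRESHOLD = 10_000;    # bytes

# Sender side (hypothetical helper): gzip the payload only when it exceeds
# the threshold. Returns the body plus the value a sender would put in a
# Content-Encoding header ('gzip'), or undef when the payload is sent as-is.
sub maybe_gzip {
    my ($payload) = @_;
    return ($payload, undef) if length($payload) <= $COMPRESS_THRESHOLD;
    my $compressed;
    gzip(\$payload => \$compressed) or die "gzip failed: $GzipError\n";
    return ($compressed, 'gzip');
}

# Receiver side (hypothetical helper): undo maybe_gzip based on the
# advertised encoding.
sub maybe_gunzip {
    my ($body, $encoding) = @_;
    return $body unless defined $encoding && $encoding eq 'gzip';
    my $plain;
    gunzip(\$body => \$plain) or die "gunzip failed: $GunzipError\n";
    return $plain;
}
```

Small messages skip compression because the gzip header and CPU cost can outweigh the saving, while large, repetitive XML payloads such as sequences shrink considerably.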
> +>My experience with SOAP::Lite has been that it is non-trivial to work
> +>backwards from the Perl object model to the XSL. I ended up writing an
> +>Axis-equivalent in Perl so that I could watch the objects on the wire and
> +>reverse-engineer it from there.
> +>
> +>Lincoln
> +>
> +>On Wednesday 05 June 2002 08:34, Brian wrote:
> +>> Yeah,
> +>>
> +>> We have lots of experience writing serializers and
> +>> deserializers... If you need examples you can look at OmniGene's omnitide
> +>> package. There are about 20 (d/s)erializers checked in.
> +>>
> +>> As an aside, we had the crazy idea of writing serializers for
> +>> biojava objects but looked at the amount of work involved and thought
> +>> that we'd need more support to do this.
> +>>
> +>> Is anyone else interested in helping write sers/desers for
> +>> biojava/bioperl objects??
> +>>
> +>> I think we'd need to talk about the object model and then write
> +>> the XSDs. From there we could use castor or jaxb or axis (I'd rather
> +>> use axis) to get the objects flowing back and forth over the wire... Let
> +>> me know... We are very interested in doing this with a partner.
> +>>
> +>> Best,
> +>>
> +>> -B
> +>>
> +>> On Wed, 5 Jun 2002, Tony Cox wrote:
> +>> > On Wed, 5 Jun 2002, Brian Gilman wrote:
> +>> >
> +>> > +>This is great!!
> +>> > +>
> +>> > +> Do you need help with a java implementation?? I'd be willing to
> +>> > +>help you out in a week or so...
> +>> >
> +>> > Hi Brian,
> +>> >
> +>> > Help would be welcome; as I mentioned below, I got a java client
> +>> > working but I fell over on having to write an object deserializer.
> +>> > Hopefully you could pick it up there...?
> +>> >
> +>> > Tony
> +>> >
> +>> >
> +>> > +>
> +>> > +> -B
> +>> > +>
> +>> > +>-----------------------
> +>> > +>Brian Gilman <gilmanb@genome.wi.mit.edu>
> +>> > +>Group Leader Medical & Population Genetics Dept.
> +>> > +>MIT/Whitehead Inst. Center for Genome Research
> +>> > +>One Kendall Square, Bldg. 300 / Cambridge, MA 02139-1561 USA
> +>> > +>phone +1 617 252 1069 / fax +1 617 252 1902
> +>> > +>
> +>> > +>
> +>> > +>On Wed, 5 Jun 2002, Tony Cox wrote:
> +>> > +>
> +>> > +>>
> +>> > +>> I've been playing over the weekend with SOAP access to Ensembl
> +>> > +>> objects. I have a test server running that can handle queries.
> +>> > +>>
> +>> > +>>
> +>> > +>> Nutshell:
> +>> > +>> =========
> +>> > +>>
> +>> > +>>
> +>> > +>> use SOAP::Lite +autodispatch =>
> +>> > +>>     uri   => 'Bio::EnsEMBL::Remote::Object',
> +>> > +>>     proxy => 'http://services.ensembl.org:7070/cgi-bin/ensembl_rpcrouter';
> +>> > +>>
> +>> > +>> my $trans = Bio::EnsEMBL::Remote::Object->new('type' => 'transcript',
> +>> > +>>                                               'id'   => 'ENST00000225283');
> +>> > +>> print $trans->seq(), "\n";
> +>> > +>> print $trans->translate(), "\n";
> +>> > +>>
> +>> > +>>
> +>> > +>>
> +>> > +>> It is a pretty naive implementation but allows fairly reasonable
> +>> > +>> access to Ensembl objects via RPC. A remote "proxy" class takes care
> +>> > +>> of creating and manipulating Ensembl objects on the server side and
> +>> > +>> allows you to make direct calls locally. General id/start/end type
> +>> > +>> calls all work. Since Ensembl objects are all intimately tied to DB
> +>> > +>> connections, which all go horribly wrong over SOAP, the proxy object
> +>> > +>> takes care of creating these connections as necessary on the server.
> +>> > +>> Calls that return an object have been changed to return an ID, which
> +>> > +>> can be used to create a remote object. I've only done the main stuff;
> +>> > +>> features, SNPs, etc. are missing.
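The return-an-ID-instead-of-an-object convention described above can be illustrated without any SOAP machinery. This is a minimal sketch, assuming a hypothetical in-memory registry in place of Ensembl's database connections: the server looks an object up by (type, ID), calls the requested method, and replaces any returned objects with their IDs so that only plain strings need to be serialized. `Local::Gene`, `Local::Transcript`, `register` and `remote_call` are invented names, not part of Bio::EnsEMBL::Remote::Object.

```perl
use strict;
use warnings;
use Scalar::Util qw(blessed);

# Hypothetical stand-ins for server-side Ensembl objects.
package Local::Transcript;
sub new { my ($class, %args) = @_; return bless {%args}, $class }
sub id  { return $_[0]{id} }
sub seq { return $_[0]{seq} }

package Local::Gene;
sub new         { my ($class, %args) = @_; return bless {%args}, $class }
sub id          { return $_[0]{id} }
sub transcripts { return @{ $_[0]{transcripts} } }    # returns objects

package main;

# Hypothetical server-side registry keyed by type and ID; the real
# service would open a database connection here instead.
my %REGISTRY;

sub register {
    my ($type, $obj) = @_;
    $REGISTRY{$type}{ $obj->id } = $obj;
    return;
}

# Proxy dispatch: look the object up, call the method, and replace any
# returned objects with their IDs so only plain strings cross the wire.
sub remote_call {
    my ($type, $id, $method, @args) = @_;
    my $obj = $REGISTRY{$type}{$id}
        or die "no $type with id $id\n";
    die "unsupported method $method\n" unless $obj->can($method);
    return map { blessed($_) && $_->can('id') ? $_->id : $_ }
        $obj->$method(@args);
}
```

Under this scheme a client that gets a list of transcript IDs back from `transcripts()` needs one further call per ID, which matches the `Bio::EnsEMBL::Remote::Object->new('type' => 'transcript', 'id' => ...)` pattern in the example script.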
> +>> > +>>
> +>> > +>> The good thing is that you don't need any local databases or even
> +>> > +>> ensembl code, just a working copy of SOAP::Lite from CPAN. The bad
> +>> > +>> thing is that it is pretty __slow__ at the moment. The server is not
> +>> > +>> running under mod_perl, so most of the response time is taken up in
> +>> > +>> module compilation and XML transport. I'll try to get it running
> +>> > +>> under mod_perl and with transport compression enabled.
> +>> > +>>
> +>> > +>> I don't see this as an interface of choice for the bioinformatician!
> +>> > +>> It is too slow, and anyway they will have the "real" ensembl code to
> +>> > +>> turn to. This is much more of a lightweight interface for
> +>> > +>> conveniently fetching sequences, genes, etc., where speed is not a
> +>> > +>> critical issue and the convenience of a simple programming interface
> +>> > +>> is the important factor.
> +>> > +>>
> +>> > +>> I'd be very interested to see interoperability tested. I did write a
> +>> > +>> very small java client to make requests but rapidly got out of my
> +>> > +>> depth when having to write a deserializer for the remote object.
> +>> > +>> After looking into the Omnigene code I see how these work, but I'm
> +>> > +>> rather hoping that somebody on the Omnigene team might have a go at
> +>> > +>> doing this.
> +>> > +>>
> +>> > +>> Following is a simple script that provides examples of manipulating
> +>> > +>> remote objects. You "get" a remote object on the server by creating a
> +>> > +>> new Bio::EnsEMBL::Remote::Object and giving it a type and ID. At the
> +>> > +>> moment you can only fetch virtualcontigs, genes, transcripts, exons,
> +>> > +>> clones, contigs and translations (peptides). By the magic of
> +>> > +>> "autodispatch", if you get a "thingy" back, you can just treat it as
> +>> > +>> a normal object and make calls on it. Perl's autoloader will try to
> +>> > +>> satisfy calls that are not overloaded in the remote object (I know
> +>> > +>> this sucks). If they are simple get/property calls they will probably
> +>> > +>> work; if the call returns an object or objects, bad things will
> +>> > +>> probably happen. Trying to write to the object may work (I haven't
> +>> > +>> tried it) but is unlikely to be a useful thing to do! Remember this is
> +>> > +>> a transaction-type system where all the responses need to be
> +>> > +>> marshalled before transport takes place, so it will not "stream" data
> +>> > +>> to you as if it were a socket-style connection.
> +>> > +>>
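Tony attributes the "treat it as a normal object" behaviour to autodispatch and Perl's autoloader. The following is a minimal sketch of the general AUTOLOAD-forwarding idea in plain Perl, not SOAP::Lite's own autodispatch implementation: a hypothetical `RemoteStub` class forwards every method it does not define, together with the object's type and ID, to a `transport` coderef that stands in for the SOAP round trip.

```perl
use strict;
use warnings;

# Hypothetical client-side stub illustrating AUTOLOAD forwarding;
# this is not SOAP::Lite's autodispatch implementation.
package RemoteStub;

our $AUTOLOAD;

# 'transport' is a coderef standing in for the SOAP round trip. It is
# called as transport->(type, id, method, args) and its return value
# is handed straight back to the caller.
sub new {
    my ($class, %args) = @_;
    return bless {
        type      => $args{type},
        id        => $args{id},
        transport => $args{transport},
    }, $class;
}

# Any method RemoteStub does not define lands here.
sub AUTOLOAD {
    my $self = shift;
    (my $method = $AUTOLOAD) =~ s/.*:://;    # strip the package prefix
    return if $method eq 'DESTROY';          # don't forward destructors
    return $self->{transport}->($self->{type}, $self->{id}, $method, @_);
}

package main;
```

Because the stub forwards every unknown method name blindly, it cannot tell a cheap accessor from a call that returns objects; that has to be handled on the server side, which is why object-returning calls there hand back IDs.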
> +>> > +>> In the event of an error, you usually end up with undef (the code is
> +>> > +>> pretty raw at the moment). If you really want to track down errors,
> +>> > +>> use the following block:
> +>> > +>>
> +>> > +>> if (SOAP::Lite->self->call->fault) {
> +>> > +>>     print "Fault code: ",   SOAP::Lite->self->call->faultcode,   "\n";
> +>> > +>>     print "Fault string: ", SOAP::Lite->self->call->faultstring, "\n";
> +>> > +>>     print "Fault detail: ", SOAP::Lite->self->call->faultdetail, "\n";
> +>> > +>>     print "Fault actor: ",  SOAP::Lite->self->call->faultactor,  "\n";
> +>> > +>>     exit;
> +>> > +>> }
> +>> > +>>
> +>> > +>>
> +>> > +>> comments and suggestions welcome,
> +>> > +>>
> +>> > +>> cheers
> +>> > +>>
> +>> > +>> Tony
> +>> > +>>
> +>> > +>> To try the server out, enable one or more of the following blocks by
> +>> > +>> changing if(0) to if(1):
> +>> > +>>
> +>> > +>>
> +>> > +>> #!/usr/local/bin/perl
> +>> > +>>
> +>> > +>> package MySoapClient;
> +>> > +>>
> +>> > +>> use strict;
> +>> > +>> use SOAP::Lite +autodispatch =>
> +>> > +>>     uri   => 'Bio::EnsEMBL::Remote::Object',
> +>> > +>>     proxy => 'http://services.ensembl.org:7070/cgi-bin/ensembl_rpcrouter';
> +>> > +>>
> +>> > +>> if (1) {
> +>> > +>>     my @g = qw(ENSG00000131591 BRCA1);
> +>> > +>>     foreach my $gid (@g) {
> +>> > +>>         print "Getting gene: $gid...\n";
> +>> > +>>         my $g = Bio::EnsEMBL::Remote::Object->new('type' => 'gene', 'id' => $gid);
> +>> > +>>         print "\tGene ID: ", $g->id(), "\n";
> +>> > +>>         # transcripts() returns IDs, which are turned into remote objects
> +>> > +>>         foreach my $tid ($g->transcripts()) {
> +>> > +>>             my $t = Bio::EnsEMBL::Remote::Object->new('type' => 'transcript', 'id' => $tid);
> +>> > +>>             print "\t\tTranscript ID: ", $t->id(), "\n";
> +>> > +>>             print "\t\tTranscript length: ", $t->length(), "\n";
> +>> > +>>             #print "\t\tTranscript seq: ", $t->seq(), "\n";
> +>> > +>>             print "\t\tTranscript protein: ", $t->translate(), "\n";
> +>> > +>>         }
> +>> > +>>     }
> +>> > +>> }
> +>> > +>>
> +>> > +>> if (0) {
> +>> > +>>     print "Getting remote clone AP000869...\n";
> +>> > +>>     my $cl = Bio::EnsEMBL::Remote::Object->new('type' => 'clone', 'id' => 'AP000869');
> +>> > +>>     print "Clone: ", $cl->embl_id(), "\n";
> +>> > +>>     print "Version: ", $cl->version(), "\n";
> +>> > +>>
> +>> > +>>     foreach my $cid ($cl->contigs()) {
> +>> > +>>         my $c = Bio::EnsEMBL::Remote::Object->new('type' => 'contig', 'id' => $cid);
> +>> > +>>         my $id = $c->id();
> +>> > +>>         if ($c->is_static_golden()) {
> +>> > +>>             print "\tContig ID: $id (golden)\n";
> +>> > +>>             print "\tContig length: ", $c->length(), "\n";
> +>> > +>>             print "\tContig is golden?: yes\n";
> +>> > +>>             print "\t\tContig global start: ", $c->static_golden_start(), "\n";
> +>> > +>>             print "\t\tContig global end: ", $c->static_golden_end(), "\n";
> +>> > +>>             print "\t\tContig global ori: ", $c->static_golden_ori(), "\n";
> +>> > +>>             #print "\tContig seq: ", $c->seq(), "\n";
> +>> > +>>         } else {
> +>> > +>>             print "\tContig ID: $id (non-golden)\n";
> +>> > +>>         }
> +>> > +>>     }
> +>> > +>> }
> +>> > +>>
> +>> > +>>
> +>> > +>> if (0) {
> +>> > +>>     my $chr   = 1;
> +>> > +>>     my $start = 100000;
> +>> > +>>     my $end   = 200000;
> +>> > +>>     print "Getting remote virtualcontig for $chr, $start-$end...\n";
> +>> > +>>     my $v = Bio::EnsEMBL::Remote::Object->new('type'  => 'virtualcontig', 'chr' => $chr,
> +>> > +>>                                               'start' => $start, 'end' => $end);
> +>> > +>>     print "Virtual contig ID: ", $v->id(), "\n";
> +>> > +>>     print "Virtual contig length: ", $v->length(), "\n";
> +>> > +>>     print "Virtual contig chromosome: ", $v->_chr_name(), "\n";
> +>> > +>>     print "Virtual contig chromosome length: ", $v->fetch_chromosome_length(), "\n";
> +>> > +>>
> +>> > +>>     foreach my $gid ($v->genes()) {
> +>> > +>>         my $g = Bio::EnsEMBL::Remote::Object->new('type' => 'gene', 'id' => $gid);
> +>> > +>>         print "\tGene ID: ", $g->id(), "\n";
> +>> > +>>         foreach my $tid ($g->transcripts()) {
> +>> > +>>             my $t = Bio::EnsEMBL::Remote::Object->new('type' => 'transcript', 'id' => $tid);
> +>> > +>>             print "\t\tTranscript ID: ", $t->id(), "\n";
> +>> > +>>             print "\t\tTranscript length: ", $t->length(), "\n";
> +>> > +>>             #print "\t\tTranscript seq: ", $t->seq(), "\n";
> +>> > +>>             #print "\t\tTranscript protein: ", $t->translate(), "\n";
> +>> > +>>             foreach my $eid ($t->exons()) {
> +>> > +>>                 my $e = Bio::EnsEMBL::Remote::Object->new('type' => 'exon', 'id' => $eid);
> +>> > +>>                 print "\t\t\tExon ID: ", $e->id(), "\n";
> +>> > +>>                 print "\t\t\tExon start: ", $e->ori_start(), "\n";
> +>> > +>>                 print "\t\t\tExon end: ", $e->ori_end(), "\n";
> +>> > +>>                 print "\t\t\tExon strand: ", $e->strand(), "\n";
> +>> > +>>                 print "\t\t\tExon seq: ", $e->seq(), "\n";
> +>> > +>>             }
> +>> > +>>         }
> +>> > +>>     }
> +>> > +>> }
> +>> > +>>
> +>> > +>> if (0) {
> +>> > +>>     my $pid = "ENSP00000223439";
> +>> > +>>     print "Getting remote peptide $pid...\n";
> +>> > +>>     my $p = Bio::EnsEMBL::Remote::Object->new('type' => 'translation', 'id' => $pid);
> +>> > +>>     print $p->seq(), "\n";
> +>> > +>> }
> +>> > +>>
> +>> > +>> ******************************************************
> +>> > +>> Tony Cox Email:avc@sanger.ac.uk
> +>> > +>> Sanger Institute WWW:www.sanger.ac.uk
> +>> > +>> Wellcome Trust Genome Campus Webmaster
> +>> > +>> Hinxton Tel: +44 1223 834244
> +>> > +>> Cambs. CB10 1SA Fax: +44 1223 494919
> +>> > +>> ******************************************************
> +>> > +>>
> +>> > +>> _______________________________________________
> +>> > +>> DAS mailing list
> +>> > +>> DAS@biodas.org
> +>> > +>> http://biodas.org/mailman/listinfo/das
> +>> > +>>
> +>> > +>
> +>
> +>--
> +>========================================================================
> +>Lincoln D. Stein Cold Spring Harbor Laboratory
> +>lstein@cshl.org Cold Spring Harbor, NY
> +>========================================================================
> +>