[DAS] Ensembl via SOAP

Schreiber, Mark mark.schreiber@agresearch.co.nz
Wed, 19 Jun 2002 09:39:06 +1200


Hi -

Sorry for the late reply, I've been following this thread from afar.

Serialization has been a bit of a thorny issue for Biojava. The use of lightweight objects and 'singletons' meant the need for a lot of custom serialization code. Here, however, I am talking about serialization to binary via the Java serialization API (as used in RMI, J2EE beans etc).
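
To make the singleton problem concrete, here is a minimal sketch of the kind of custom code I mean. The class and method names (Base, forName, roundTrip) are made up for illustration and are not Biojava API; the only real hook is readResolve() from the Java serialization API, which lets a deserialized copy be swapped for the shared instance.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.ObjectStreamException;
import java.io.Serializable;

public class SingletonSerializationSketch {

    /** Hypothetical flyweight standing in for a shared symbol object (not Biojava API). */
    static final class Base implements Serializable {
        private static final long serialVersionUID = 1L;

        static final Base A = new Base("adenine");
        static final Base C = new Base("cytosine");

        private final String name;

        private Base(String name) {
            this.name = name;
        }

        String getName() {
            return name;
        }

        /** Factory method: always hands back one of the shared instances. */
        static Base forName(String name) {
            if (A.name.equals(name)) return A;
            if (C.name.equals(name)) return C;
            throw new IllegalArgumentException("unknown base: " + name);
        }

        /**
         * Called by the serialization runtime after the fields have been read.
         * Without it, deserialization would create a second "adenine" object
         * and identity comparisons (==) against Base.A would fail.
         */
        private Object readResolve() throws ObjectStreamException {
            return forName(name);
        }
    }

    /** Serializes an object to a byte array and reads it straight back. */
    static Object roundTrip(Object o) throws IOException, ClassNotFoundException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(o);
        }
        try (ObjectInputStream in =
                 new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
            return in.readObject();
        }
    }

    public static void main(String[] args) throws Exception {
        Object copy = roundTrip(Base.A);
        // Identity is preserved only because of readResolve().
        System.out.println(copy == Base.A);
    }
}
```

Every class that relies on shared instances needs a hook of this sort, which is where the volume of custom code comes from.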

My rather naïve understanding of SOAP is that things need to be serialized to XML. This is something I would really be interested in seeing. I know from experience that mapping Biojava to XML using JAXB will be a headache without writing a lot of bridging objects (the Biojava API has too few parameterless constructors and too many factory methods for JAXB to work easily).

I have not used Castor or Axis. What is their design like?

- Mark


> -----Original Message-----
> From: Brian [mailto:gilmanb@Jforge.net] 
> Sent: Thursday, 6 June 2002 12:34 a.m.
> To: Tony Cox
> Cc: Brian Gilman; ensembl-dev@ebi.ac.uk; das@ebi.ac.uk
> Subject: Re: [DAS] Ensembl via SOAP
> 
> 
> Yeah,
> 
> 	We have lots of experience writing serializers and 
> deserializers... If you need examples you can look at 
> OmniGene's omnitide package. There are about 20 
> (de)serializers checked in. 
> 
> 	As an aside, we had the crazy idea of writing 
> serializers for biojava objects, but looked at the amount of 
> work involved and thought that we'd need more support to do this. 
> 
> 	Is anyone else interested in helping write sers/desers 
> for biojava/bioperl objects?? 
> 
> 	I think we'd need to talk about the object model and 
> then write the XSDs. From there we could use Castor or JAXB 
> or Axis (I'd rather do Axis) to get the objects flowing back 
> and forth over the wire... Let me know... We are very 
> interested in doing this with a partner. 
> 			
> 				Best, 
> 
> 					-B
> 
> 
>  On Wed, 5 Jun 2002, Tony Cox wrote:
> 
> > On Wed, 5 Jun 2002, Brian Gilman wrote:
> > 
> > +>This is great!!
> > +>
> > +>	Do you need help with a java implementation?? I'd be willing to 
> > +>help you out in a week or so...
> > 
> > Hi Brian,
> > 
> > Help would be welcome - as I mentioned below I got a java client 
> > working but I fell over on having to write an object deserializer. 
> > Hopefully you could pick it up there...?
> > 
> > Tony
> > 
> > 
> > +>
> > +>			-B
> > +>
> > +>-----------------------
> > +>Brian Gilman <gilmanb@genome.wi.mit.edu>
> > +>Group Leader Medical & Population Genetics Dept. MIT/Whitehead Inst. 
> > +>Center for Genome Research One Kendall Square, Bldg. 300 / 
> > +>Cambridge, MA 02139-1561 USA phone +1 617 252 1069 / fax +1 617 252 1902
> > +>
> > +>
> > +>On Wed, 5 Jun 2002, Tony Cox wrote:
> > +>
> > +>> 
> > +>> I've been playing over the weekend with SOAP access to Ensembl 
> > +>> objects. I have a test server running that can handle queries.
> > +>> 
> > +>> 
> > +>> Nutshell:
> > +>> =========
> > +>> 
> > +>> 
> > +>> use SOAP::Lite +autodispatch =>
> > +>>    uri      =>   'Bio::EnsEMBL::Remote::Object',
> > +>>    proxy    =>   'http://services.ensembl.org:7070/cgi-bin/ensembl_rpcrouter';
> > +>> 
> > +>>    my $trans = Bio::EnsEMBL::Remote::Object->new('type'=>'transcript','id'=>'ENST00000225283');
> > +>>    print $trans->seq(), "\n";
> > +>>    print $trans->translate(), "\n";
> > +>> 
> > +>> 
> > +>> 
> > +>> It is a pretty naive implementation but allows fairly reasonable 
> > +>> access to Ensembl objects via RPC. A remote "proxy" class takes 
> > +>> care of creating and manipulating ensembl objects on the server 
> > +>> side and allows you to make direct calls locally. General 
> > +>> id/start/end type calls all work. Since ensembl objects are all 
> > +>> intimately tied to DB connections, which all goes horribly wrong 
> > +>> over SOAP, the proxy object takes care of creating these connections 
> > +>> as necessary on the server. Calls that return an object have been 
> > +>> changed to return an ID, which can be used to create a remote 
> > +>> object. I've only done the main stuff; features, SNPs etc. are 
> > +>> missing.
> > +>> 
> > +>> The good thing is that you don't need any local databases or even 
> > +>> ensembl code, just a working copy of SOAP::Lite from CPAN. The bad 
> > +>> thing is that it is pretty __slow__ at the moment. The server is 
> > +>> not running under mod_perl so most of the response time is taken 
> > +>> up in module compilation and XML transport. I'll try to get it 
> > +>> running under mod_perl and with transport compression enabled.
> > +>> 
> > +>> I don't see this as an interface of choice for the 
> > +>> bioinformatician! - it is too slow and anyway they will have the 
> > +>> "real" ensembl code to turn to. This is much more of a lightweight 
> > +>> interface for conveniently fetching sequences, genes etc where 
> > +>> speed is not a critical issue, and the convenience of a simple 
> > +>> programming interface is the important factor.
> > +>> 
> > +>> I'd be very interested to see interoperability tested. I did write 
> > +>> a very small java client to make requests but rapidly got out of 
> > +>> my depth when having to write a deserializer for the remote 
> > +>> object. After looking into the Omnigene code I see how these work 
> > +>> but I'm rather hoping that somebody on the omnigene team might 
> > +>> have a go at doing this.
> > +>> 
> > +>> Following is a simple script that provides examples of 
> > +>> manipulating remote objects. You "get" a remote object on the 
> > +>> server by creating a new Bio::EnsEMBL::Remote::Object and giving 
> > +>> it a type and ID. At the moment you can only fetch virtualcontigs, 
> > +>> genes, transcripts, exons, clones, contigs and translations 
> > +>> (peptides). By the magic of "autodispatch", if you get a "thingy" 
> > +>> back, you can just treat it as a normal object and make calls on 
> > +>> it. Perl's autoloader will try and satisfy calls that are not 
> > +>> overloaded in the remote object (I know this sucks). If they are 
> > +>> simple get/property calls they will probably work - if the call 
> > +>> returns an object/objects, bad things will probably happen. Trying 
> > +>> to write to the object may work (I haven't tried it) but is likely 
> > +>> not to be a useful thing to do! Remember this is a 
> > +>> transaction-type system where all the responses need to be 
> > +>> marshalled before transport takes place so it will not "stream" 
> > +>> data to you as if it were a socket-style connection.
> > +>> 
> > +>> In the event of an error, you usually end up with undef (the code 
> > +>> is pretty raw at the moment). If you really want to track 
> > +>> down errors, use the following block:
> > +>> 
> > +>>    if(SOAP::Lite->self->call->fault) {
> > +>>         print "Fault code: ", SOAP::Lite->self->call->faultcode, "\n";
> > +>>         print "Fault string: ", SOAP::Lite->self->call->faultstring, "\n";
> > +>>         print "Fault detail: ", SOAP::Lite->self->call->faultdetail, "\n";
> > +>>         print "Fault actor: ", SOAP::Lite->self->call->faultactor, "\n";
> > +>>         exit;
> > +>>    }
> > +>> 
> > +>> 
> > +>> comments and suggestions welcome,
> > +>> 
> > +>> cheers
> > +>> 
> > +>> Tony
> > +>> 
> > +>> 
> > +>> 
> > +>> 
> > +>> 
> > +>> 
> > +>> To try the server out enable one or more of the following blocks:
> > +>> 
> > +>> 
> > +>> #!/usr/local/bin/perl
> > +>> 
> > +>> package MySoapClient;
> > +>> 
> > +>> use strict;
> > +>> use SOAP::Lite +autodispatch =>
> > +>>    uri      =>   'Bio::EnsEMBL::Remote::Object',
> > +>>    proxy    =>   'http://services.ensembl.org:7070/cgi-bin/ensembl_rpcrouter';
> > +>>    
> > +>> if(1){
> > +>>     my @g = (qw(ENSG00000131591 BRCA1));
> > +>>     foreach my $g (@g){
> > +>>         print "Getting gene: $g...\n";
> > +>>         $g = Bio::EnsEMBL::Remote::Object->new('type'=>'gene','id'=>$g);
> > +>>         print "\tGene ID: ", $g->id(), "\n";
> > +>>         foreach my $t ($g->transcripts()){
> > +>>             my $t = Bio::EnsEMBL::Remote::Object->new('type'=>'transcript','id'=>$t);
> > +>>             print "\t\tTranscript ID: ", $t->id(), "\n";
> > +>>             print "\t\tTranscript length: ", $t->length(), "\n";
> > +>>             #print "\t\tTranscript seq: ", $t->seq(), "\n"; 
> > +>>             print "\t\tTranscript protein: ", $t->translate(), "\n";
> > +>>         }
> > +>>     }
> > +>> }
> > +>> 
> > +>> if(0){
> > +>>     print "Getting remote clone AP000869...\n"; 
> > +>>     my $cl = Bio::EnsEMBL::Remote::Object->new('type'=>'clone','id'=>'AP000869');
> > +>>     print "Clone: ", $cl->embl_id(), "\n";
> > +>>     print "Version: ", $cl->version(), "\n";
> > +>>     
> > +>>     foreach my $c ($cl->contigs()){
> > +>>         my $c = Bio::EnsEMBL::Remote::Object->new('type'=>'contig','id'=>$c);
> > +>>         my $id = $c->id();
> > +>>         if($c->is_static_golden()){
> > +>>             print "\tContig ID: $id (golden)\n";
> > +>>             print "\tContig length: ", $c->length(), "\n";
> > +>>             print "\tContig is golden?: yes\n";
> > +>>             print "\t\tContig global start: ", $c->static_golden_start(), "\n";
> > +>>             print "\t\tContig global end: ",   $c->static_golden_end(), "\n";
> > +>>             print "\t\tContig global ori: ",   $c->static_golden_ori(), "\n";
> > +>>             #print "\tContig seq: ", $c->seq(), "\n"; 
> > +>>         } else {
> > +>>             print "\tContig ID: $id (non-golden)\n";
> > +>>         }
> > +>>     }
> > +>> }
> > +>> 
> > +>> 
> > +>> if(0){
> > +>>     my $chr = 1;
> > +>>     my $start = 100000;
> > +>>     my $end = 200000;
> > +>>     print "Getting remote virtualcontig for $chr, $start-$end...\n"; 
> > +>>     my $v = Bio::EnsEMBL::Remote::Object->new('type'=>'virtualcontig','chr'=>$chr,
> > +>>                                                'start'=>$start, 'end'=>$end);
> > +>>     print "Virtual contig ID: ", $v->id(), "\n"; 
> > +>>     print "Virtual contig length: ", $v->length(), "\n"; 
> > +>>     print "Virtual contig chromosome: ", $v->_chr_name(), "\n"; 
> > +>>     print "Virtual contig chromosome length: ", $v->fetch_chromosome_length(), "\n"; 
> > +>> 
> > +>>     foreach my $g ($v->genes()){
> > +>>         $g = Bio::EnsEMBL::Remote::Object->new('type'=>'gene','id'=>$g);
> > +>>         print "\tGene ID: ", $g->id(), "\n";
> > +>>         foreach my $t ($g->transcripts()){
> > +>>             my $t = Bio::EnsEMBL::Remote::Object->new('type'=>'transcript','id'=>$t);
> > +>>             print "\t\tTranscript ID: ", $t->id(), "\n";
> > +>>             print "\t\tTranscript length: ", $t->length(), "\n";
> > +>>             #print "\t\tTranscript seq: ", $t->seq(), "\n"; 
> > +>>             #print "\t\tTranscript protein: ", $t->translate(), "\n";
> > +>>             foreach my $e ($t->exons()){
> > +>>                 my $e = Bio::EnsEMBL::Remote::Object->new('type'=>'exon','id'=>$e);
> > +>>                 print "\t\t\tExon ID: ", $e->id(), "\n";
> > +>>                 print "\t\t\tExon start: ", $e->ori_start(), "\n";
> > +>>                 print "\t\t\tExon end: ", $e->ori_end(), "\n";
> > +>>                 print "\t\t\tExon strand: ", $e->strand(), "\n";
> > +>>                 print "\t\t\tExon seq: ", $e->seq(), "\n";
> > +>>             }
> > +>>         }
> > +>>     }
> > +>> }
> > +>> 
> > +>> if(0){
> > +>>     # Keep the ID in its own variable so that "my $p" is declared only once.
> > +>>     my $id = "ENSP00000223439";
> > +>>     print "Getting remote peptide $id...\n"; 
> > +>>     my $p = Bio::EnsEMBL::Remote::Object->new('type'=>'translation','id'=>$id);
> > +>>     print $p->seq(), "\n";
> > +>> }
> > +>> 
> > +>> 
> > +>> 
> > +>> 
> > +>> ******************************************************
> > +>> Tony Cox			Email:avc@sanger.ac.uk
> > +>> Sanger Institute		WWW:www.sanger.ac.uk
> > +>> Wellcome Trust Genome Campus	Webmaster
> > +>> Hinxton				Tel: +44 1223 834244
> > +>> Cambs. CB10 1SA			Fax: +44 1223 494919
> > +>> ******************************************************
> > +>> 
> > +>> _______________________________________________
> > +>> DAS mailing list
> > +>> DAS@biodas.org
> > +>> http://biodas.org/mailman/listinfo/das
> > +>> 
> > +>
> > 
> > ******************************************************
> > Tony Cox			Email:avc@sanger.ac.uk
> > Sanger Institute		WWW:www.sanger.ac.uk
> > Wellcome Trust Genome Campus	Webmaster
> > Hinxton				Tel: +44 1223 834244
> > Cambs. CB10 1SA			Fax: +44 1223 494919
> > ******************************************************
> > 
> > 
> 
> -- 
> ----------------
> Brian Gilman <gilmanb@jforge.net>
> 
> 
> 
> 
> 