[Biojava-l] BioJava discussion board

Brian Gilman gilmanb@genome.wi.mit.edu
Wed, 28 Aug 2002 09:35:59 -0400 (EDT)

I agree with Patrick on that one. Perhaps Agave could be used as the
serialization layer here? It would be a little bit of work to get
everything munged into Agave or BSML, but think about the benefits for us
poor middleware guys!


Brian Gilman <gilmanb@genome.wi.mit.edu>
Group Leader Medical & Population Genetics Dept.
MIT/Whitehead Inst. Center for Genome Research
One Kendall Square, Bldg. 300 / Cambridge, MA 02139-1561 USA
phone +1 617  252 1069 / fax +1 617 252 1902

On Wed, 28 Aug 2002, Patrick McConnell wrote:

> I agree with your assessment.  There do need to be serializers and
> deserializers to some 'SimpleSequence' format.
> Why not use some established XML format such as Agave or BSML as the
> intermediate representation?  BioJava already has some support for Agave.
> -Patrick
> Thomas Down <td2@sanger.ac.uk>@biojava.org on 08/28/2002 05:55:42 AM
> Sent by:    biojava-l-admin@biojava.org
> To:    Brian Gilman <gilmanb@genome.wi.mit.edu>
> cc:    Thomas Down <td2@sanger.ac.uk>, biojava-l@biojava.org
> Subject:    Re: [Biojava-l] BioJava discussion board
> Hi Brian
> On Wed, Aug 28, 2002 at 12:09:05AM -0400, Brian Gilman wrote:
> >
> >     BioJava does not work well in a distributed environment in terms
> > of RMI calls or in the "web services" stack. Custom
> > serializers/deserializers need to be made for each and every object that
> > exists in the feature hierarchy. This is painful to say the least.
> >
> >     Where's the constructor!! There are a lot of factories that,
> > while making client-side programming very easy to do, kill a middleware
> > guy like myself.
> I think you're right about this being the main point of impedance
> mismatch between BioJava and Web Services (or other distributed
> object systems, for that matter).
> What I'm not so certain about is how easily it can be fixed.  There's
> certainly more to this than just adding constructors.  The basic
> web-services serialization system works well for objects which fit
> closely with the Javabeans model.  For instance:
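>     public class Employee {
>        private String name;
>        private OrganizationalUnit department;
>        public String getName() { return name; }
>        public void setName(String newName) { name = newName; }
>        public OrganizationalUnit getDepartment() { return department; }
>        public void setDepartment(OrganizationalUnit newDepartment) {
>           department = newDepartment;
>        }
>     }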
> In fact, that kind of example seems to be precisely what a lot
> of the developers of web-services had in mind.
> I'm going to concentrate on the BioJava Sequence interface and
> related stuff, since that's the bit most people are familiar
> with, and it's also one of the most problematic parts from
> a distribution point of view.
> Simply adding a constructor and some javabeany mutator methods
> to SimpleSequence won't fix anything -- your SOAP toolkit
> (or whatever) still won't understand how to get at the Symbol
> or Feature objects (since these need to be iterated).  And even
> if it could access the Symbols as an array (or whatever), the
> default SOAP-ENC serialization of these will be quite hideously
> inefficient.
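To make the efficiency point concrete, here is a minimal sketch of packing an iterable run of symbols into one compact string before it goes over the wire, and unpacking it on the other side. `Sym` and `SymbolFlattener` are hypothetical stand-ins invented for this illustration; they are not the BioJava Symbol API, and a real alphabet would need more than one character per token.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a symbol object; NOT the BioJava Symbol interface.
final class Sym {
    private final char token;
    Sym(char token) { this.token = token; }
    char getToken() { return token; }
}

// Hypothetical helper: converts between per-symbol objects and a compact string.
final class SymbolFlattener {
    // Walk the symbols once and pack them into a single string, so the wire
    // format carries one value instead of one encoded element per symbol.
    static String toSeqString(Iterable<Sym> symbols) {
        StringBuilder sb = new StringBuilder();
        for (Sym s : symbols) {
            sb.append(s.getToken());
        }
        return sb.toString();
    }

    // Inverse operation: rebuild per-symbol objects on the receiving side.
    static List<Sym> fromSeqString(String seq) {
        List<Sym> out = new ArrayList<>();
        for (char c : seq.toCharArray()) {
            out.add(new Sym(c));
        }
        return out;
    }
}
```

A serializer would call `SymbolFlattener.toSeqString(...)` on the sending side and `fromSeqString(...)` on the receiving side, so only the string crosses the wire.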
> To make a Sequence object which genuinely plays nicely with
> SOAP (and other distributed object and persistence technologies)
> you're going to end up with something looking like:
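>    public class Sequence {
>      private String seqString;
>      private String name;
>      private Feature[] features;
>      public Sequence() {}  // public no-argument constructor
>      public String getSeqString() { return seqString; }
>      public void setSeqString(String seq) { seqString = seq; }
>      public String getName() { return name; }
>      public void setName(String name) { this.name = name; }
>      public Feature[] getFeatures() { return features; }
>      public void setFeatures(Feature[] features) { this.features = features; }
>    }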
> This will, of course, SOAP-ENC trivially.  But whether anyone would
> really like to program with this is another matter -- I for one
> would prefer something that looks more like the current BioJava
> interface.
> In the `data blob' world, it's also far harder to impose conditions
> like "Features must fit onto the Sequence to which they're attached".
> A lot of the factory-patterns in BioJava are there principally to
> ensure data integrity.
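As an illustration of the integrity argument, the sketch below shows how a factory method can enforce "a feature must fit on its sequence", which a bare setter taking an array cannot do. `CheckedSequence` and `CheckedFeature` are hypothetical names made up for this example, and the 1-based inclusive coordinates are an assumption; none of this is the actual BioJava API.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical feature holder; coordinates are assumed 1-based and inclusive.
final class CheckedFeature {
    final int start;
    final int end;
    CheckedFeature(int start, int end) {
        this.start = start;
        this.end = end;
    }
}

// Hypothetical sequence that only accepts features through a checking factory method.
final class CheckedSequence {
    private final String seq;
    private final List<CheckedFeature> features = new ArrayList<>();

    CheckedSequence(String seq) { this.seq = seq; }

    // The feature list is private, so this method is the only way to attach
    // a feature and the bounds check cannot be skipped.
    CheckedFeature createFeature(int start, int end) {
        if (start < 1 || end < start || end > seq.length()) {
            throw new IllegalArgumentException(
                "Feature " + start + ".." + end
                + " does not fit on a sequence of length " + seq.length());
        }
        CheckedFeature f = new CheckedFeature(start, end);
        features.add(f);
        return f;
    }

    // Read-only view, so callers cannot bypass the check by mutating the list.
    List<CheckedFeature> getFeatures() {
        return Collections.unmodifiableList(features);
    }
}
```

With a plain `setFeatures(Feature[])` bean property there is no single point at which this check can be guaranteed to run, which is the trade-off being described.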
> The final issue is that, arguably, a lot of serialization belongs
> on interfaces rather than classes.  Suppose I get a sequence from
> the biojava-ensembl package.  It'll be an instance of the
> class EnsemblContigSequence, which isn't even a public class, let
> alone has a public constructor.  Attached, it has lots of ensembl-specific
> Feature implementations (which are also package-private, of course).
> Behind the scenes things are even worse from a serialization point
> of view -- lots of lazy fetching, and data caches which are maintained
> by the containing EnsemblSequenceDB object rather than the Sequence
> itself.
> If I pass this object into a serializer, what I want to come out
> probably isn't a detailed description of the guts of that particular
> EnsemblContigSequence object -- the client machine might not even
> have biojava-ensembl at all.  I'd rather just serialize everything
> in a generic way, and re-create everything on the client as a
> SimpleSequence (or whatever).  Does this make sense?
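A rough sketch of what "serialize everything in a generic way" could look like: the sender reads the object only through its interface, copies the values into a plain holder, and the receiver rebuilds a simple implementation without needing the original class. All names here (`SeqView`, `LazySeq`, `SeqBlob`, `GenericBridge`) are hypothetical and exist only for this illustration; `LazySeq` merely imitates an implementation with lazily fetched internals.

```java
// Hypothetical interface standing in for a sequence contract.
interface SeqView {
    String getName();
    String seqString();
}

// Imitates an implementation whose internals should never reach the wire.
final class LazySeq implements SeqView {
    private final String name;
    private String cached;  // filled on first access

    LazySeq(String name) { this.name = name; }

    public String getName() { return name; }

    public String seqString() {
        if (cached == null) {
            cached = "gattaca";  // stand-in for a lazy database fetch
        }
        return cached;
    }
}

// Plain transferable holder: the "data blob" that actually gets serialized.
final class SeqBlob {
    String name;
    String seq;
}

final class GenericBridge {
    // Reads only through the interface, so any implementation can be exported.
    static SeqBlob toBlob(SeqView s) {
        SeqBlob b = new SeqBlob();
        b.name = s.getName();
        b.seq = s.seqString();
        return b;
    }

    // Rebuilds a simple in-memory implementation on the receiving side.
    static SeqView fromBlob(SeqBlob b) {
        final String name = b.name;
        final String seq = b.seq;
        return new SeqView() {
            public String getName() { return name; }
            public String seqString() { return seq; }
        };
    }
}
```

The client ends up with an object that honours the same interface but carries none of the sender's caching or database machinery.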
> So what solutions do we have?
>    1. Come up with an `over-the-wire' API which is based on
>       data-blobs (like the example Sequence class, above) rather
>       than complex interfaces and factories.  It'll be easy to
>       bridge from this back to something more like the current
>       BioJava client API, which remains a client-oriented API.
>    2. Bite the bullet and write custom serializers/deserializers
>       to transform between BioJava and some reasonably neutral
>       XML representations (which could be shared with Omnigene
>       and other projects).  I know doing this sucks, but at the
>       end of the day, there aren't /that/ many data types which
>       need to be shuffled around.  It might be the easiest option
>       to get some really compelling web services up and working
>       with BioJava.
>    3? Come up with some scheme of metadata which allows the semantics
>       of BioJava (and other) interfaces to be defined in enough
>       detail that they can be serialized and deserialized automatically.
>       This is really quite similar to option 2, except with a different
>       language.  I'd guess this is the hardest option, but it might, in
>       the future, also be useful for other things -- e.g. auto-generating
>       database adaptors.
> And I guess the final alternative...
>     4. Take a completely different approach, declare the `interface
>        oriented' BioJava 1 an interesting experiment, and design
>        BioJava2 in a more `data-blob' fashion.
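For a feel of how much work option 2 involves for one data type, here is a minimal hand-written serializer/deserializer pair using the standard StAX API (`javax.xml.stream`). The `<sequence name="...">` element layout is invented for this sketch; it is not Agave, BSML, or any agreed format, and `WireSeq` and `WireSeqXml` are hypothetical names.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLOutputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;
import javax.xml.stream.XMLStreamWriter;

// Hypothetical neutral holder for the values that cross the wire.
final class WireSeq {
    final String name;
    final String residues;
    WireSeq(String name, String residues) {
        this.name = name;
        this.residues = residues;
    }
}

// Hypothetical hand-written serializer/deserializer for an invented XML layout.
final class WireSeqXml {
    // Writes <sequence name="...">residues</sequence>.
    static String write(WireSeq s) {
        try {
            StringWriter out = new StringWriter();
            XMLStreamWriter w = XMLOutputFactory.newInstance().createXMLStreamWriter(out);
            w.writeStartElement("sequence");
            w.writeAttribute("name", s.name);
            w.writeCharacters(s.residues);
            w.writeEndElement();
            w.close();
            return out.toString();
        } catch (XMLStreamException e) {
            throw new IllegalStateException("could not write sequence XML", e);
        }
    }

    // Reads the layout produced by write() back into a WireSeq.
    static WireSeq read(String xml) {
        try {
            XMLStreamReader r = XMLInputFactory.newInstance()
                    .createXMLStreamReader(new StringReader(xml));
            String name = null;
            StringBuilder residues = new StringBuilder();
            while (r.hasNext()) {
                int event = r.next();
                if (event == XMLStreamReader.START_ELEMENT
                        && r.getLocalName().equals("sequence")) {
                    name = r.getAttributeValue(null, "name");
                } else if (event == XMLStreamReader.CHARACTERS) {
                    residues.append(r.getText());
                }
            }
            r.close();
            return new WireSeq(name, residues.toString());
        } catch (XMLStreamException e) {
            throw new IllegalStateException("could not read sequence XML", e);
        }
    }
}
```

Features would need the same treatment, but the pattern stays the same: one small write method and one small read method per data type.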
> Personally, I think (4) would be a great shame.  But it undoubtedly
> would make supporting distributed systems, and using fully-automatic
> object persistence solutions, a whole lot easier.  So I guess it's
> something we should discuss.
>      Thomas.
> PS. I think there's a lot in common between the issues discussed here,
>     and the question of UML class diagrams which came up on the
>     discussion board yesterday.  UML is also more comfortable with
>     the `data blob' way of doing things.  When I wrote the two example
>     classes in this message, I realized that it would have been
>     easier to write them in UML than Java.  This is not true of most
>     of the BioJava interfaces.
> _______________________________________________
> Biojava-l mailing list  -  Biojava-l@biojava.org
> http://biojava.org/mailman/listinfo/biojava-l