[Biojava-l] BioJava discussion board

Dickson, Mike mdickson@netgenics.com
Wed, 28 Aug 2002 19:36:00 -0400


See below...

> -----Original Message-----
> From: Thomas Down [mailto:td2@sanger.ac.uk]
> Sent: Wednesday, August 28, 2002 5:56 AM
> To: Brian Gilman
> Cc: Thomas Down; biojava-l@biojava.org
> Subject: Re: [Biojava-l] BioJava discussion board
> 
> Hi Brian
> 
> On Wed, Aug 28, 2002 at 12:09:05AM -0400, Brian Gilman wrote:
> >
> > 	BioJava does not work well in a distributed environment in terms
> > of RMI calls or in the "web services" stack. Custom
> > serializers/deserializers need to be made for each and every object that
> > exists in the feature hierarchy. This is painful to say the least. T
> >
> > 	Where's the constructor!! There are a lot of factories that,
> > while making client side programming very easy to do, kill a middleware
> > guy like myself.

In my opinion the factory pattern in BioJava is very straightforward and
pretty common.  I don't see how it causes a problem for middleware
programming.

In the cases where we have used BioJava, it's used on both the client and
server as an object model to represent biological objects.  It works pretty
well for the most part for this.  We don't use it as the basis for remote
programming, and in my opinion that's not a problem since BioJava isn't
really optimized for that sort of access anyway.  Frankly, I'm not sure any
single model would be ideal for all of these cases.  What I personally
believe is missing is a meta-model for all this stuff that can be translated
into representational models, whether that be XML, SQL, Java, Perl, what
have you.  So maybe what we need is a BioUML.  This approach is at the heart
of the OMG's Model Driven Architecture work, and it's been used to good effect
in at least one case (expression data) in the life sciences group (LSR)
within the OMG.
> 

Some stuff deleted...

> 
> So what solutions do we have?
> 
>    1. Come up with an `over-the-wire' API which is based on
>       data-blobs (like the example Sequence class, above) rather
>       than complex interfaces and factories.  It'll be easy to
>       bridge from this back to something more like the current
>       BioJava client API, which remains a client-oriented API.

This is my preference for marshalling data over the wire.  It's not the whole
problem though. See below.
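
A minimal sketch of what option 1 could look like, assuming a plain
serializable blob on the wire and a thin bridge back to an interface on the
client.  SimpleSequence, SequenceBlob and BlobBridge are hypothetical names
made up for illustration; none of them is an existing BioJava type, and Java
object serialization merely stands in for whatever wire format is chosen.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;

// Hypothetical client-oriented interface (NOT a real BioJava type).
interface SimpleSequence {
    String getName();
    int length();
    char symbolAt(int index); // 1-based index: an assumption of this sketch
}

// Hypothetical wire-level data blob: plain fields, no factories.
final class SequenceBlob implements Serializable {
    private static final long serialVersionUID = 1L;

    final String name;
    final String alphabet;
    final String residues;

    SequenceBlob(String name, String alphabet, String residues) {
        this.name = name;
        this.alphabet = alphabet;
        this.residues = residues;
    }
}

// Moves blobs over the wire and wraps them in the client interface.
final class BlobBridge {
    private BlobBridge() {}

    // Serialize the blob with standard Java object serialization.
    static byte[] toBytes(SequenceBlob blob) {
        try (ByteArrayOutputStream bytes = new ByteArrayOutputStream();
             ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(blob);
            out.flush();
            return bytes.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Rebuild the blob on the receiving side.
    static SequenceBlob fromBytes(byte[] data) {
        try (ObjectInputStream in =
                 new ObjectInputStream(new ByteArrayInputStream(data))) {
            return (SequenceBlob) in.readObject();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException("blob class not on classpath", e);
        }
    }

    // Bridge from the blob back to an interface-style view for client code.
    static SimpleSequence asSequence(final SequenceBlob blob) {
        return new SimpleSequence() {
            public String getName() { return blob.name; }
            public int length() { return blob.residues.length(); }
            public char symbolAt(int index) {
                return blob.residues.charAt(index - 1);
            }
        };
    }
}
```

The point of the split is that only SequenceBlob ever crosses the wire, while
client code keeps programming against the interface.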

> 
>    2. Bite the bullet and write custom serializers/deserializers
>       to transform between BioJava and some reasonably neutral
>       XML representations (which could be shared with Omnigene
>       and other projects).  I know doing this sucks, but at the
>       end of the day, there aren't /that/ many data types which
>       need to be shuffled around.  It might be the easiest option
>       to get some really compelling web services up and working
>       with BioJava.

This could be a simple version of #1.  An AGAVE or BSML (or some other
nifty format) serializer could be used in multiple cases, one of which
would be a JAXM message with an encoded XML document.  It's still not clear
to me, though, that I'll always want the whole object, so really what has to
happen first is that the remote APIs need defining: i.e. what does a web
service that uses this approach look like?
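
As a rough illustration of option 2, here is a hand-written
serializer/deserializer pair built on the standard JAXP classes.  The
<sequence id="..."> element layout is invented for this sketch and is neither
AGAVE nor BSML; SeqRecord and SeqXml are likewise hypothetical names, not
BioJava classes.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;

// Hypothetical minimal record to be shuffled around (not a BioJava type).
final class SeqRecord {
    final String id;
    final String residues;

    SeqRecord(String id, String residues) {
        this.id = id;
        this.residues = residues;
    }
}

// Custom serializer/deserializer for one data type.
final class SeqXml {
    private SeqXml() {}

    // Write the record as a single <sequence id="...">residues</sequence> element.
    static String toXml(SeqRecord rec) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder().newDocument();
            Element root = doc.createElement("sequence");
            root.setAttribute("id", rec.id);
            root.setTextContent(rec.residues);
            doc.appendChild(root);

            Transformer t = TransformerFactory.newInstance().newTransformer();
            t.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter out = new StringWriter();
            t.transform(new DOMSource(doc), new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            throw new IllegalStateException("XML serialization failed", e);
        }
    }

    // Parse the element back into a record.
    static SeqRecord fromXml(String xml) {
        try {
            DocumentBuilder builder =
                    DocumentBuilderFactory.newInstance().newDocumentBuilder();
            Document doc = builder.parse(new InputSource(new StringReader(xml)));
            Element root = doc.getDocumentElement();
            return new SeqRecord(root.getAttribute("id"), root.getTextContent());
        } catch (Exception e) {
            throw new IllegalStateException("XML parsing failed", e);
        }
    }
}
```

The resulting string could travel as the body of whatever message the remote
API ends up defining; that part is deliberately left out here.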

> 
>    3? Come up with some scheme of metadata which allows the semantics
>       of BioJava (and other) interfaces to be defined in enough
>       detail that they can be serialized and deserialized automatically.
>       This is really quite similar to option 2, except with a different
>       language.  I'd guess this is the hardest option, but might, in
>       the future, also be useful for other things -- e.g. auto-generating
>       database adaptors.

This is starting to get to my BioUML comment above and, as Thomas indicated,
if it's done right it has a large impact on many other things in the future
(like generating database bindings from the model, for instance).  Personally
I think this is where the interesting work is.
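
One very small way to picture option 3 is to treat the getters declared on an
interface as the metadata and derive both directions from them by reflection.
This is only a sketch under that assumption: Gene and MetaCodec are
hypothetical names, a real metadata scheme would need far more than a naming
convention, and nothing here is BioJava API.

```java
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.Method;
import java.lang.reflect.Proxy;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical interface standing in for an interface-oriented API.
interface Gene {
    String getName();
    int getStart();
    int getEnd();
}

final class MetaCodec {
    private MetaCodec() {}

    // Derive a neutral property map from the no-argument getters of an interface.
    static <T> Map<String, Object> describe(Class<T> iface, T obj) {
        Map<String, Object> props = new TreeMap<>();
        for (Method m : iface.getMethods()) {
            if (m.getName().startsWith("get") && m.getParameterCount() == 0) {
                try {
                    props.put(m.getName().substring(3), m.invoke(obj));
                } catch (ReflectiveOperationException e) {
                    throw new IllegalStateException("cannot read " + m.getName(), e);
                }
            }
        }
        return props;
    }

    // Rebuild an implementation of the interface backed by the property map.
    static <T> T revive(Class<T> iface, Map<String, Object> props) {
        InvocationHandler handler = (proxy, method, args) -> {
            if (method.getDeclaringClass() == Object.class) {
                // equals/hashCode/toString are delegated to the backing map
                return method.invoke(props, args);
            }
            return props.get(method.getName().substring(3));
        };
        return iface.cast(Proxy.newProxyInstance(
                iface.getClassLoader(), new Class<?>[] {iface}, handler));
    }
}
```

The same property map could just as well feed an XML writer or a database
binding, which is the attraction of starting from a model.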

> 
> And I guess the final alternative...
> 
>     4. Take a completely different approach, declare the `interface
>        oriented' BioJava 1 an interesting experiment, and design
>        BioJava2 in a more `data-blob' fashion.
> 
> Personally, I think (4) would be a great shame.  But it undoubtedly
> would make supporting distributed systems, and using fully-automatic
> object persistence solutions, a whole lot easier.  So I guess it's
> something we should discuss.

I agree that punting on the interface-based approach would be bad. It's a
very valid way to address accessing data and behavior.  The data blob
approach doesn't adequately describe the interface contract, nor does
it easily lend itself to multiple implementations.  Personally I'm an
advocate of strongly typed, interface-driven development.  XML has its place
but it doesn't replace the kind of programming supported by BioJava now.

>      Thomas.
> 
> 
> 
> PS. I think there's a lot in common between the issues discussed here,
>     and the question of UML class diagrams which came up on the
>     discussion board yesterday.  UML is also more comfortable with
>     the `data blob' way of doing things.  When I wrote the two example
>     classes in this message, I realized that it would have been
>     easier to write them in UML than Java.  This is not true of most
>     of the BioJava interfaces.

Hmm.  Here I rest my case.  If we start with a good model we can derive
various representations from it.  That would be my choice going forward.

Mike

> _______________________________________________
> Biojava-l mailing list  -  Biojava-l@biojava.org
> http://biojava.org/mailman/listinfo/biojava-l