[Biojava-l] BioJava discussion board
Brian Gilman
gilmanb@genome.wi.mit.edu
Wed, 28 Aug 2002 09:35:59 -0400 (EDT)
I agree with Patrick on that one. Perhaps Agave could be used as the
serialization layer here?? It would be a little bit of work to get
everything munged into Agave or BSML, but think about the benefits for us
poor middleware guys!
-B
-----------------------
Brian Gilman <gilmanb@genome.wi.mit.edu>
Group Leader Medical & Population Genetics Dept.
MIT/Whitehead Inst. Center for Genome Research
One Kendall Square, Bldg. 300 / Cambridge, MA 02139-1561 USA
phone +1 617 252 1069 / fax +1 617 252 1902
On Wed, 28 Aug 2002, Patrick McConnell wrote:
>
> I agree with your assessment. There does need to be serializers and
> deserializers to some 'SimpleSequence' format.
>
> Why not use some established XML format such as Agave or BSML as the
> intermediate representation? BioJava already has some support for Agave.
>
> -Patrick
>
> Thomas Down <td2@sanger.ac.uk> on 08/28/2002 05:55:42 AM
> Sent by: biojava-l-admin@biojava.org
> To: Brian Gilman <gilmanb@genome.wi.mit.edu>
> cc: Thomas Down <td2@sanger.ac.uk>, biojava-l@biojava.org
> Subject: Re: [Biojava-l] BioJava discussion board
>
> Hi Brian
>
> On Wed, Aug 28, 2002 at 12:09:05AM -0400, Brian Gilman wrote:
> >
> > BioJava does not work well in a distributed environment, whether
> > via RMI calls or in the "web services" stack. Custom
> > serializers/deserializers need to be written for each and every object that
> > exists in the feature hierarchy. This is painful to say the least.
> >
> > Where's the constructor!! There are a lot of factories which,
> > while making client-side programming very easy, kill a middleware
> > guy like myself.
>
> I think you're right about this being the main point of impedance
> mismatch between BioJava and Web Services (or other distributed
> object systems, for that matter).
>
> What I'm not so certain about is how easily it is fixable. There's
> certainly more to this than just adding constructors. The basic
> web-services serialization system works well for objects which fit
> closely with the Javabeans model. For instance:
>
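> public class Employee {
>     private String name;
>     private OrganizationalUnit department;
>
>     public String getName() { return name; }
>     public void setName(String newName) { name = newName; }
>     public OrganizationalUnit getDepartment() { return department; }
>     public void setDepartment(OrganizationalUnit newDepartment) { department = newDepartment; }
> }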
>
> In fact, that kind of example seems to be precisely what a lot
> of the developers of web-services had in mind.
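To illustrate why bean-shaped classes suit automatic serialization, here is a minimal sketch using the standard `java.beans.Introspector`: properties are discovered purely from getter naming conventions, which is roughly the information a generic bean serializer works from. `SimpleEmployee` and `BeanDump` are hypothetical names, and the department is simplified to a `String` so the example is self-contained.

```java
import java.beans.BeanInfo;
import java.beans.IntrospectionException;
import java.beans.Introspector;
import java.beans.PropertyDescriptor;
import java.lang.reflect.Method;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical, simplified bean: the department is a plain String here
// instead of the OrganizationalUnit type used in the example above.
class SimpleEmployee {
    private String name;
    private String department;

    public SimpleEmployee() { }

    public String getName() { return name; }
    public void setName(String newName) { name = newName; }
    public String getDepartment() { return department; }
    public void setDepartment(String newDepartment) { department = newDepartment; }
}

class BeanDump {
    // Collects every readable property by reflection alone,
    // with no class-specific serialization code.
    static Map<String, Object> properties(Object bean) {
        try {
            // Stop at Object.class so getClass() is not reported as a property.
            BeanInfo info = Introspector.getBeanInfo(bean.getClass(), Object.class);
            Map<String, Object> out = new TreeMap<>();
            for (PropertyDescriptor pd : info.getPropertyDescriptors()) {
                Method getter = pd.getReadMethod();
                if (getter != null) {
                    out.put(pd.getName(), getter.invoke(bean));
                }
            }
            return out;
        } catch (IntrospectionException | ReflectiveOperationException ex) {
            throw new IllegalStateException("Cannot introspect " + bean.getClass(), ex);
        }
    }

    public static void main(String[] args) {
        SimpleEmployee e = new SimpleEmployee();
        e.setName("Alice");
        e.setDepartment("Genetics");
        System.out.println(properties(e)); // {department=Genetics, name=Alice}
    }
}
```

Nothing in `BeanDump` knows about `SimpleEmployee`; an object whose state is only reachable through iterators or factories gives this kind of tool nothing to work with.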
>
>
> I'm going to concentrate on the BioJava Sequence interface and
> related stuff, since that's the bit most people are familiar
> with, and it's also one of the most problematic parts from
> a distribution point of view.
>
> Simply adding a constructor and some javabeany mutator methods
> to SimpleSequence won't fix anything -- your SOAP toolkit
> (or whatever) still won't understand how to get at the Symbol
> or Feature objects (since these need to be iterated). And even
> if it could access the Symbols as an array (or whatever), the
> default SOAP-ENC serialization of these will be quite hideously
> inefficient.
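As a rough illustration of the inefficiency, the toy comparison below encodes the same residues one XML element per symbol (loosely modelled on SOAP-ENC arrays, where each array member becomes its own element; real toolkit output differs in detail) versus a single packed string. `EncodingSizeDemo` and its element names are hypothetical.

```java
import java.util.Arrays;
import java.util.List;

class EncodingSizeDemo {
    // Toy array-style layout: one element per symbol, each with a type
    // attribute. An approximation, not actual SOAP-ENC output.
    static String perSymbol(List<String> symbols) {
        StringBuilder sb = new StringBuilder("<symbols>");
        for (String s : symbols) {
            sb.append("<item xsi:type=\"xsd:string\">").append(s).append("</item>");
        }
        return sb.append("</symbols>").toString();
    }

    // The same residues packed into a single string element.
    static String packed(List<String> symbols) {
        StringBuilder sb = new StringBuilder("<seq>");
        for (String s : symbols) {
            sb.append(s);
        }
        return sb.append("</seq>").toString();
    }

    public static void main(String[] args) {
        List<String> dna = Arrays.asList("a", "c", "g", "t");
        System.out.println(perSymbol(dna).length() + " chars vs " + packed(dna).length() + " chars");
    }
}
```

In this toy layout each residue costs 36 characters in the per-symbol form and one character in the packed form, so the gap grows linearly with sequence length.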
>
> To make a Sequence object which genuinely plays nicely with
> SOAP (and other distributed object and persistence technologies)
> you're going to end up with something looking like:
>
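> public class Sequence {
>     private String seqString;
>     private String name;
>     private Feature[] features;
>
>     public Sequence() { }
>     public String getSeqString() { return seqString; }
>     public void setSeqString(String seq) { seqString = seq; }
>     public String getName() { return name; }
>     public void setName(String name) { this.name = name; }
>     public Feature[] getFeatures() { return features; }
>     public void setFeatures(Feature[] features) { this.features = features; }
> }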
>
> This will, of course, serialize trivially with SOAP-ENC. But whether anyone would
> really like to program with this is another matter -- I for one
> would prefer something that looks more like the current BioJava
> interface.
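One possible middle ground is to keep a data blob on the wire and wrap it on the client in a thin adapter that restores a friendlier, bounds-checked interface. A minimal sketch; `SeqBlob`, `SequenceView` and `BlobSequenceView` are hypothetical names (not BioJava classes), and features are left out for brevity.

```java
// Hypothetical data blob in the style of the bean-like Sequence above
// (features omitted for brevity).
class SeqBlob {
    private String name;
    private String seqString;

    public SeqBlob() { }

    public String getName() { return name; }
    public void setName(String name) { this.name = name; }
    public String getSeqString() { return seqString; }
    public void setSeqString(String seqString) { this.seqString = seqString; }
}

// Hypothetical richer, read-only client interface; NOT the BioJava API.
interface SequenceView {
    String getName();
    int length();
    char symbolAt(int index);                      // 1-based
    SequenceView subSequence(int start, int end);  // 1-based, inclusive
}

// Adapter: presents the richer view over a blob received from the wire.
class BlobSequenceView implements SequenceView {
    private final String name;
    private final String residues;

    BlobSequenceView(SeqBlob blob) {
        this(blob.getName(), blob.getSeqString());
    }

    private BlobSequenceView(String name, String residues) {
        this.name = name;
        this.residues = residues;
    }

    public String getName() { return name; }

    public int length() { return residues.length(); }

    public char symbolAt(int index) {
        checkRange(index, index);
        return residues.charAt(index - 1);
    }

    public SequenceView subSequence(int start, int end) {
        checkRange(start, end);
        return new BlobSequenceView(name, residues.substring(start - 1, end));
    }

    // Rejects ranges that do not lie within 1..length().
    private void checkRange(int start, int end) {
        if (start < 1 || end < start || end > residues.length()) {
            throw new IndexOutOfBoundsException(
                "[" + start + ", " + end + "] outside 1.." + residues.length());
        }
    }

    public static void main(String[] args) {
        SeqBlob blob = new SeqBlob();
        blob.setName("demo");
        blob.setSeqString("acgtacgt");
        SequenceView view = new BlobSequenceView(blob);
        System.out.println(view.symbolAt(3) + " " + view.subSequence(2, 5).length()); // prints "g 4"
    }
}
```

The blob stays trivial for a SOAP toolkit to handle, while client code programs against the adapter rather than raw getters and setters.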
>
> In the `data blob' world, it's also far harder to impose conditions
> like "Features must fit onto the Sequence to which they're attached".
> A lot of the factory-patterns in BioJava are there principally to
> ensure data integrity.
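A minimal sketch of the integrity point, using hypothetical `GuardedSequence` and `Feature` classes (not BioJava's): because `Feature`'s constructor is private, the factory method is the only way to attach a feature, so the "must fit on the sequence" check cannot be bypassed. A bean with a public `setFeatures(Feature[])` offers no such guarantee.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical classes for illustration; not the BioJava API.
final class GuardedSequence {
    // A feature is reduced here to a type label and a 1-based, inclusive range.
    static final class Feature {
        final String type;
        final int start;
        final int end;

        // Private: only GuardedSequence can create features.
        private Feature(String type, int start, int end) {
            this.type = type;
            this.start = start;
            this.end = end;
        }
    }

    private final String residues;
    private final List<Feature> features = new ArrayList<>();

    GuardedSequence(String residues) {
        this.residues = residues;
    }

    // Factory method: validates the range before the feature exists at all.
    Feature createFeature(String type, int start, int end) {
        if (start < 1 || end < start || end > residues.length()) {
            throw new IllegalArgumentException("Feature [" + start + ", " + end
                + "] does not fit on a sequence of length " + residues.length());
        }
        Feature f = new Feature(type, start, end);
        features.add(f);
        return f;
    }

    // Read-only view, so callers cannot slip unchecked features in later.
    List<Feature> getFeatures() {
        return Collections.unmodifiableList(features);
    }

    public static void main(String[] args) {
        GuardedSequence seq = new GuardedSequence("acgtacgt");
        seq.createFeature("exon", 2, 5);
        try {
            seq.createFeature("exon", 6, 20);
        } catch (IllegalArgumentException ex) {
            System.out.println("rejected: " + ex.getMessage());
        }
    }
}
```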
>
> The final issue is that, arguably, a lot of serialization belongs
> on interfaces rather than classes. Suppose I get a sequence from
> the biojava-ensembl package. It'll be an instance of the
> class EnsemblContigSequence, which isn't even a public class, let
> alone one with a public constructor. Attached to it are lots of ensembl-specific
> Feature implementations (which are also package-private, of course).
> Behind the scenes things are even worse from a serialization point
> of view -- lots of lazy fetching, and data caches which are maintained
> by the containing EnsemblSequenceDB object rather than the Sequence
> itself.
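One way to hang serialization on an interface rather than a class is to put the metadata on the interface itself and let a generic serializer read it reflectively. A minimal sketch, assuming a hypothetical `@WireProperty` annotation and `NamedSequence` interface (note that Java annotations arrived in Java 5, after this thread was written); `HiddenContigSequence` stands in for a non-public implementation with internal state.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Method;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical metadata: marks an interface method as serializable state.
@Retention(RetentionPolicy.RUNTIME)
@Target(ElementType.METHOD)
@interface WireProperty {
    String value(); // name used on the wire
}

// Hypothetical interface carrying the serialization metadata.
interface NamedSequence {
    @WireProperty("name")
    String getName();

    @WireProperty("residues")
    String seqString();

    // Not annotated: derived from residues, so it is not serialized.
    int length();
}

// Stand-in for a non-public implementation whose internals
// (here a fetch log) should never travel over the wire.
class HiddenContigSequence implements NamedSequence {
    private final StringBuilder fetchLog = new StringBuilder();

    public String getName() { fetchLog.append("name;"); return "contig_42"; }
    public String seqString() { fetchLog.append("seq;"); return "acgtn"; }
    public int length() { return seqString().length(); }
}

class MetadataSerializer {
    // Reads only annotated, no-argument methods declared on the interface,
    // so the concrete implementation class is irrelevant.
    static <T> Map<String, Object> serialize(T obj, Class<T> iface) {
        Map<String, Object> out = new TreeMap<>();
        for (Method m : iface.getMethods()) {
            WireProperty p = m.getAnnotation(WireProperty.class);
            if (p != null && m.getParameterCount() == 0) {
                try {
                    out.put(p.value(), m.invoke(obj));
                } catch (ReflectiveOperationException ex) {
                    throw new IllegalStateException("Cannot read " + m.getName(), ex);
                }
            }
        }
        return out;
    }

    public static void main(String[] args) {
        NamedSequence seq = new HiddenContigSequence();
        System.out.println(serialize(seq, NamedSequence.class)); // {name=contig_42, residues=acgtn}
    }
}
```

This is close in spirit to option 3 below: the metadata describes the interface's semantics, and a single generic routine does the rest.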
>
> If I pass this object into a serializer, what I want to come out
> probably isn't a detailed description of the guts of that particular
> EnsemblContigSequence object -- the client machine might not even
> have biojava-ensembl at all. I'd rather just serialize everything
> in a generic way, and re-create everything on the client as a
> SimpleSequence (or whatever). Does this make sense?
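The generic route described here can be sketched as a pair of converters that touch only interface methods: flatten whatever implementation the server holds into a plain blob, then rebuild a simple, self-contained object on the client. All names (`SeqSource`, `SeqWire`, `PlainSeq`, `SeqConverter`) are hypothetical; BioJava's real `SimpleSequence` has a different API, and features are reduced to type labels for brevity.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

// Hypothetical client-facing interface; not the BioJava API.
interface SeqSource {
    String getName();
    int length();
    char symbolAt(int index);        // 1-based
    Iterator<String> featureTypes(); // features reduced to type labels
}

// Plain data blob for the wire: no-arg constructor and bean accessors.
class SeqWire {
    private String name;
    private String residues;
    private String[] featureTypes;

    public SeqWire() { }
    public String getName() { return name; }
    public void setName(String name) { this.name = name; }
    public String getResidues() { return residues; }
    public void setResidues(String residues) { this.residues = residues; }
    public String[] getFeatureTypes() { return featureTypes; }
    public void setFeatureTypes(String[] featureTypes) { this.featureTypes = featureTypes; }
}

// Simple client-side implementation that needs no server classes.
class PlainSeq implements SeqSource {
    private final String name;
    private final String residues;
    private final List<String> types;

    PlainSeq(String name, String residues, List<String> types) {
        this.name = name;
        this.residues = residues;
        this.types = new ArrayList<>(types);
    }

    public String getName() { return name; }
    public int length() { return residues.length(); }
    public char symbolAt(int index) { return residues.charAt(index - 1); }
    public Iterator<String> featureTypes() { return types.iterator(); }
}

class SeqConverter {
    // Server side: uses only SeqSource methods, so the concrete class
    // (and any lazy fetching or caching behind it) never reaches the wire.
    static SeqWire flatten(SeqSource seq) {
        StringBuilder sb = new StringBuilder(seq.length());
        for (int i = 1; i <= seq.length(); i++) {
            sb.append(seq.symbolAt(i));
        }
        List<String> types = new ArrayList<>();
        for (Iterator<String> it = seq.featureTypes(); it.hasNext(); ) {
            types.add(it.next());
        }
        SeqWire wire = new SeqWire();
        wire.setName(seq.getName());
        wire.setResidues(sb.toString());
        wire.setFeatureTypes(types.toArray(new String[0]));
        return wire;
    }

    // Client side: rebuild a generic implementation from the blob.
    static SeqSource rebuild(SeqWire wire) {
        return new PlainSeq(wire.getName(), wire.getResidues(),
                            Arrays.asList(wire.getFeatureTypes()));
    }

    public static void main(String[] args) {
        SeqSource original = new PlainSeq("contig_7", "acgtt", Arrays.asList("exon", "repeat"));
        SeqSource copy = rebuild(flatten(original));
        System.out.println(copy.getName() + " " + copy.length());
    }
}
```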
>
> So what solutions do we have?
>
> 1. Come up with an `over-the-wire' API which is based on
> data-blobs (like the example Sequence class, above) rather
> than complex interfaces and factories. It'll be easy to
> bridge from this back to something more like the current
> BioJava client API, which remains a client-oriented API.
>
> 2. Bite the bullet and write custom serializers/deserializers
> to transform between BioJava and some reasonably neutral
> XML representations (which could be shared with Omnigene
> and other projects). I know doing this sucks, but at the
> end of the day, there aren't /that/ many data types which
> need to be shuffled around. It might be the easiest option
> to get some really compelling web services up and working
> with BioJava.
>
> 3? Come up with some scheme of metadata which allows the semantics
> of BioJava (and other) interfaces to be defined in enough
> detail that they can be serialized and deserialized automatically.
> This is really quite similar to option 2, except with a different
> language. I'd guess this is the hardest option, but it might, in
> the future, also be useful for other things -- e.g. auto-generating
> database adaptors.
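For simple types, the hand-written serializers of option 2 need not be large. A minimal sketch of a serializer/deserializer pair that round-trips a name and residue string through a small XML form; the `<sequence>`/`<residues>` element names are hypothetical (not Agave or BSML), and parsing uses the standard JAXP `DocumentBuilder`.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.xml.sax.SAXException;

class SequenceXml {
    // Minimal data carrier for the round trip.
    static final class Seq {
        final String name;
        final String residues;

        Seq(String name, String residues) {
            this.name = name;
            this.residues = residues;
        }
    }

    // Escapes the characters that are unsafe in XML text and attributes;
    // '&' must be replaced first so later entities are not double-escaped.
    static String escape(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;")
                .replace(">", "&gt;").replace("\"", "&quot;");
    }

    // Serializer: hypothetical neutral format, written by hand.
    static String write(Seq seq) {
        return "<sequence name=\"" + escape(seq.name) + "\"><residues>"
            + escape(seq.residues) + "</residues></sequence>";
    }

    // Deserializer: parses with the JDK's built-in DOM parser.
    static Seq read(String xml) {
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            Document doc = builder.parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            Element root = doc.getDocumentElement();
            Element residues = (Element) root.getElementsByTagName("residues").item(0);
            if (residues == null) {
                throw new IllegalArgumentException("Missing <residues> element");
            }
            return new Seq(root.getAttribute("name"), residues.getTextContent());
        } catch (ParserConfigurationException | SAXException | IOException ex) {
            throw new IllegalArgumentException("Malformed sequence XML", ex);
        }
    }

    public static void main(String[] args) {
        Seq copy = read(write(new Seq("contig<1>", "acgtnacgt")));
        System.out.println(copy.name + " " + copy.residues);
    }
}
```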
>
>
> And I guess the final alternative...
>
> 4. Take a completely different approach, declare the `interface
> oriented' BioJava 1 an interesting experiment, and design
> BioJava2 in a more `data-blob' fashion.
>
>
>
> Personally, I think (4) would be a great shame. But it undoubtedly
> would make supporting distributed systems, and using fully-automatic
> object persistence solutions, a whole lot easier. So I guess it's
> something we should discuss.
>
> Thomas.
>
> PS. I think there's a lot in common between the issues discussed here,
> and the question of UML class diagrams which came up on the
> discussion board yesterday. UML is also more comfortable with
> the `data blob' way of doing things. When I wrote the two example
> classes in this message, I realized that it would have been
> easier to write them in UML than Java. This is not true of most
> of the BioJava interfaces.
> _______________________________________________
> Biojava-l mailing list - Biojava-l@biojava.org
> http://biojava.org/mailman/listinfo/biojava-l