[Biojava-l] Re : Mutable objects

Ewan Birney birney@ebi.ac.uk
Sun, 2 Jul 2000 23:25:55 +0100 (BST)


(I realised that I only replied to thomas and not the list...)

Thomas

Inside bioperl/ensembl we have dealt with nastily long sequences for a
while now.

There are two key things I would recommend, both of which you are heading
towards I suspect, so I think I am just going to confirm your prejudices.


(a) Interfaces should be read-only. When an implementation extends
an interface with mutability, all bets are off - the client has to
become savvy and manage things.

The main point here is that guaranteeing that mutable objects make sense
to pieces of client code which are not coordinated is just a nightmare. I
strongly believe that this is a problem that can only be solved on the
client side, and you can spend a lot of time trying to make objects solve
this problem to no avail.

In Perl, as there are no formal interfaces, this just relies on the
documentation/convention that if you only ever call

   $seq = $obj->seq(); # a get

then everything is guaranteed to work, but if you ever call:

   $obj->seq($newseq); # a set

then the client has to know and understand what it is doing and code
accordingly.

As you are in nice, compile-time-typed Java you might want to split
out mutable from immutable interfaces, and heavily document that the
user of a mutable interface had better read the documentation on its
implementation before doing anything with it.
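
A minimal sketch of that split in Java, purely for illustration. All of the
names here (ReadOnlySequence, MutableSequence, SimpleSequence,
GcContentCounter) are made up for this example and are not taken from
biojava or bioperl. Library code asks only for the read-only view; the
setter lives on a separate, loudly documented sub-interface.

```java
import java.util.Objects;

// Hypothetical names throughout; this is not the biojava API.

/** Read-only view: the only thing library code should ask for. */
interface ReadOnlySequence {
    String seq();   // the "get"
    int length();
}

/**
 * Mutable extension. Callers are expected to read the documentation of the
 * concrete implementation before calling setSeq, because nothing here
 * promises that other holders of the same object will cope with the change.
 */
interface MutableSequence extends ReadOnlySequence {
    void setSeq(String newSeq);   // the "set"
}

/** Plain in-memory implementation used only for this sketch. */
final class SimpleSequence implements MutableSequence {
    private String residues;

    SimpleSequence(String residues) {
        this.residues = Objects.requireNonNull(residues);
    }

    @Override public String seq() { return residues; }
    @Override public int length() { return residues.length(); }
    @Override public void setSeq(String newSeq) {
        this.residues = Objects.requireNonNull(newSeq);
    }
}

/** Library-style code: it takes the read-only type, so it cannot mutate. */
final class GcContentCounter {
    static int countGc(ReadOnlySequence s) {
        int gc = 0;
        String residues = s.seq();
        for (int i = 0; i < residues.length(); i++) {
            char c = Character.toUpperCase(residues.charAt(i));
            if (c == 'G' || c == 'C') {
                gc++;
            }
        }
        return gc;
    }
}
```

Note that the read-only type only stops the holder of that reference from
writing; another piece of client code holding the MutableSequence can still
change the data underneath it, which is exactly the coordination problem
that has to be managed by the client.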


(You might be interested in "what happens in database cases" - stay
tuned to the discussion about the Bio::DB::UpdateableSeqI interface on
bioperl, whose design was heavily influenced/bullied by the Ensembl
design. The main way to wriggle out of this is to have

     $seq->seq($newseq); # only changes client local data.

and then to have

     $database->write_seq($seq);

as a separate call. This dodges a whole bunch of otherwise very nasty
locking issues on the database, which is definitely worth the increased
client complexity.)
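
The same two-step pattern, sketched in Java under stated assumptions. The
class names (LocalSeq, InMemorySeqDatabase) and the HashMap standing in for
a real database are hypothetical and only there to show the shape: setting
the sequence touches client-local data only, and nothing reaches the store
until the client explicitly asks for a write.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical names; a HashMap stands in for a real database.

/** Client-local sequence object. Editing it never touches the database. */
final class LocalSeq {
    private final String id;
    private String residues;

    LocalSeq(String id, String residues) {
        this.id = id;
        this.residues = residues;
    }

    String id() { return id; }
    String seq() { return residues; }

    /** Only changes client-local data. */
    void setSeq(String newSeq) { this.residues = newSeq; }
}

/** Toy store: the explicit write is the only way data gets persisted. */
final class InMemorySeqDatabase {
    private final Map<String, String> stored = new HashMap<>();

    /** Returns a fresh local copy, so later edits stay on the client. */
    LocalSeq fetch(String id) {
        String residues = stored.get(id);
        if (residues == null) {
            throw new IllegalArgumentException("no such sequence: " + id);
        }
        return new LocalSeq(id, residues);
    }

    /** The separate write call: the client decides when to persist. */
    void writeSeq(LocalSeq seq) {
        stored.put(seq.id(), seq.seq());
    }

    /** What the store currently holds, for inspection. */
    String storedResidues(String id) {
        return stored.get(id);
    }
}
```

Because the write is a single explicit call, any locking or transaction
policy can live in that one place rather than in every setter.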


Take home message: don't try to solve everything for the client when it
comes to writing to memory/databases. There are too many policies - the
library should be agnostic to them and let the client read the server's
documentation - it shouldn't be your problem... ;)


(b) for long sequences, encourage methods and servers to make heavy use of
a 

	$obj->subseq(1000,1200);

call. This gives the implementing classes a fighting chance of reorganising
memory and/or using database access or file storage smartly to get around
built-in limits, and also to provide faster execution.


Anything that works on 35 Mbases at a time can do so in some sort of
chunked manner.
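
A rough Java sketch of both halves of that idea, with hypothetical names
(ChunkedSequence, ChunkedGcCount). It assumes 1-based, inclusive coordinates
for subseq, as in the Perl call above, and it keeps the chunks in memory
purely as a stand-in for a file or database backend. The client then walks
the sequence one window at a time instead of asking for the whole thing.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical names; the in-memory chunk list stands in for file or
// database storage, which a real implementation would fetch from lazily.

final class ChunkedSequence {
    private final List<String> chunks = new ArrayList<>();
    private final int chunkSize;
    private final long length;

    ChunkedSequence(String residues, int chunkSize) {
        if (chunkSize <= 0) {
            throw new IllegalArgumentException("chunkSize must be positive");
        }
        this.chunkSize = chunkSize;
        this.length = residues.length();
        for (int i = 0; i < residues.length(); i += chunkSize) {
            int stop = Math.min(i + chunkSize, residues.length());
            chunks.add(residues.substring(i, stop));
        }
    }

    long length() { return length; }

    /** Returns residues start..end, 1-based and inclusive (an assumption). */
    String subseq(long start, long end) {
        if (start < 1 || end > length || start > end) {
            throw new IndexOutOfBoundsException(start + ".." + end);
        }
        StringBuilder out = new StringBuilder((int) (end - start + 1));
        long pos = start - 1;             // 0-based position of next residue
        while (pos < end) {               // end doubles as 0-based exclusive bound
            String chunk = chunks.get((int) (pos / chunkSize));
            int offset = (int) (pos % chunkSize);
            int take = (int) Math.min(chunk.length() - offset, end - pos);
            out.append(chunk, offset, offset + take);
            pos += take;
        }
        return out.toString();
    }
}

/** Client code that only ever holds one window of residues at a time. */
final class ChunkedGcCount {
    static long countGc(ChunkedSequence seq, int window) {
        if (window <= 0) {
            throw new IllegalArgumentException("window must be positive");
        }
        long gc = 0;
        for (long start = 1; start <= seq.length(); start += window) {
            long end = Math.min(start + window - 1, seq.length());
            String piece = seq.subseq(start, end);
            for (int i = 0; i < piece.length(); i++) {
                char c = Character.toUpperCase(piece.charAt(i));
                if (c == 'G' || c == 'C') {
                    gc++;
                }
            }
        }
        return gc;
    }
}
```

The window size used by the client and the chunk size used by the storage
are independent, so the implementation stays free to reorganise its storage
without the client noticing.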


e.



-----------------------------------------------------------------
Ewan Birney. Mobile: +44 (0)7970 151230, Work: +44 1223 494420
<birney@ebi.ac.uk>. 
-----------------------------------------------------------------