[Biojava-l] LightWeight vs HeavyWeight sequence objects

Mike Marsh mm692227@sorbonne.imgen.bcm.tmc.edu
Wed, 26 Jan 2000 01:23:02 -0600 (CST)


There has been some concern that an object-oriented implementation of
sequence is "heavyweight" with respect to a string implementation.
Specifically, Mark has addressed memory concerns.

If this is a question of efficiency and overhead associated with the OO
implementation, I don't think it is a valid concern.  Below is my
interpretation of what really goes on in terms of memory allocation.  If I
am wrong about this, don't hesitate to correct me.

A String is just a fancy wrapper around an array of Unicode characters
(2 bytes per char).  The OO approach is just a wrapper around an array of
pointers to static SequenceChar objects (typically 4 or 8 bytes per
pointer, depending on your system's architecture).  The OO approach
requires one additional step of dereferencing the pointer.  Dereferencing
is cheap:  it's simply pushing around integers and fetching from memory.
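
To put rough numbers on the comparison, here is a back-of-the-envelope
sketch.  The class and method names are hypothetical, the reference size
is passed in as an assumption (4 or 8 bytes are the usual values), and
object headers and alignment are ignored, so treat the results as
estimates only.

```java
// Rough footprint estimate for the element storage of a sequence.
// Hypothetical helper, not part of any BioJava API.
class FootprintSketch {

    // Character data of a String-style sequence: 2 bytes per char.
    static long charArrayBytes(long length) {
        return 2L * length;
    }

    // Array of references to shared symbol objects.
    // refBytes is an assumption about the platform (commonly 4 or 8).
    static long referenceArrayBytes(long length, int refBytes) {
        return (long) refBytes * length;
    }

    public static void main(String[] args) {
        long n = 1000; // residues in an example sequence
        System.out.println("char data       : " + charArrayBytes(n) + " bytes");
        System.out.println("4-byte pointers : " + referenceArrayBytes(n, 4) + " bytes");
        System.out.println("8-byte pointers : " + referenceArrayBytes(n, 8) + " bytes");
    }
}
```

Under these assumptions the pointer array costs a small constant factor
more per position than the char array, and that factor does not grow
with the "weight" of the objects being pointed to.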

The key point here is that you only instantiate each ProteinChar object
once and then simply reference it later with pointers.  This is not at
all complicated:  whenever you addElement() to a Vector, Java doesn't
duplicate the object (and its space in memory); instead it adds a pointer
to the object.
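
A minimal sketch of that behaviour follows.  The ProteinChar class here
is a hypothetical stand-in with made-up fields; only Vector.addElement()
and Vector.elementAt() are real library calls.

```java
import java.util.Vector;

class SharedReferenceDemo {

    // Hypothetical stand-in for the ProteinChar class discussed above.
    static final class ProteinChar {
        final char symbol;
        final String name;

        ProteinChar(char symbol, String name) {
            this.symbol = symbol;
            this.name = name;
        }
    }

    // Adds the same object 'count' times; no copies are made.
    static Vector<ProteinChar> repeat(ProteinChar residue, int count) {
        Vector<ProteinChar> seq = new Vector<>();
        for (int i = 0; i < count; i++) {
            seq.addElement(residue); // stores a reference to the one instance
        }
        return seq;
    }

    public static void main(String[] args) {
        ProteinChar ala = new ProteinChar('A', "alanine");
        Vector<ProteinChar> seq = repeat(ala, 3);
        // Identity comparison: every position refers to the same object.
        System.out.println(seq.elementAt(0) == seq.elementAt(2));
    }
}
```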

For twenty amino acids, you only have to instantiate twenty different
"heavyweight" ProteinChar objects, but every sequence that points to them
gets to take advantage of all the weight/functionality.
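
One way this sharing could look in code is sketched below.  The Residue
class, the lookup table and the parse() helper are all hypothetical
names chosen for illustration; the only assumption about the biology is
the standard twenty one-letter amino acid codes.

```java
import java.util.HashMap;
import java.util.Map;

class FlyweightSketch {

    // Hypothetical "heavyweight" symbol; real ones could carry mass, charge, etc.
    static final class Residue {
        final char symbol;

        Residue(char symbol) {
            this.symbol = symbol;
        }
    }

    // The twenty standard one-letter amino acid codes.
    static final String CODES = "ACDEFGHIKLMNPQRSTVWY";

    // One shared instance per code, created once at class load time.
    static final Map<Character, Residue> TABLE = new HashMap<>();
    static {
        for (char c : CODES.toCharArray()) {
            TABLE.put(c, new Residue(c));
        }
    }

    // Builds a sequence as an array of references into the shared table.
    static Residue[] parse(String text) {
        Residue[] seq = new Residue[text.length()];
        for (int i = 0; i < text.length(); i++) {
            Residue r = TABLE.get(text.charAt(i));
            if (r == null) {
                throw new IllegalArgumentException("Unknown residue: " + text.charAt(i));
            }
            seq[i] = r;
        }
        return seq;
    }

    public static void main(String[] args) {
        Residue[] a = parse("MKVLA");
        Residue[] b = parse("AAKM");
        // Both sequences point at the same methionine object.
        System.out.println(a[0] == b[3]);
        System.out.println(TABLE.size() + " residue objects in total");
    }
}
```

However many sequences are parsed, the number of Residue objects stays
at twenty; only the per-sequence pointer arrays grow.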

If someone can show me what I am missing here about the "heavyweight"
burden of using objects, I'd appreciate it.

-mike


PS I would like to reinforce that all of the functionality of Strings can
easily be added to an object approach.  With appropriate wrappers,
Sequence can "implement" a "String interface", by offering 
  public char charAt(int index)
  public String concat(String str)
  public String substring(int beginIndex, int endIndex)
  ...
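
A rough sketch of such a wrapper is given below.  Everything in it
(StringLikeSequence, Symbol, fromString) is a hypothetical name, and it
only mirrors the three String methods listed above; it is one possible
shape, not a proposal for the actual Sequence class.

```java
import java.util.HashMap;
import java.util.Map;

class StringLikeSequence {

    // Hypothetical shared symbol object, one instance per character code.
    static final class Symbol {
        private static final Map<Character, Symbol> CACHE = new HashMap<>();
        final char code;

        private Symbol(char code) {
            this.code = code;
        }

        static Symbol of(char code) {
            return CACHE.computeIfAbsent(code, c -> new Symbol(c));
        }
    }

    private final Symbol[] symbols;

    private StringLikeSequence(Symbol[] symbols) {
        this.symbols = symbols;
    }

    // Convenience factory for the sketch: one shared Symbol per character.
    static StringLikeSequence fromString(String text) {
        Symbol[] s = new Symbol[text.length()];
        for (int i = 0; i < s.length; i++) {
            s[i] = Symbol.of(text.charAt(i));
        }
        return new StringLikeSequence(s);
    }

    public int length() {
        return symbols.length;
    }

    // Same contract as String.charAt: one dereference, then a field read.
    public char charAt(int index) {
        return symbols[index].code;
    }

    public String substring(int beginIndex, int endIndex) {
        if (beginIndex < 0 || endIndex > symbols.length || beginIndex > endIndex) {
            throw new StringIndexOutOfBoundsException(
                    "begin " + beginIndex + ", end " + endIndex);
        }
        StringBuilder sb = new StringBuilder(endIndex - beginIndex);
        for (int i = beginIndex; i < endIndex; i++) {
            sb.append(symbols[i].code);
        }
        return sb.toString();
    }

    public String concat(String str) {
        return substring(0, symbols.length) + str;
    }

    public static void main(String[] args) {
        StringLikeSequence seq = fromString("MKVLA");
        System.out.println(seq.charAt(0));
        System.out.println(seq.substring(1, 3));
        System.out.println(seq.concat("GG"));
    }
}
```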

 
------------------------------------------------------------- 
Mike Marsh
Graduate Student in Structural and Computational Biology
Baylor College of Medicine.  Houston, TX

FON: 713/798-6034
Permanent Email:  mikemarsh@bigfoot.com
-------------------------------------------------------------