[Biojava-l] Re: the current discussion

Mark Schreiber mark_s@sanger.otago.ac.nz
Wed, 26 Jan 2000 15:55:45 +1300 (NZDT)


On Tue, 25 Jan 2000, Matthew Pocock wrote:

> 
> >
> > >
> > > On GUIS:  agree with most of what's been said.  Should definitely keep the
> > > GUI isolated from the implementation of the model, in accordance with
> > > Model-View-Controller paradigm.
> > >
> > >
> >
> > I agree completely. But if people can design GUIs that can easily be used
> > with several packages, or that fit really well with one package, I think they
> > would be great. Things like a sequence viewer could fit with many
> > implementations of a sequence class. Also for those interested in HMMs I
> > have often thought it would be great to have a GUI that lets you perform
> > model surgery graphically. (Oh to have the time!)
> >
> 
> Ah - I have HMM stuff in the package bio.alignment, and I have written a GUI for
> visualizing models. It is only a hop-skip-and-a-jump from being an HMM editor.

Interesting stuff. I don't know much about drag and drop and the like, but
if anyone out there does, I really think this would be a useful tool for
HMMers.

> 
> > <snip/>
> > This may be a matter of personal utility. For my work I have no need at the
> > moment for anything other than strings; however, those doing protein
> > modelling will benefit more from Mike's strategy. Let's have both! And
> > why not have some kind of class to convert between the two?
> 
> We use objects for several very good reasons. For example, an A in DNA is
> distinct from an A in a protein - they are separate objects with no overlap in
> meaning, and both are separate objects from state A in an HMM. Also, we can
> model things like protein phosphorylation naturally by putting in a phospho-aa
> object that can print itself out as the normal character if needed, or can
> behave in special ways if you want it to - no extra memory cost to having
> phosphorylation information in a protein (except a slightly expanded protein
> alphabet).
> 

I agree with the utility of it, but practically, if you want to number
crunch a bacterial genome, that means 4 million objects to hold in memory
unless you are a tricky programmer (which I am not). The memory
requirements must surely be substantially more than for a single object
containing a 4 million member String.
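
For what it's worth, one way the object count could be kept down is to
share a single immutable residue object per symbol, so a genome becomes an
array of references to four objects rather than 4 million separate objects.
The sketch below is only an illustration of that idea; the names Residue,
Alphabet, residueFor and parse are made up here and are not taken from
Matthew's package. The per-position cost is then one reference (typically 4
or 8 bytes, depending on the JVM) against 2 bytes for a Java char, so the
String is still smaller, just not by as much as one object per position
would suggest.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical residue: one immutable instance per symbol per alphabet. */
final class Residue {
    private final char symbol;
    private final String alphabetName;

    Residue(char symbol, String alphabetName) {
        this.symbol = symbol;
        this.alphabetName = alphabetName;
    }

    char symbol() { return symbol; }
    String alphabetName() { return alphabetName; }
}

/** Hypothetical alphabet that hands out shared Residue instances. */
final class Alphabet {
    private final Map<Character, Residue> residues = new HashMap<>();

    Alphabet(String name, String symbols) {
        // Create each residue exactly once; every sequence reuses these.
        for (char c : symbols.toCharArray()) {
            residues.put(c, new Residue(c, name));
        }
    }

    Residue residueFor(char c) {
        Residue r = residues.get(c);
        if (r == null) {
            throw new IllegalArgumentException("Symbol not in alphabet: " + c);
        }
        return r;
    }

    /** Builds an array of references to the shared residues. */
    Residue[] parse(String text) {
        Residue[] out = new Residue[text.length()];
        for (int i = 0; i < text.length(); i++) {
            out[i] = residueFor(text.charAt(i));
        }
        return out;
    }
}
```

With this arrangement an A in a DNA alphabet and an A in a protein alphabet
are still different objects, which keeps the type distinction Matthew
describes.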

> Our sequences have a simple method to retrieve the sequence as a string of
> chars where each char represents a single residue. Also, as everything is
> implemented on top of interfaces, you could write an implementation that really
> did use a string of chars to represent the sequence, as long as you wired in
> appropriate residueAt and iterator methods.

Having the ability to change between the two models is definitely the way
to go. (Unless you think there is no use for String based analysis).
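
To make the idea concrete, here is a rough sketch of a sequence that stores
a plain String but still answers residueAt and iterator calls. Only the
method name residueAt comes from Matthew's description; the interface
ResidueSequence, the classes SimpleResidue and StringBackedSequence, the
seqString method and the 0-based indexing are all assumptions made for this
example, not the real package API.

```java
import java.util.Iterator;
import java.util.NoSuchElementException;

/** Hypothetical minimal residue that just wraps its one-letter symbol. */
final class SimpleResidue {
    private final char symbol;

    SimpleResidue(char symbol) { this.symbol = symbol; }

    char symbol() { return symbol; }
}

/** Hypothetical sequence interface; indices are 0-based in this sketch. */
interface ResidueSequence extends Iterable<SimpleResidue> {
    int length();
    SimpleResidue residueAt(int index);
    String seqString();
}

/** Stores the sequence as a String and builds residues only on request. */
final class StringBackedSequence implements ResidueSequence {
    private final String chars;

    StringBackedSequence(String chars) { this.chars = chars; }

    public int length() { return chars.length(); }

    public SimpleResidue residueAt(int index) {
        // A new wrapper per call; shared instances could be used instead.
        return new SimpleResidue(chars.charAt(index));
    }

    public String seqString() { return chars; }

    public Iterator<SimpleResidue> iterator() {
        return new Iterator<SimpleResidue>() {
            private int position = 0;

            public boolean hasNext() { return position < chars.length(); }

            public SimpleResidue next() {
                if (!hasNext()) {
                    throw new NoSuchElementException();
                }
                return residueAt(position++);
            }
        };
    }
}
```

String based analysis can then work on seqString() directly, while object
based code walks the same data through the iterator.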

> 
> Having residue objects catches loads of errors that would go unnoticed
> otherwise. Also, for HMMs, each state within the model is a State object that
> extends Residue, so you can naturally manipulate sequences of states. This is
> really useful - a multiple-sequence-alignment can contain sequences and states.
> But - as states are not defined by chars, we can make virtual states with no
> sensible way of naming them.

I like this idea but I have one reservation. By deriving a State object
from a Residue object you lose some of the flexibility of an HMM. This is
because a State can emit not just a residue but also a string of
residues (as in GeneMark.hmm) or a vector, or even another HMM (or
anything else that you may want to emit).
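
One possible way to keep that flexibility, sketched here purely as an
assumption and not as anything that exists in the bio.alignment package, is
to leave the emitted type open as a type parameter instead of fixing it to
a single residue. The class EmittingState and its probabilityOf method are
hypothetical names, and the emission table is a plain map rather than a
proper trained distribution.

```java
import java.util.HashMap;
import java.util.Map;

/**
 * Hypothetical HMM state whose emission type E is left open, so a state
 * can emit a single character, a string of residues, or anything else.
 */
final class EmittingState<E> {
    private final String name;
    private final Map<E, Double> emissionProbabilities;

    EmittingState(String name, Map<E, Double> emissionProbabilities) {
        this.name = name;
        // Defensive copy so later changes to the caller's map are ignored.
        this.emissionProbabilities = new HashMap<>(emissionProbabilities);
    }

    String name() { return name; }

    /** Returns the emission probability, or 0.0 for unlisted observations. */
    double probabilityOf(E observation) {
        Double p = emissionProbabilities.get(observation);
        return p == null ? 0.0 : p;
    }
}
```

A state emitting single residues would be an EmittingState<Character>,
while one emitting a run of residues could be an EmittingState<String>,
without either having to be a Residue itself.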

> 
> Anyway, I guess what I am saying is that you can have your cake and eat it with
> this one - once HotSpot has got its teeth in - and the code you end up with
> looks almost identical to using chars, but it is type-checked and behaves oopy,
> not stringy.
> 

I am in favour of it but I have my doubts about the memory overheads. It
also doesn't fit well with the object database approach of storing the
sequence as one object. I still think it would benefit biojava to have both
available. With a suitable conversion class, applications could still be
built using objects from both approaches as appropriate.

Mark

> Matthew
> 
> >

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Mark Schreiber			Ph: 64 3 4797875
Rm 218				email mark_s@sanger.otago.ac.nz
Department of Biochemistry	email m.schreiber@clear.net.nz
University of Otago		
PO Box 56
Dunedin
New Zealand
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~