[Biojava-l] Analysis output on sequences (calculating properties of SymbolLists)

David Martin david.martin@biotek.uio.no
Tue, 16 May 2000 08:14:32 +0100


On Tue, 16 May 2000, Gerald Loeffler wrote:

> interesting! allow me to interpret your idea and add something:
> 
> whenever we talk about "calculating something", the Strategy pattern
> springs to mind, where a specific "algorithm" is represented as an
> object, and the type of that object specifies the method through which
> all these different algorithms are invoked - that's what my original
> suggestion tried to do:
> 
> interface SymbolListPropertyCalculator {
>   Object calculateSymbolListProperty(SymbolList sl);
> }
> 
> I.e. i defined exactly one algorithm type that works on a SymbolList and
> returns "anything". As always, when instantiating a concrete
> implementation you can pass any algorithm-specific information (to the
> constructor), and so the method arguments can be minimal and should just
> mirror the essence of the algorithm.
> 
> Matthew on the other hand favoured a specific algorithm type that
> perfectly matched the examples of algorithms i needed, namely an
> algorithm type that returns a double, i.e. a more concrete Strategy
> 
> interface SymbolListDoublePropertyCalculator {
>   double calculateSymbolListDoubleProperty(SymbolList sl);
> }
> 
> What you are saying in part is that what we should really do is define a
> set of algorithm-types, where we should categorise on the return type
> and/or argument types of the algorithm, e.g.
> 
> interface SingleValueAnalysis {
>   double calculateSingleValue(SymbolList sl);
> }
> 
> interface ContinuousValueAnalysis {
>   Scalar1DFunction calculateContinuousValue(SymbolList sl);
> }
> 

Well, these would be derived from the abstract classes/interfaces.

If there was a generic abstract interface described for the AnalysisModule
(ie an object that acts as a result factory) and for the AnalysisResult
then you could write a generic analysis system and use some sort of
querying of the result to work out how to deal with it.
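
A rough sketch of what such a generic pair could look like, purely as an
illustration. Everything here is an assumption: the names ResultKind,
kind(), size() and valueAt() are made up for the example, and a plain
String stands in for a SymbolList so that the code compiles on its own.

```java
final class GenericAnalysisSketch {

    /** Hypothetical result kinds a generic client can query for. */
    enum ResultKind { SINGLE_VALUE, CONTINUOUS_VALUE }

    /** Generic result; the client inspects kind() to decide how to use it. */
    interface AnalysisResult {
        ResultKind kind();
        int size();                 // number of values held by this result
        double valueAt(int index);
    }

    /** Generic module acting as a result factory (String stands in for SymbolList). */
    interface AnalysisModule {
        String name();
        AnalysisResult analyse(String sequence);
    }

    /** Wraps an array of values together with its kind. */
    static AnalysisResult result(final ResultKind kind, final double[] values) {
        return new AnalysisResult() {
            public ResultKind kind() { return kind; }
            public int size() { return values.length; }
            public double valueAt(int index) { return values[index]; }
        };
    }

    static boolean isGc(char c) {
        char u = Character.toUpperCase(c);
        return u == 'G' || u == 'C';
    }

    /** Whole-sequence GC fraction: one double per sequence. */
    static final AnalysisModule GC_CONTENT = new AnalysisModule() {
        public String name() { return "gc-content"; }
        public AnalysisResult analyse(String sequence) {
            int gc = 0;
            for (int i = 0; i < sequence.length(); i++) {
                if (isGc(sequence.charAt(i))) gc++;
            }
            double fraction = sequence.isEmpty() ? 0.0 : (double) gc / sequence.length();
            return result(ResultKind.SINGLE_VALUE, new double[] { fraction });
        }
    };

    /** Per-position indicator: 1.0 where the symbol is G or C, else 0.0. */
    static final AnalysisModule GC_TRACK = new AnalysisModule() {
        public String name() { return "gc-track"; }
        public AnalysisResult analyse(String sequence) {
            double[] values = new double[sequence.length()];
            for (int i = 0; i < values.length; i++) {
                values[i] = isGc(sequence.charAt(i)) ? 1.0 : 0.0;
            }
            return result(ResultKind.CONTINUOUS_VALUE, values);
        }
    };

    /** Generic client code: no knowledge of the concrete module is needed. */
    static String describe(AnalysisModule module, String sequence) {
        AnalysisResult r = module.analyse(sequence);
        if (r.kind() == ResultKind.SINGLE_VALUE) {
            return module.name() + " = " + r.valueAt(0);
        }
        return module.name() + " has " + r.size() + " values";
    }

    public static void main(String[] args) {
        System.out.println(describe(GC_CONTENT, "ATGC"));
        System.out.println(describe(GC_TRACK, "ATGC"));
    }
}
```

The point is only that describe() never needs to know which module it
was handed; it asks the result what it is and acts accordingly.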

My initial ideas centered around some sort of CORBA implementation using a
name server to arrange AnalysisModules by input/output type and allowing
dynamic querying by the client programs to find out what analyses can be
done today. It is then nice to be able to have some degree of
prearrangement, especially when the collections of AnalysisModules could
take inputs varying from sequence through structure.
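
To make the lookup idea concrete, here is a minimal in-memory stand-in.
It is not CORBA and not a real name server; it only shows modules being
arranged by input/output type and queried at run time. The class name,
the type labels ("sequence", "structure", ...) and the module names are
all invented for the example.

```java
import java.util.ArrayList;
import java.util.List;

final class ModuleRegistrySketch {

    /** What the registry knows about one module (all names hypothetical). */
    static final class Entry {
        final String name;
        final String inputType;
        final String outputType;

        Entry(String name, String inputType, String outputType) {
            this.name = name;
            this.inputType = inputType;
            this.outputType = outputType;
        }
    }

    private final List<Entry> entries = new ArrayList<>();

    /** Advertise a module under its input and output type. */
    void register(String name, String inputType, String outputType) {
        entries.add(new Entry(name, inputType, outputType));
    }

    /** Names of modules accepting inputType; a null outputType means "any output". */
    List<String> find(String inputType, String outputType) {
        List<String> names = new ArrayList<>();
        for (Entry e : entries) {
            boolean inputMatches = e.inputType.equals(inputType);
            boolean outputMatches = outputType == null || e.outputType.equals(outputType);
            if (inputMatches && outputMatches) {
                names.add(e.name);
            }
        }
        return names;
    }

    public static void main(String[] args) {
        ModuleRegistrySketch registry = new ModuleRegistrySketch();
        registry.register("gc-content", "sequence", "single-value");
        registry.register("sliding-gc", "sequence", "continuous-value");
        registry.register("surface-area", "structure", "single-value");

        // A client asks what can be done with a sequence today.
        System.out.println(registry.find("sequence", null));
        System.out.println(registry.find("structure", "single-value"));
    }
}
```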


> Thus instead of an "catch-all" algorithm type (as in my original
> proposal) we would have a set of more concisely defined algorithm types
> (as also in Matthew's proposal).

Yup, though still quite generic.

> 
> Additionally, you are saying that details on how exactly the algorithm
> was invoked (which parameters were used) should be returned as well. I
> don't see a need for this, because when constructing the Strategy object
> you must know which parameters to pass to its constructor. Anyways, the
> only possible type for such an informative object would be Object, so
> maybe we could derive the analysis interfaces from a common
> super-interface that defines a method for getting this
> parameters-object:
> 
> interface Analysis {
>   Object getAnalysisParameters();
> }

interface AnalysisResult {
	ParameterCollection getAnalysisParameters();
}



> 
> But anybody interested in the parameters of a specific algorithm would
> need to know the type of object returned here...

Yes and no. One can conceive of a generic class of algorithm (ie
algorithms that take a sequence and return a single double value) where
the underlying algorithm is unknown, ie a black box. The idea is to turn
this into some sort of annotated black box (grey box?) where the end
result is annotated with the parameters used to generate the result. This
is important.
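
As a small illustration of the grey box, here is a result that carries
the parameters that produced it. This is a sketch under assumptions:
a Map stands in for the ParameterCollection (which is not defined
anywhere yet), a String stands in for the sequence, and slidingGc() and
AnnotatedResult are names invented for the example.

```java
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.Map;

final class GreyBoxSketch {

    /** A result annotated with the parameters used to generate it. */
    static final class AnnotatedResult {
        private final double[] values;
        private final Map<String, String> parameters;

        AnnotatedResult(double[] values, Map<String, String> parameters) {
            this.values = values.clone();
            this.parameters = Collections.unmodifiableMap(new LinkedHashMap<>(parameters));
        }

        double[] values() {
            return values.clone();
        }

        /** Map used here as a stand-in for a ParameterCollection. */
        Map<String, String> getAnalysisParameters() {
            return parameters;
        }
    }

    /** GC fraction in each window of the given size, sliding one position at a time. */
    static AnnotatedResult slidingGc(String sequence, int window) {
        if (window < 1 || window > sequence.length()) {
            throw new IllegalArgumentException("window must be between 1 and the sequence length");
        }
        double[] values = new double[sequence.length() - window + 1];
        for (int start = 0; start < values.length; start++) {
            int gc = 0;
            for (int i = start; i < start + window; i++) {
                char c = Character.toUpperCase(sequence.charAt(i));
                if (c == 'G' || c == 'C') {
                    gc++;
                }
            }
            values[start] = (double) gc / window;
        }
        // Record how the result was produced, so a client can ask later.
        Map<String, String> parameters = new LinkedHashMap<>();
        parameters.put("program", "slidingGc");
        parameters.put("window", Integer.toString(window));
        return new AnnotatedResult(values, parameters);
    }

    public static void main(String[] args) {
        AnnotatedResult result = slidingGc("ATGC", 2);
        System.out.println(java.util.Arrays.toString(result.values()));
        System.out.println(result.getAnalysisParameters());
    }
}
```

A client that receives the result without having constructed the
analysis itself can still find out which window size was used.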

When building an AnalysisQuery it is perfectly possible not to know the
parameters for the query at compile time, which is why the interface is
specified by a very simple model: a sequence and a parameter collection.
The parameter collection can be retrieved from the AnalysisModule and
parsed at run time, either to generate a suitable user interface (for
interactive use) or, if the required parameters are known, to be filled
in automatically by the strategy.
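
Something along these lines, as a sketch only. ParameterSpec, the
"window" and "step" parameters and the text form are assumptions made
up for illustration; a real user interface would build widgets rather
than strings.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;

final class ParameterFormSketch {

    /** One entry of a hypothetical parameter collection. */
    static final class ParameterSpec {
        final String name;
        final String type;
        final String defaultValue;

        ParameterSpec(String name, String type, String defaultValue) {
            this.name = name;
            this.type = type;
            this.defaultValue = defaultValue;
        }
    }

    /** What a sliding-window module might advertise at run time. */
    static List<ParameterSpec> slidingWindowParameters() {
        List<ParameterSpec> specs = new ArrayList<>();
        specs.add(new ParameterSpec("window", "int", "10"));
        specs.add(new ParameterSpec("step", "int", "1"));
        return specs;
    }

    /**
     * Builds one text "form field" per parameter. Parameters with a known
     * value are filled in automatically; the rest are left for the user.
     */
    static List<String> buildForm(List<ParameterSpec> specs, Map<String, String> known) {
        List<String> lines = new ArrayList<>();
        for (ParameterSpec spec : specs) {
            String value = known.get(spec.name);
            if (value != null) {
                lines.add(spec.name + " = " + value + " (filled in)");
            } else {
                lines.add(spec.name + " (" + spec.type + ") [default " + spec.defaultValue + "]: ?");
            }
        }
        return lines;
    }

    public static void main(String[] args) {
        Map<String, String> known = Collections.singletonMap("window", "25");
        for (String line : buildForm(slidingWindowParameters(), known)) {
            System.out.println(line);
        }
    }
}
```

Nothing in buildForm() is specific to one analysis; it works from
whatever the module advertises.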

That should make possible a very nice strategy builder tool that allows
one to drag and drop a pipeline, with each stage dynamically generating
its own config window.

My initial project was to let the user interactively assemble a results
query for the visualisation of genomic data.

..d

> 
> 	cheers,
> 	gerald
> 
> David Martin wrote:
> > 
> > Some while ago I started a project that is now on the back burner that was
> > designed to take generic analysis output and map it onto sequences.
> > 
> > There are a number of different aspects of Gerald's requests to consider:
> > 
> > A single value for a calculation is fine (eg gribskov stat, gc content, aa
> > content etc.). That can be represented quite easily by a generic 'property
> > value' object interface.
> > 
> > When you have other properties that relate to a sequence, such as AA
> > composition calculated in a sliding window over the sequence then you run
> > into problems. It is not a property of the whole sequence but a property
> > of a subsequence, often much larger than a single position in the
> > sequence.
> > 
> > One would probably want a heavier weight object than just a single
> > analysis. GC content for the whole sequence is a double and there isn't
> > much else one can add.
> > GC content over a sliding window has a minimum of two parameters, one of
> > which varies over the sequence length.
> > 
> > If there was to be a generic interface for an analysis it should probably
> > return some generic analysis object and then we start to head towards
> > something that looks like the analysis section of the OMG CORBA spec for
> > Biomolecular Sequence Analysis.
> > 
> > I would want an analysis to carry with it suitable information on the
> > program, parameters and so on used to create the result. These can easily
> > be bundled into a fairly distinct set of analysis types (about 4 or 5)
> > that can be treated generically with the program parameters as a
> > Collection.
> > 
> > So we have a generic
> > SequenceAnalysis interface (probably really a result factory)
> > 
> > From which we derive a variety of subtypes depending on the input sequence
> > type and return type
> > 
> > SingleValueAnalysis
> > takes a sequence and returns an analysis result with two components:
> > A parameter object of some sort and a value object of some sort.
> > 
> > ContinuousValueAnalysis
> > returns a result object that can give a value for every point in the
> > sequence, as well as holding its parameters.
> > 
> > and so on.
> > Probably a bit heavier weight than Gerald had in mind.
> > 
> > Sorry to be so vague but it is late here, and I am adding a note from home
> > before I forget.
> > 
> > ..d
> > 
> > ---------------------------------------------------------------------
> > *  Dr. David Martin                  Biotechnology Centre of Oslo   *
> > *  Node Manager                      Gaustadalleen 21               *
> > *  The Norwegian EMBNet Node         P.O. box 1125 Blindern         *
> > *  tel +47 22 95 87 56               N-0317 Oslo                    *
> > *  fax +47 22 69 41 30               Norway                         *
> > ---------------------------------------------------------------------
> > 
> > _______________________________________________
> > Biojava-l mailing list  -  Biojava-l@biojava.org
> > http://biojava.org/mailman/listinfo/biojava-l
> 
> -- 
>    Gerald.Loeffler@vienna.at _________________ Software Architect
>    http://www.imp.univie.ac.at ____ http://www.daemonstration.com
>    OOA&D, Java, J2EE, JSP, Servlets, JavaBeans, ODBMS, RDBMS, XML
> 
> 

---------------------------------------------------------------------
*  Dr. David Martin                  Biotechnology Centre of Oslo   *
*  Node Manager                      Gaustadalleen 21               *
*  The Norwegian EMBNet Node         P.O. box 1125 Blindern         *
*  tel +47 22 95 87 56               N-0317 Oslo                    *
*  fax +47 22 69 41 30               Norway                         * 
---------------------------------------------------------------------