[Dynamite] Is this working now then?

Ewan Birney birney@ebi.ac.uk
Sun, 5 Mar 2000 20:18:20 +0000 (GMT)


On Sat, 4 Mar 2000, Ian Holmes wrote:

> > > > 	b) sequence object as a separate module (yeah!) but then 
> > > > we need methods always to access it, meaning possibly internal
> > > > sequence objects in some modules (yuk)
> 
> Come to think of it..
> 
> we don't even need methods if it's just a data structure - right?

Well - therein lies the rub. 

I think we are heading towards "doing something different" (option c
in my last mail). I am happy to go down this route if people think this
is the best way.

The drawback of declaring things as data structures is that it forces
implementations to build all the data members up-front, even those
which could otherwise delay parts of the retrieval of a sequence object.
This can be a real pain if your "sequence object" is in fact a wrapper
around a gene prediction which spans virtual contigs in a large database.
To get the sequence, some pretty heavy DB interaction has to go on.

There are a lot of cases where people might pick up the sequence object
just to get its name and/or part of the sequence, and we do not want
to require everything to be there straight away.

An alternative is the following design pattern (speaking IDL) :

module Sequence {

	// structs map to sized data structures in the
	// object.h 	
	struct Seq_str {
		string seq;
		string display_id;
		string accession_number;
		string primary_id; // could be called internal_id
	};

	interface Seq {
		attribute string seq;
		attribute string display_id;
		attribute string accession_number;
		attribute string primary_id;

		// so we can do smart things about getting sub-strings		
		string get_subseq(in long start,in long end);

		// get everything in one go
		Seq_str get_str();
	};

};
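
A minimal sketch of the pattern above, in Python rather than IDL, to
show how the method-based view can defer the expensive part while still
handing out a plain data structure on request. The names SeqStr, LazySeq
and fetch_seq are hypothetical and not part of the IDL; fetch_seq stands
in for whatever database call produces the residues, and the 1-based
inclusive coordinates in get_subseq are an assumption, since the IDL
does not specify them.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass(frozen=True)
class SeqStr:
    """Plain data view, mirroring the Seq_str struct."""
    seq: str
    display_id: str
    accession_number: str
    primary_id: str


class LazySeq:
    """Method-based view, mirroring the Seq interface.

    fetch_seq is a hypothetical callback standing in for an expensive
    database lookup. It is called at most once, and only when the
    residues are actually needed.
    """

    def __init__(self, display_id: str, accession_number: str,
                 primary_id: str, fetch_seq: Callable[[], str]) -> None:
        self.display_id = display_id
        self.accession_number = accession_number
        self.primary_id = primary_id
        self._fetch_seq = fetch_seq
        self._seq: Optional[str] = None

    @property
    def seq(self) -> str:
        # Fetch on first access, then reuse the cached string.
        if self._seq is None:
            self._seq = self._fetch_seq()
        return self._seq

    def get_subseq(self, start: int, end: int) -> str:
        # Assumption: 1-based, inclusive coordinates.
        return self.seq[start - 1:end]

    def get_str(self) -> SeqStr:
        # "Get everything in one go": forces the fetch if not done yet.
        return SeqStr(self.seq, self.display_id,
                      self.accession_number, self.primary_id)
```

Asking only for display_id never triggers the fetch. In this sketch
get_subseq still pulls the whole sequence; a smarter implementation
could ask the database for just the requested range.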


Interestingly, this is the CORBA design pattern for this, which is
meant to be replaced by object-by-value which does this "for free"
supposedly (except that the people in the know think the specification
sucks...).

Finally, from what I now know of interface writing at Ensembl and
Bioperl, we can do something where the interface definition is actually
very clean but does not sacrifice the ability to get the data
structure out, i.e. in IDL terms:

module Sequence {

	// structs map to sized data structures in the
	// object.h 	
	struct Seq_str {
		string seq;
		string display_id;
		string accession_number;
		string primary_id; // could be called internal_id
	};

	// Foreign_Seq is written by someone wishing to 
	// provide a sequence.
	interface Foreign_Seq {
		attribute string seq;
		attribute string display_id;
		attribute string accession_number;
		attribute string primary_id;

		// so we can do smart things about getting sub-strings		
		string get_subseq(in long start,in long end);
	};

	// External_Seq is provided by a module written by ourselves.
	// It caches the data members where appropriate and builds
	// a Seq_str 

	interface External_Seq : Foreign_Seq {
		Seq_str get_Seq_str();
	};

	// Factory method guaranteed by the module

	interface External_Seq_Factory {
		External_Seq from_Foreign_Seq(in Foreign_Seq seq);
	};
};
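
A rough Python sketch of the second pattern, to make the division of
labour concrete: the provider only writes the Foreign_Seq side, and the
module supplies the caching wrapper and the factory. All Python names
here (SeqStr, ForeignSeq, ExternalSeq, from_foreign_seq) are
hypothetical renderings of the IDL names, and caching only the built
struct is one assumed policy among several.

```python
from dataclasses import dataclass
from typing import Optional, Protocol


@dataclass(frozen=True)
class SeqStr:
    """Plain data view, mirroring the Seq_str struct."""
    seq: str
    display_id: str
    accession_number: str
    primary_id: str


class ForeignSeq(Protocol):
    """What a sequence provider has to implement (Foreign_Seq)."""
    seq: str
    display_id: str
    accession_number: str
    primary_id: str

    def get_subseq(self, start: int, end: int) -> str: ...


class ExternalSeq:
    """Wrapper written by the module itself (External_Seq).

    It delegates everything to the wrapped provider and caches the
    SeqStr the first time it is built.
    """

    def __init__(self, foreign: ForeignSeq) -> None:
        self._foreign = foreign
        self._cached: Optional[SeqStr] = None

    @property
    def seq(self) -> str:
        return self.get_seq_str().seq

    @property
    def display_id(self) -> str:
        return self._foreign.display_id

    @property
    def accession_number(self) -> str:
        return self._foreign.accession_number

    @property
    def primary_id(self) -> str:
        return self._foreign.primary_id

    def get_subseq(self, start: int, end: int) -> str:
        # Left to the provider so it can do something smart.
        return self._foreign.get_subseq(start, end)

    def get_seq_str(self) -> SeqStr:
        # Build the struct once; later calls reuse it.
        if self._cached is None:
            f = self._foreign
            self._cached = SeqStr(f.seq, f.display_id,
                                  f.accession_number, f.primary_id)
        return self._cached


def from_foreign_seq(foreign: ForeignSeq) -> ExternalSeq:
    """Factory, mirroring External_Seq_Factory.from_Foreign_Seq."""
    return ExternalSeq(foreign)
```

Code inside the module would only ever see ExternalSeq, so it can take
the struct when it wants raw data and use the methods when it does not.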

This all seems very heavy-handed, but the point is we either have to

	a) discard any idea of sequences being more than a completely
exposed data structure, with no possibility of placing smarts behind it.
(I think this is **bad**. We want to put these around DB handles
sometime).

	b) accept that sequences are proper first class objects
that have to be accessed only through methods.

	c) use one of the design patterns outlined above, or something
similar to provide both views validly.


I was suggesting going for b) until we know this does not work, but I
am happy to go for c). a) is my least favoured choice.


There is no clean solution to this. :(.


> 
> At the risk of treading old ground here..
> 
> I vote for a lightweight sequence data structure containing two strings:
> name and sequence data. This accession number stuff has nothing to do with
> dynamic programming really. Besides -- having three different kinds of ID
> with apparently nothing to distinguish them is somewhat idiosyncratic.
> 
> Ian