[Dynamite] Is this working now then?
Ian Holmes
ihh@fruitfly.org
Sun, 5 Mar 2000 09:23:02 -0800 (PST)
On Sun, 5 Mar 2000, Ewan Birney wrote:
>
>
> On Sat, 4 Mar 2000, Ian Holmes wrote:
>
> > > > > b) sequence object as a separate module (yeah!) but then
> > > > > we need methods always to access it, meaning possibly internal
> > > > > sequence objects in some modules (yuk)
> >
> > Come to think of it..
> >
> > we don't even need methods if it's just a data structure - right?
>
> Well - therein lies the rub.
>
> I think we are heading towards "doing something different" (option c)
> in my last mail. I am happy to go down this route if people think this
> is the best way.
>
> The drawback of declaring things as data structures is that it forces
> implementations that could otherwise defer parts of the retrieval of
> sequence objects to build all the data members up front.
> This can be a real pain if your "sequence object" is in fact a wrapper
> around a gene prediction which spans virtual contigs in a large database.
> To get the sequence, some pretty heavy DB interaction has to go on.
>
> There are a lot of cases where people might pick up the sequence object
> just to get its name and/or part of the sequence, and we do not want
> to require everything to be there straight away.
>
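The deferred-retrieval concern above can be sketched in a few lines. This is only an illustration in Python, not Dynamite or Bioperl code: the class name LazySeq, the fetch_seq callback (a stand-in for the heavy DB interaction), the fetch_count counter, and the 1-based inclusive coordinates of get_subseq are all assumptions made for the example.

```python
from typing import Callable, Optional


class LazySeq:
    """Sequence object that fetches its residues only on first access."""

    def __init__(self, display_id: str, fetch_seq: Callable[[], str]) -> None:
        self.display_id = display_id      # cheap, available immediately
        self._fetch_seq = fetch_seq       # hypothetical stand-in for the DB call
        self._seq: Optional[str] = None   # residues not loaded yet
        self.fetch_count = 0              # number of times the backend was hit

    @property
    def seq(self) -> str:
        # Load the residues on first use, then reuse the cached copy.
        if self._seq is None:
            self.fetch_count += 1
            self._seq = self._fetch_seq()
        return self._seq

    def get_subseq(self, start: int, end: int) -> str:
        # Assumption: 1-based, inclusive coordinates.
        return self.seq[start - 1:end]


gene = LazySeq("gene_1", lambda: "ACGTACGTAC")
name = gene.display_id          # no fetch happens here
piece = gene.get_subseq(2, 4)   # first access triggers the single fetch
```

A plain struct cannot behave like this: every field has to be filled in before the struct is handed over, whether or not the caller ever reads it.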
> An alternative is the following design pattern (speaking IDL) :
>
> module Sequence {
>
> // structs map to sized data structures in the
> // object.h
> struct Seq_str {
> string seq;
> string display_id;
> string accession_number;
> string primary_id; // could be called internal_id
> };
>
> interface Seq {
> attribute string seq;
> attribute string display_id;
> attribute string accession_number;
> attribute string primary_id;
>
> // so we can do smart things about getting sub-strings
> string get_subseq(in long start,in long end);
>
> // get everything in one go
> Seq_str get_str();
> };
>
> };
>
>
> Interestingly, this is the CORBA design pattern for this, which is
> meant to be replaced by object-by-value which does this "for free"
> supposedly (except that the people in the know think the specification
> sucks...).
>
> Finally, from my knowledge of interface writing now at Ensembl and
> Bioperl, we can do something where the interface definition actually
> is very clean but does not sacrifice the ability to get the data
> structure out, i.e., in IDL terms:
>
> module Sequence {
>
> // structs map to sized data structures in the
> // object.h
> struct Seq_str {
> string seq;
> string display_id;
> string accession_number;
> string primary_id; // could be called internal_id
> };
>
> // Foreign_Seq is written by someone wishing to
> // provide a sequence.
> interface Foreign_Seq {
> attribute string seq;
> attribute string display_id;
> attribute string accession_number;
> attribute string primary_id;
>
> // so we can do smart things about getting sub-strings
> string get_subseq(in long start,in long end);
> };
>
> // External_Seq is provided by a module written by ourselves.
> // It caches the data members where appropriate and builds
> // a Seq_str
>
> interface External_Seq : Foreign_Seq {
> Seq_str get_Seq_str();
> };
>
> // Factory method guaranteed by the module
>
> interface External_Seq_Factory {
> External_Seq from_Foreign_Seq(in Foreign_Seq seq);
> };
> };
>
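The Foreign_Seq / External_Seq / factory split above might look roughly like this outside IDL. It is a sketch in Python under stated assumptions, not the Dynamite implementation: the Python names (SeqStr, ForeignSeq, ExternalSeq, from_foreign_seq) simply mirror the IDL, InMemorySeq is a hypothetical toy provider, the accession "X00001" is made up, and get_subseq is assumed to take 1-based inclusive coordinates.

```python
from dataclasses import dataclass
from typing import Optional, Protocol


@dataclass(frozen=True)
class SeqStr:
    """Plain data snapshot, the counterpart of struct Seq_str."""
    seq: str
    display_id: str
    accession_number: str
    primary_id: str


class ForeignSeq(Protocol):
    """What a sequence provider has to implement (interface Foreign_Seq)."""
    seq: str
    display_id: str
    accession_number: str
    primary_id: str

    def get_subseq(self, start: int, end: int) -> str: ...


class ExternalSeq:
    """Wraps any ForeignSeq and adds a cached SeqStr snapshot."""

    def __init__(self, foreign: ForeignSeq) -> None:
        self._foreign = foreign
        self._snapshot: Optional[SeqStr] = None

    @property
    def seq(self) -> str:
        return self._foreign.seq

    @property
    def display_id(self) -> str:
        return self._foreign.display_id

    @property
    def accession_number(self) -> str:
        return self._foreign.accession_number

    @property
    def primary_id(self) -> str:
        return self._foreign.primary_id

    def get_subseq(self, start: int, end: int) -> str:
        return self._foreign.get_subseq(start, end)

    def get_seq_str(self) -> SeqStr:
        # Build the struct once, on demand, then reuse it.
        if self._snapshot is None:
            f = self._foreign
            self._snapshot = SeqStr(f.seq, f.display_id,
                                    f.accession_number, f.primary_id)
        return self._snapshot


def from_foreign_seq(foreign: ForeignSeq) -> ExternalSeq:
    """Counterpart of External_Seq_Factory.from_Foreign_Seq."""
    return ExternalSeq(foreign)


@dataclass
class InMemorySeq:
    """Hypothetical toy provider used only for this example."""
    seq: str
    display_id: str
    accession_number: str
    primary_id: str

    def get_subseq(self, start: int, end: int) -> str:
        return self.seq[start - 1:end]   # assumption: 1-based, inclusive


ext = from_foreign_seq(InMemorySeq("ACGTAC", "toy", "X00001", "1"))
snapshot = ext.get_seq_str()   # struct view, built once and cached
```

The provider only writes the small method-based interface; code that wants the flat struct asks the wrapper for it and pays for building it only then.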
> This all seems very heavy-handed, but the point is we either have to
>
> a) discard any idea of sequences being more than a completely
> exposed data structure, with no possibility of placing smarts behind it.
> (I think this is **bad**. We want to put these around DB handles
> sometime).
>
> b) accept that sequences are proper first class objects
> that have to be accessed only through methods.
>
> c) use one of the design patterns outlined above, or something
> similar to provide both views validly.
>
>
> I was suggesting going for b) until we know this does not work, but I
> am happy to go for c). a) is my least favoured choice.
>
I think this is fine in principle. I believe your Sequence::Seq_str is
called a "Memento" pattern by Gamma et al.
Ian
>
> There is no clean solution to this. :(.
>
>
> >
> > At the risk of treading old ground here..
> >
> > I vote for a lightweight sequence data structure containing two strings:
> > name and sequence data. This accession number stuff has nothing to do with
> > dynamic programming really. Besides -- having three different kinds of ID
> > with apparently nothing to distinguish them is somewhat idiosyncratic.
> >
> > Ian
> >
> >
> > _______________________________________________
> > Dynamite mailing list - Dynamite@bioperl.org
> > http://www.bioperl.org/mailman/listinfo/dynamite
> >
>
>