[DAS] RFC for feature data model

David Block dblock@gnf.org
Fri, 23 Aug 2002 10:13:47 -0700


On Friday, August 23, 2002, at 12:54 AM, Chris Mungall wrote:

>
> I like the decoupling, but I think we have to be careful about cases when
> data should be attached to the floating Gene entity, and when it should be
> attached to the Gene instance-on-a-sequence.

We have full-blown location objects, which means that we can attach 
annotation anywhere we wish.  In fact, we are attaching locations only 
to transcripts, and I expect most annotation to go there.  Some 
annotation, such as gene expression data, will not be 
transcript-specific (depending on the probes used, for example), so it 
will go on the gene.  It will be the job of the middleware to traverse 
the hierarchy of entities and give all the relevant annotation to the 
user, which will be fun!
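
As a rough illustration of that traversal, here is a minimal Python 
sketch.  It is not our middleware: the Entity class, its fields, the 
collect_annotations function, and the sample data are all hypothetical 
names invented for this example, and it assumes each entity has at most 
one parent.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class Entity:
    """Hypothetical node in the entity hierarchy (gene, transcript, ...)."""
    name: str
    kind: str
    annotations: List[str] = field(default_factory=list)
    parent: Optional["Entity"] = None


def collect_annotations(entity: Entity) -> List[Tuple[str, str, str]]:
    """Walk from the entity up through its parents, gathering annotation.

    Returns (kind, name, annotation) tuples, most specific entity first.
    """
    collected = []
    node = entity
    while node is not None:
        for note in node.annotations:
            collected.append((node.kind, node.name, note))
        node = node.parent
    return collected


# Made-up example data: one gene-level and one transcript-level annotation.
gene = Entity("geneA", "gene", ["expression: probe hits all transcripts"])
transcript = Entity("geneA-t1", "transcript", ["function: kinase"], parent=gene)

result = collect_annotations(transcript)
for kind, name, note in result:
    print(f"{kind}\t{name}\t{note}")
```

Asking for the transcript returns its own annotation followed by the 
gene-level annotation, so the user sees everything relevant without 
knowing which level it was attached to.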


>
> For instance, it's always useful to have a gene-level summary of
> information such as function and cellular localisation that applies to all
> spliceforms / wild type forms; often you want to attach this information
> at the instance-on-a-sequence level. For example, different products have
> different functions.
>
> The way we're thinking about this with the new FlyBase schema (correct me
> if I'm wrong, Dave) is like this:
>
> Gene                                                        SET
>  |
>  +--- GeneStructure aka allele                              INSTANCE
>            |
>            +---- Transcript                                 INSTANCE
>                      |
>                      +--------- Exon, Translation etc       INSTANCE
>
Okay, we will make Gene and Transcript "Floating Entities" with zero or 
more Location objects, and then Exons and other sub-gene pieces will be 
simple SeqFeatures.  I think that's more flexible for us, since we're 
dealing with multiple assemblies.
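
To make the shape of this concrete, here is a minimal Python sketch of 
the arrangement described above.  It is an assumption-laden 
illustration, not our schema: the class names, field names, assembly 
labels, and coordinates are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass(frozen=True)
class Location:
    """Hypothetical placement of an entity on one assembly."""
    assembly: str
    seq_id: str
    start: int
    end: int
    strand: int


@dataclass
class FloatingEntity:
    """Gene or Transcript: exists independently of any sequence and
    carries zero or more Location objects."""
    name: str
    kind: str
    locations: List[Location] = field(default_factory=list)

    def locations_on(self, assembly: str) -> List[Location]:
        """Return only the placements on the given assembly."""
        return [loc for loc in self.locations if loc.assembly == assembly]


@dataclass
class SeqFeature:
    """Exon or other sub-gene piece: always tied to exactly one Location."""
    kind: str
    location: Location
    parent: FloatingEntity


# Made-up data: a transcript placed on two assemblies, a gene with no location.
transcript = FloatingEntity("tx1", "transcript", [
    Location("assembly_v1", "chr1", 100, 900, 1),
    Location("assembly_v2", "chr1", 150, 950, 1),
])
gene = FloatingEntity("geneA", "gene")
exon = SeqFeature("exon", Location("assembly_v2", "chr1", 150, 300, 1),
                  parent=transcript)

print(transcript.locations_on("assembly_v2"))
```

Keeping the locations in a list on the floating entity is what lets the 
same transcript be placed on several assemblies at once, while a simple 
SeqFeature stays bound to the one location it was defined on.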

<snip/>
> DAS itself doesn't care, the client just fires off different content
> handlers / xslt for different namespaces.
>
> I think this is roughly equivalent to what Matt was suggesting; I just see
> it as less of a decoupling as often the SeqFeatures (instances) are the
> biological objects themselves, they can't always be viewed independently
> of their location/sequence.
>
Our DAS services are likely to be at the level of transcripts and 
smaller; genes will only be accessible from transcript annotation.
--
David Block                                  dblock@gnf.org
GNF - San Diego, CA             http://www.gnf.org
Genome Informatics / Enterprise Programming
Weblog:      http://radio.weblogs.com/0104507/