[Open-bio-l] Re: [GMOD-devel] RE: Schema for genes & features & mappings to assemblies

Lincoln Stein lstein@cshl.org
Thu, 25 Apr 2002 15:00:15 -0400


> 3) GGB can run off any schema for which one writes an adaptor for
> Bio::DasI (caveat: the tag/values must not break the logic in the given
> aggregators, or one has to provide one's own; Lincoln is that all roughly
> correct?)

That's right.  The aggregators are independent of the database adaptor, so if 
the tag ontology changes, you'll have to modify the aggregators.
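
To illustrate the separation, here is a minimal Python sketch (not the actual Bio::DasI or GGB code). It assumes a hypothetical convention in which any adaptor yields features as dicts with a "type" and a "tags" mapping, and the aggregator knows nothing else about the database. The names aggregate, part_types and group_tag are made up for the example.

```python
from collections import defaultdict


def aggregate(features, part_types, group_tag="group"):
    """Group sub-features (e.g. exons) into composite features by a shared tag.

    `features` can come from any adaptor, as long as each item is a dict with
    a "type" string and a "tags" dict. That shape is an assumption made for
    this sketch, not a real API.
    """
    grouped = defaultdict(list)
    for feat in features:
        # The aggregator depends only on the type and tag vocabulary.
        if feat["type"] in part_types and group_tag in feat["tags"]:
            grouped[feat["tags"][group_tag]].append(feat)
    return dict(grouped)


# Two hypothetical adaptors over different schemas, same feature shape.
from_schema_a = [
    {"type": "exon", "start": 100, "end": 200, "tags": {"group": "tx1"}},
    {"type": "exon", "start": 300, "end": 400, "tags": {"group": "tx1"}},
    {"type": "marker", "start": 150, "end": 151, "tags": {}},
]
from_schema_b = [dict(f) for f in from_schema_a]

# Same features, but the tag vocabulary has changed from "group" to "parent".
renamed_tags = [
    {"type": "exon", "start": 100, "end": 200, "tags": {"parent": "tx1"}},
]
```

Swapping the adaptor leaves the result unchanged, but renaming the tag makes the unmodified aggregator return nothing, which is the failure mode described above.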

> Aggregation on the software layer is one way of
> implementing a view on the model; sadly enough MySQL completely dismissed
> the concept of models and views, but with Oracle you can implement any view
> you want in the database layer.

Chris demonstrated the utility of views very nicely in his last letter, IMO.

> What we (we referring to our group here) will need for assemblies is to
> represent existing ones such that we can stick in all the mappings (of
> features, genes, markers, etc). I thought that would then underpin all
> mapped entities with a sequence; i.e., in order to obtain a feature's
> sequence you need to specify the feature /and/ the assembly (assuming you
> have a mapping for that assembly); this means a gene's CDS sequence is
> going to be different from one assembly to another. The open question is
> whether or not you still need a fixed sequence for that feature (e.g., in
> order to map it). Does this make some sense or sound like a stupid idea?

I don't understand this one.  If you've annotated a CDS on one assembly, and 
a second assembly changes the underlying sequence, is it valid to map the CDS 
from one assembly onto the new one?  Surely you would rather want to 
reannotate that area and call a new CDS.

> As for the ideas that were mentioned I'm not sure how we (GNF) would want
> to exploit an n-depth representation of nested contigs, but others may well
> do so (as a remote idea, could you use that for in-silico SNP detection?).
> I disagree with Ewan's stance that the mere possibility of nested
> assemblies would necessarily require an application to handle them: you
> could just test for a flat assembly and exit gracefully if it's nested and
> you can't handle that. The impact of not allowing something that some people
> need (or want) appears to be worse to me.

Someone suggested earlier that if a 1-level client tried to interact with an 
n-level data store, the 1-level client would get only those features 
that are on the level it requested.  I'm ambivalent about this idea, since it 
might be better for the data store to refuse than for it to give incomplete 
data.
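
The two behaviours can be contrasted in a small Python sketch. Everything here is hypothetical (the Segment class, the depth test, the strict flag); it only shows the difference between refusing a flat-only client and handing it a filtered, possibly incomplete, answer.

```python
from dataclasses import dataclass, field


@dataclass
class Segment:
    """Hypothetical assembly node: a contig that may contain child contigs."""
    name: str
    features: list = field(default_factory=list)
    children: list = field(default_factory=list)


class NestedAssemblyError(Exception):
    """Raised when a flat-only client asks for a nested assembly."""


def depth(segment):
    """Number of levels at or below this segment (a flat segment has depth 1)."""
    return 1 + max((depth(child) for child in segment.children), default=0)


def features_for_flat_client(segment, strict=True):
    """Serve a client that only understands one level of assembly."""
    if strict and depth(segment) > 1:
        # Refuse rather than silently hand back a partial answer.
        raise NestedAssemblyError(
            f"{segment.name} is nested {depth(segment)} levels deep"
        )
    # Lenient mode: return only the features on the requested level.
    return list(segment.features)


flat = Segment("ctg1", features=["geneA"])
nested = Segment("super1", features=["markerX"], children=[flat])
```

With strict=True the store raises on the nested segment; with strict=False the client sees markerX but never learns that geneA exists one level down.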

> The zero-level approach sounds appealing to me; but wouldn't that require
> that the chromosome lengths be all known?

In the zero-level (or 1-level) approach, the top level only goes as high as 
what is currently in the assembly.  If the assembly stops at super-contigs, 
and we don't know how the super-contigs go together, then those define the 
top-level coordinates.
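
As a rough Python sketch of that rule, assume a hypothetical placement table that maps each placed piece to its parent and an offset within the parent. Anything without an entry is treated as top level, so no chromosome length is needed. Orientation is ignored to keep the example short.

```python
def to_top_level(name, pos, placement):
    """Convert (segment, position) to coordinates on the highest known segment.

    `placement` maps a child segment name to (parent name, offset of the child
    within the parent). This table layout is an assumption for the sketch.
    A segment with no entry has no known parent and is therefore top level.
    """
    while name in placement:
        parent, offset = placement[name]
        pos += offset
        name = parent
    return name, pos


# The assembly stops at super-contigs: super1 and super2 have no parent.
placement = {
    "ctg1": ("super1", 0),
    "ctg2": ("super1", 5000),
}
```

Here a position on ctg2 resolves to super1 coordinates, while a position on the unplaced super2 is already in top-level coordinates and comes back unchanged.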

>
> 	-hilmar