[DAS] RFC for feature data model
David Block
dblock@gnf.org
Thu, 22 Aug 2002 16:35:21 -0700
This looks kind of similar to SymGene's 'gene-centric' database schema.
In fact, we wanted to decouple location from features, so that a feature
could exist on multiple assemblies, and so that users could traverse
links between interesting features without ever consulting a genomic
sequence.
We are also able to hang evidence on the relationships between
entities - so Gene A is orthologous to Gene B, as evidenced by paper C.
Otherwise the evidence would have to be awkwardly linked to both Gene A
and Gene B. Hanging it on the relationship makes the assertion it
supports more explicit.
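
A minimal sketch of that idea in Python. All class and field names here
are hypothetical illustrations, not SymGene's actual schema: the point is
only that the evidence is a property of the relationship, and the
entities themselves carry no location and no evidence.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Entity:
    """A feature entity (gene, exon, ...) with no location attached."""
    entity_id: str
    kind: str


@dataclass(frozen=True)
class Evidence:
    """Something that supports an assertion, e.g. a paper."""
    source: str


@dataclass
class Relationship:
    """An assertion linking two entities; evidence hangs on the link."""
    subject: Entity
    predicate: str
    obj: Entity
    evidence: list = field(default_factory=list)


gene_a = Entity("gene:A", "gene")
gene_b = Entity("gene:B", "gene")

# "Gene A is orthologous to Gene B, as evidenced by paper C"
ortholog = Relationship(gene_a, "orthologous_to", gene_b,
                        evidence=[Evidence("paper C")])

# The evidence is reached through the assertion it supports, so neither
# gene needs its own copy of the reference.
for ev in ortholog.evidence:
    print(ortholog.subject.entity_id, ortholog.predicate,
          ortholog.obj.entity_id, "<-", ev.source)
```
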
Just my $.02...
On Thursday, August 22, 2002, at 03:16 PM, Matthew Pocock wrote:
> Hi all,
>
> There is some discussion on the biojava-dev list at the moment about
> changing our core feature/sequence model. It would be nice to be able
> to work with gene objects totally without the need for genomic data to
> be available. Also, really the same gene instance should be used
> regardless of the coordinate system in place for the sequence you have
> hold of - if you are viewing a contig, or a chromosome, or an EMBL dump
> of the region. Potentially, you could have a single LINE repeat object,
> and bind it to the genome every place RepeatMasker calls a repeat. This
> decouples the biological inheritance hierarchy (or ontology) from the
> sequence/location stuff, which is probably a good thing all round.
>
> Matthew
>
> The proposal
> ------------
> Any format/data-model we use to annotate interesting regions of a
> sequence should store all information necessary to mark a region of
> sequence as being covered by some sort of feature entity (e.g. a list
> of ranges - possibly 1 range element in length, and optional strand
> info) and a link, id, URN/URI, ontology term or whatever giving the
> actual feature at that location (gene, exon, etc. ad nauseam). A second
> service may be used to resolve the link, id or whatever to the feature
> entity itself.
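
One way to picture the two-service split described above, as a rough
Python sketch. The names (Region, EntityService, feature_ref) and the
example identifiers and coordinates are assumptions made up for
illustration; the proposal does not fix a format. A region record holds
only ranges, optional strand and a reference, and a separate resolver
turns the reference into the feature entity.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Region:
    """What the region service would serve: where, plus a link to what."""
    sequence_id: str        # contig, chromosome, EMBL entry, ...
    ranges: tuple           # one or more (start, end) pairs
    strand: Optional[int]   # +1, -1, or None when unknown
    feature_ref: str        # id/URI to be resolved by the second service


class EntityService:
    """Stand-in for the second service: resolves a reference to an entity."""

    def __init__(self, entities):
        self._entities = dict(entities)

    def resolve(self, feature_ref):
        return self._entities[feature_ref]


# One repeat entity, bound to two places on two different sequences.
entities = EntityService({"repeat:LINE1": {"kind": "repeat", "family": "LINE"}})
regions = [
    Region("contig42", ((100, 900),), +1, "repeat:LINE1"),
    Region("chr7", ((120500, 121300),), -1, "repeat:LINE1"),
]

# Both regions resolve to the very same entity object.
resolved = [entities.resolve(r.feature_ref) for r in regions]
```

Because the region records carry nothing but coordinates and a
reference, the same entity can be projected onto any coordinate system
without being copied.
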
>
> Possible costs
> --------------
> 1) double the number of transactions - one for region data and one for
> feature objects.
> 2) writing and maintaining two services rather than one.
>
> Possible Solutions/Counter Arguments
> ------------------------------------
> 1) If the XML schemas are written sensibly, then inline rich objects
> could be used interchangeably with linked-in rich objects.
> 2) We have to write the code to serialize/deserialize this info
> anyway - all we're doing is giving the user another access point.
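
The first counter-argument, inline and linked-in objects being
interchangeable, could look something like this on the client side. This
is a hedged Python sketch: the "entity" and "ref" keys, the get_entity
helper and the in-memory store are all hypothetical stand-ins, not part
of any DAS or BioJava API.

```python
def get_entity(feature, fetch):
    """Return the rich entity for a feature record.

    `feature` is a dict that carries either an inline "entity" or a
    "ref" to one; `fetch` is any callable that resolves a ref.
    """
    if "entity" in feature:
        return feature["entity"]      # inline: no second transaction
    return fetch(feature["ref"])      # linked: ask the entity service


# Hypothetical stand-in for the remote entity service.
store = {"gene:ADH-human": {"kind": "gene", "name": "ADH", "species": "human"}}

inline = {"entity": {"kind": "gene", "name": "ADH", "species": "human"}}
linked = {"ref": "gene:ADH-human"}

# Calling code does not need to know which form it was handed.
same = get_entity(inline, store.__getitem__) == get_entity(linked, store.__getitem__)
```

With inline objects the extra transaction disappears, which is how this
answers the first cost listed above.
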
>
> Possible benefits
> -----------------
> 1) the region handling service becomes very simple and regular in the
> info it serves (all complex fluffy objects are in the other service).
> 2) different data producers could link to different world views of
> entity types fairly painlessly.
> 3) the entity service can be re-used in different bioinformatics
> domains, e.g. the genome entity services could be used totally
> independently of chromosomal information for things like:
> * GO editing
> * annotating microarray spots
> 4) info relevant to the rich entity lives on that entity, info relevant
> to its instance or projection lives on the projection (e.g. you could
> annotate the region with BLAST scores and link to the (ADH,human) gene
> entity which contains rich annotation about ADH in human, and
> presumably has links to both ADH and human if you want to find more
> stuff out).
> 5) the rich objects could be stored on a totally different server,
> allowing better reuse of complex concepts
>
> -- BioJava Consulting LTD - Support and training for BioJava
> http://www.biojava.co.uk
>
> _______________________________________________
> DAS mailing list
> DAS@biodas.org
> http://biodas.org/mailman/listinfo/das
>
--
David Block dblock@gnf.org
GNF - San Diego, CA http://www.gnf.org
Genome Informatics / Enterprise Programming
Weblog: http://radio.weblogs.com/0104507/