[DAS] RFC for feature data model
Matthew Pocock
matthew_pocock@yahoo.co.uk
Thu, 22 Aug 2002 23:16:04 +0100
Hi all,
There is some discussion on the biojava-dev list at the moment about
changing our core feature/sequence model. It would be nice to be able to
work with gene objects without needing any genomic data to be
available. Also, really the same gene instance should be used regardless
of the coordinate system in place for the sequence you have hold of - if
you are viewing a contig, or a chromosome, or an EMBL dump of the
region. Potentially, you could have a single LINE repeat object, and bind
it to the genome every place RepeatMasker calls a repeat. This
decouples the biological inheritance hierarchy (or ontology) from the
sequence/location stuff, which is probably a good thing all round.
Matthew
The proposal
------------
Any format/data-model we use to annotate interesting regions of a
sequence should store all information necessary to mark a region of
sequence as being covered by some sort of feature entity (e.g. a list of
ranges - possibly only 1 range long - and optional strand info)
and a link, id, URN/URI, ontology term or whatever giving the actual
feature at that location (gene, exon, etc. ad nauseam). A second service
may be used to resolve the link, id or whatever to the feature entity
itself.
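
As a rough illustration of the split described above, here is a minimal
Java sketch. Every type name in it (Range, RegionAnnotation,
FeatureEntity, EntityResolver, InMemoryEntityResolver) is hypothetical
and is not part of BioJava or DAS; the in-memory resolver merely stands
in for whatever the second service turns out to be.

```java
import java.net.URI;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Optional;

// Hypothetical names throughout; none of these types exist in BioJava or DAS.

/** Strand of a region; UNKNOWN models the "optional strand info". */
enum Strand { FORWARD, REVERSE, UNKNOWN }

/** One inclusive range, in the coordinates of whatever sequence is in hand. */
record Range(int start, int end) {
    Range {
        if (start > end) throw new IllegalArgumentException("start > end");
    }
}

/** A region of sequence plus a link to the feature found there. */
record RegionAnnotation(List<Range> ranges, Strand strand, URI featureRef) {
    RegionAnnotation {
        if (ranges.isEmpty()) throw new IllegalArgumentException("need at least one range");
        ranges = List.copyOf(ranges); // defensive, immutable copy
    }
}

/** The rich feature entity; note that it carries no coordinates of its own. */
record FeatureEntity(URI id, String type, Map<String, String> annotation) {}

/** The second service: resolves a link to the feature entity itself. */
interface EntityResolver {
    Optional<FeatureEntity> resolve(URI featureRef);
}

/** In-memory stand-in for a remote entity service. */
final class InMemoryEntityResolver implements EntityResolver {
    private final Map<URI, FeatureEntity> store = new HashMap<>();

    void register(FeatureEntity entity) {
        store.put(entity.id(), entity);
    }

    @Override
    public Optional<FeatureEntity> resolve(URI featureRef) {
        return Optional.ofNullable(store.get(featureRef));
    }
}
```

With these types, a single LINE repeat entity could be registered once
and referenced from any number of RegionAnnotation instances, one per
place it is called on the genome, each resolving to the same object.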
Possible costs
--------------
1) double the number of transactions - one for region data and 1 for
feature objects.
2) writing and maintaining two services rather than 1.
Possible Solutions/Counter Arguments
------------------------------------
1) If the XML schemas are written sensibly, then inline rich objects
could be used interchangeably with linked-in rich objects.
2) We have to write the code to serialize/deserialize this info anyway -
all we're doing is giving the user another access point.
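
One way the first counter-argument might look on the client side is
sketched below in Java. FeatureSlot, InlineFeature, LinkedFeature and
Entity are hypothetical names, and the lookup function is only an
assumed stand-in for a call to the entity service. The point is that
calling code asks the slot for its entity and does not care whether the
rich object arrived inline or has to be fetched.

```java
import java.util.Optional;
import java.util.function.Function;

// Hypothetical types for illustration; not part of DAS or BioJava.

/** Minimal rich object: an identifier plus a human-readable description. */
record Entity(String id, String description) {}

/** Whatever sits in the feature slot of a region: inline or linked. */
interface FeatureSlot {
    /** Returns the rich object, consulting the lookup only if needed. */
    Optional<Entity> entity(Function<String, Optional<Entity>> lookup);
}

/** Rich object shipped inline with the region data: no second transaction. */
record InlineFeature(Entity value) implements FeatureSlot {
    @Override
    public Optional<Entity> entity(Function<String, Optional<Entity>> lookup) {
        return Optional.of(value);
    }
}

/** Only an id is shipped; the entity service is consulted on demand. */
record LinkedFeature(String id) implements FeatureSlot {
    @Override
    public Optional<Entity> entity(Function<String, Optional<Entity>> lookup) {
        return lookup.apply(id);
    }
}
```

A server that wants to avoid the doubled transaction count could send
InlineFeature-style data, while one that prefers small, regular region
responses could send LinkedFeature-style data; the client code is the
same in both cases.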
Possible benefits
-----------------
1) the region handling service becomes very simple and regular in the
info it serves (all complex fluffy objects are in the other service).
2) different data producers could link to different world views of
entity types fairly painlessly.
3) the entity service can be re-used in different bioinformatics domains
e.g. the genome entity services could be used totally independently of
chromosomal information for things like:
* GO editing
* annotating micro array spots
4) info relevant to the rich entity lives on that entity, info relevant
to its instance or projection lives on the projection (e.g. you could
annotate the region with blast scores and link to the (ADH,human) gene
entity which contains rich annotation about ADH in human, and presumably
has links to both ADH and human if you want to find more stuff out).
5) the rich objects could be stored on a totally different server,
allowing better reuse of complex concepts.
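
To make benefit 4 concrete, here is a small Java sketch of where each
piece of information might live. GeneEntity, Projection and Projections
are hypothetical names, and any ids, coordinates or scores fed into them
would be made up for illustration. The blast score sits on the
projection, the description and onward links sit on the entity, and the
same entity id can be projected onto a contig and a chromosome alike.

```java
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical types for illustration only; not part of DAS or BioJava.

/** Rich entity: what is true of the gene wherever it is projected. */
record GeneEntity(String id, String description, List<String> links) {}

/** Projection: what is true of one placement of an entity on one sequence. */
record Projection(String sequenceId, int start, int end,
                  double blastScore, String entityId) {}

final class Projections {
    private Projections() {}

    /** Ids of the sequences on which the given entity has been placed, in input order. */
    static List<String> sequencesFor(String entityId, List<Projection> all) {
        return all.stream()
                .filter(p -> p.entityId().equals(entityId))
                .map(Projection::sequenceId)
                .collect(Collectors.toList());
    }
}
```

Because a Projection holds only an entity id, the GeneEntity objects
could equally be served from a different server, as in benefit 5.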
--
BioJava Consulting LTD - Support and training for BioJava
http://www.biojava.co.uk