[DAS] Re: Our identifier doc and proposal

Matthew Pocock mrp@sanger.ac.uk
Wed, 28 Nov 2001 17:48:58 +0000


Lincoln Stein wrote:

> I think we're going to find that the features form a DAG and not a
> hierarchy.  Otherwise you're going to have problems classifying things
> like "genes".  In the context of genetics, a gene is a type of
> complementation group.  In the context of genomics, a gene is a
> subclass of transcription features, translation features, and
> regulatory features.
> 
> Or what do we say about transposons?  You can think of them in various
> contexts as: repeats, insertions, and pseudogenes.
> 
> Lincoln

We can probably get around this by separating inheritance from 
aggregation, and by being explicit about context. To me, a gene is not a 
transcription feature or a translated feature, but an object that is 
associated with some of these by specific relations (or functions or 
properties or methods...). My reading of ontologies (though I may be 
wrong) is that a piece of data is a member of a classification if you 
can get to the right sort of data from it. For example, a transcribed 
feature (presumably in genomic coordinates) is defined as any feature 
which has a relation from it to a transcript entity (presumably the 
relation models DNA->RNA, and the transcript is in non-genomic 
coordinates). This may not sit well with strongly-typed languages like 
Java if you want to directly instantiate class hierarchies from 
ontologies. Below is an incomplete description of the data you want to 
model using a stupid semi-structured text format I just thought up. For 
those of you who like going mad, you may want to look at:

http://plato.stanford.edu/entries/category-theory/

Matthew



namespace: genetics {
   concept: genome {
     description: "The entire bag of genes for an organism"
   }

   concept: gene {
     description: "The unit of inheritance that belongs to a complementation group"
   }

   concept: allele {
     description: "One possible variant of a gene"
   }

   concept: phenotype {
     description: "Something we can see in an organism that appears to be genetically controlled"
   }

   relation genome_has_gene = (genome(*), gene(*))
   relation gene_has_allele = (gene(1), allele(*))
   relation allele_has_phenotype = (allele(*), phenotype(*))
}
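
The cardinalities in the relation declarations above can be sketched in
Python. This is only an illustration, not part of the proposal: the
Relation class, its max_left/max_right parameters and the gene and
allele names are all hypothetical, and it assumes that
(gene(1), allele(*)) means "each allele belongs to exactly one gene, a
gene may have any number of alleles", by analogy with the description
given for gene_has_transcript.

```python
class Relation:
    """A binary relation stored as a set of (left, right) pairs.

    max_left caps how many left items one right item may be paired with;
    max_right caps how many right items one left item may be paired with.
    None plays the role of '*' in the text format (no limit).
    """

    def __init__(self, name, max_left=None, max_right=None):
        self.name = name
        self.max_left = max_left
        self.max_right = max_right
        self.pairs = set()

    def lefts_of(self, right):
        return {l for (l, r) in self.pairs if r == right}

    def rights_of(self, left):
        return {r for (l, r) in self.pairs if l == left}

    def add(self, left, right):
        if (left, right) in self.pairs:
            return  # already present, nothing to check
        if self.max_left is not None and len(self.lefts_of(right)) >= self.max_left:
            raise ValueError(f"{self.name}: {right!r} already has {self.max_left} left partner(s)")
        if self.max_right is not None and len(self.rights_of(left)) >= self.max_right:
            raise ValueError(f"{self.name}: {left!r} already has {self.max_right} right partner(s)")
        self.pairs.add((left, right))


# relation gene_has_allele = (gene(1), allele(*))
gene_has_allele = Relation("gene_has_allele", max_left=1)
gene_has_allele.add("white", "w1")   # hypothetical gene and allele names
gene_has_allele.add("white", "w2")   # a gene may have many alleles
```

Adding ("yellow", "w1") to this relation would raise ValueError, since
the allele w1 is already attached to one gene.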

namespace: genomics {
   concept: feature {
     property: location(1) instanceof dnaLocation
     property: strand(1) from [+1, -1, 0]
   }

   concept: transcript isa feature {
     description: "A region of DNA that can be transcribed"
   }

   concept: promoter isa feature {
     description: "A region of DNA that causes transcription"
   }

   concept: gene isa feature {
     description: "A Gene. The regulatory and coding units within the genome that modify organism behavior"
     property: promoter(*) instanceof promoter {
       restriction: gene_has_promoter(gene, promoter) = true
     }

     property: transcript(*) instanceof transcript {
       restriction: gene_has_transcript(gene, transcript) = true
     }
   }

   relation subFeature = (feature(*) as parent, feature(*) as child) {
     contains(parent->location, child->location)
   }

   relation gene_has_transcript isa subFeature = (gene(1), transcript(*)) {
     description: "A given gene may have multiple transcripts, but a transcript can belong to only one gene"
   }

   relation gene_has_promoter isa subFeature = (gene(*), promoter(*)) {
     description: "genes can share promoters, and a given gene may have multiple promoters"
     restriction
   }

   concept: transcribed_feature isa feature {
     restriction: subFeature(transcribed_feature, transcript) != empty_set {
       description: "i.e. transcribed_feature objects must have at least one transcript"
     }
   }

   concept: repeat_type

   relation feature_is_repeat = (feature(*), repeat_type(1)) {
     description: "Given a feature and a repeat_type, we can decide if the feature is similar enough to the archetype for this repeat type e.g. by profile search. If so, it is a repeat of this type."
   }

   concept: repeat isa feature {
     restriction: size(feature_is_repeat(repeat, repeat_type)) > min_copy_number
     object: min_copy_number = 10
   }
}
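
The subFeature containment rule and the repeat restriction above could
be checked along these lines. This is a rough sketch under stated
assumptions: locations are modelled as simple (start, end) intervals,
the Feature class and the function names are hypothetical, and the copy
number test follows the ">" in the restriction as written.

```python
from dataclasses import dataclass

MIN_COPY_NUMBER = 10  # from `object: min_copy_number = 10`


@dataclass(frozen=True)
class Feature:
    name: str
    start: int
    end: int
    strand: int = 0  # one of +1, -1, 0


def contains(parent, child):
    """True if the child's interval lies entirely within the parent's."""
    return parent.start <= child.start and child.end <= parent.end


def is_sub_feature(parent, child):
    """subFeature holds only when the parent location contains the child."""
    return contains(parent, child)


def repeats(feature_to_type, min_copy_number=MIN_COPY_NUMBER):
    """Return the features whose repeat_type has more than
    min_copy_number member features.

    feature_to_type maps each feature to the single repeat_type it was
    assigned (e.g. by a profile search done elsewhere).
    """
    by_type = {}
    for feature, repeat_type in feature_to_type.items():
        by_type.setdefault(repeat_type, set()).add(feature)
    return {
        feature
        for members in by_type.values()
        if len(members) > min_copy_number
        for feature in members
    }


gene = Feature("gene1", 100, 900, +1)      # hypothetical coordinates
transcript = Feature("tx1", 150, 800, +1)
inside = is_sub_feature(gene, transcript)  # transcript lies within the gene
```

Here a feature counts as a repeat only because enough other features
share its repeat_type, not because of anything declared on the feature
itself.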

obj1 isa genetics:gene

obj isa genomics:gene // only true if obj isa feature and
                       // all relations involving gene arguments
                       // hold

obj->genomics:feature.location // fetch the single location for this feature

obj isa genomics:transcribed_feature // obj isa feature
                             // has at least one transcript
                             // will be true for genes with transcripts

obj isa genomics:repeat // true if obj isa feature and is accepted by some
                // repeat_type. It may also be a transcribed_feature or
                // a gene. There will be at least 10 features with the
                // same repeat_type as this object.

obj->genomics:gene.promoter // goes to set of all promoters for the gene
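
The queries above treat "isa" as something computed from relations
rather than declared up front. A minimal Python sketch of that idea
follows; the object names, the kinds table and the helper functions are
all hypothetical, and the relations are reduced to plain sets of
(parent, child) pairs.

```python
# Toy data set: what each object was declared as, plus relation pairs.
kinds = {
    "g1": "gene",
    "t1": "transcript",
    "p1": "promoter",
    "x1": "feature",   # a feature with no transcripts
}
gene_has_transcript = {("g1", "t1")}
gene_has_promoter = {("g1", "p1")}

# gene_has_transcript and gene_has_promoter are both subFeature relations,
# so subFeature is modelled here as their union.
sub_feature = gene_has_transcript | gene_has_promoter


def is_feature(obj):
    """Every object in this toy data set is a genomic feature."""
    return obj in kinds


def is_transcribed_feature(obj):
    """obj isa transcribed_feature: a feature with at least one
    subFeature that is a transcript."""
    return is_feature(obj) and any(
        parent == obj and kinds.get(child) == "transcript"
        for parent, child in sub_feature
    )


def promoters_of(gene):
    """obj->genomics:gene.promoter: the set of promoters for the gene."""
    return {child for parent, child in gene_has_promoter if parent == gene}
```

Note that g1 is never declared to be a transcribed_feature; it
qualifies only because a transcript can be reached from it.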