[DAS] Re: Our identifier doc and proposal
Matthew Pocock
mrp@sanger.ac.uk
Wed, 28 Nov 2001 17:48:58 +0000
Lincoln Stein wrote:
> I think we're going to find that the features form a DAG and not a
> hierarchy. Otherwise you're going to have problems classifying things
> like "genes". In the context of genetics, a gene is a type of
> complementation group. In the context of genomics, a gene is a
> subclass of transcription features, translation features, and
> regulatory features.
>
> Or what do we say about transposons? You can think of them in various
> contexts as: repeats, insertions, and pseudogenes.
>
> Lincoln
We can probably get around this by separating inheritance from
aggregation, and by being explicit about context. To me, a gene is not a
transcription feature or a translated feature, but an object that is
associated with some of these by specific relations (or functions or
properties or methods...). My reading of ontologies (though I may be
wrong) is that data is a member of a classification if you can get to
the right sort of data from it. For example, a transcribed feature
(presumably in genomic coordinates) is defined as any feature which has
a relation from it to a transcript entity (presumably the relation
models DNA->RNA, and the transcript is in non-genomic coordinates).
This may not sit well with strongly-typed languages like Java if you
want to directly instantiate class hierarchies from ontologies. Below is
an incomplete description of the data you want to model using a stupid
semi-structured text format I just thought up. For those of you who like
going mad, you may want to look at:
http://plato.stanford.edu/entries/category-theory/
Matthew
namespace: genetics {
concept: genome {
description: "The entire bag of genes for an organism"
}
concept: gene {
description: "The unit of inheritance that belongs to a
complementation group"
}
concept: allele {
description: "One possible variant of a gene"
}
concept: phenotype {
description: "Something we can see in an organism that appears to
be genetically controlled"
}
relation genome_has_gene = (genome(*), gene(*))
relation gene_has_allele = (gene(1), allele(*))
relation allele_has_phenotype = (allele(*), phenotype(*))
}
namespace: genomics {
concept: feature {
property: location(1) instanceof dnaLocation
property: strand(1) from [+1, -1, 0]
}
concept: transcript isa feature {
description: "A region of DNA that can be transcribed"
}
concept: promoter isa feature {
description: "A region of DNA that causes transcription"
}
concept: gene isa feature {
description: "A Gene. The regulatory and coding units within the
genome that modify organism behavior"
property: promoter(*) instanceof promoter {
restriction: gene_has_promoter(gene, promoter) = true
}
property: transcript(*) instanceof transcript {
restriction: gene_has_transcript(gene, transcript) = true
}
}
relation subFeature = (feature(*) as parent, feature(*) as child) {
contains(parent->location, child->location)
}
relation gene_has_transcript isa subFeature = (gene(1), transcript(*)) {
description: "A given gene may have multiple transcripts, but a
transcript can belong to only one gene"
}
relation gene_has_promoter isa subFeature = (gene(*), promoter(*)) {
description: "genes can share promoters, and a given gene may have
multiple promoters"
restriction
}
concept: transcribed_feature isa feature {
restriction: subFeature(transcribed_feature, transcript) !=
empty_set {
description: "i.e. transcribed_feature objects must have at least
one transcript"
}
}
concept: repeat_type {
}
relation feature_is_repeat = (feature(*), repeat_type(1)) {
description: "Given a feature and a repeat_type, we can decide if
the feature is similar enough to the archetype for this repeat type e.g.
by profile search. If so, it is a repeat of this type."
}
concept: repeat isa feature {
restriction: size(feature_is_repeat(repeat, repeat_type)) >
min_copy_number
object: min_copy_number = 10
}
}
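As a rough illustration of how the relation declarations above might be checked when data is loaded, here is a minimal Python sketch. The names Feature, Relation, contains and max_parents_per_child are hypothetical and come from no DAS specification. It assumes that (gene(1), transcript(*)) means each transcript has at most one gene, as the gene_has_transcript description says, and that a location is a simple inclusive start/end pair.

```python
from collections import namedtuple

# Hypothetical stand-in for a feature with a dnaLocation (inclusive coordinates).
Feature = namedtuple("Feature", ["name", "start", "end"])


def contains(parent, child):
    """The subFeature constraint: the parent's location spans the child's."""
    return parent.start <= child.start and child.end <= parent.end


class Relation:
    """A named set of (parent, child) pairs, checked on insertion."""

    def __init__(self, name, max_parents_per_child=None, constraint=None):
        self.name = name
        self.max_parents_per_child = max_parents_per_child  # None stands for "*"
        self.constraint = constraint
        self.pairs = set()

    def add(self, parent, child):
        if self.constraint is not None and not self.constraint(parent, child):
            raise ValueError(f"{self.name}: constraint failed for {parent.name}, {child.name}")
        # Count the parents this child would have once the new pair is added.
        parents = {p for p, c in self.pairs if c == child} | {parent}
        if self.max_parents_per_child is not None and len(parents) > self.max_parents_per_child:
            raise ValueError(f"{self.name}: {child.name} already has a parent")
        self.pairs.add((parent, child))

    def children(self, parent):
        return {c for p, c in self.pairs if p == parent}


# (gene(1), transcript(*)): at most one gene per transcript.
gene_has_transcript = Relation("gene_has_transcript", max_parents_per_child=1, constraint=contains)
# (gene(*), promoter(*)): no cardinality limit on either side.
gene_has_promoter = Relation("gene_has_promoter", constraint=contains)

gene_a = Feature("geneA", 100, 500)
gene_b = Feature("geneB", 400, 900)
tx = Feature("tx1", 420, 480)              # lies inside both genes
shared_promoter = Feature("p1", 400, 450)  # also inside both genes

gene_has_transcript.add(gene_a, tx)
gene_has_promoter.add(gene_a, shared_promoter)
gene_has_promoter.add(gene_b, shared_promoter)  # allowed: promoters can be shared
```

With this data, a further call to gene_has_transcript.add(gene_b, tx) raises ValueError, because tx1 already belongs to geneA, while the shared promoter is accepted for both genes.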
obj1 isa genetics:gene
obj isa genomics:gene // only true if obj isa feature and
// all relations involving gene arguments
// hold
obj->genomics:feature.location // fetch the single location for this feature
obj isa genomics:transcribed_feature // true if obj isa feature and
// has at least one transcript;
// will be true for genes with transcripts
obj isa genomics:repeat // true if obj isa feature and is accepted by some
// repeat_type. It may also be a transcribed_feature or
// a gene. There will be more than 10 features with the
// same repeat_type as this object.
obj->genomics:gene.promoter // goes to set of all promoters for the gene
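
To make the "isa" queries above a little more concrete, here is a small Python sketch of classification by relation rather than by declared class. It is only one possible reading: KnowledgeBase and its methods are hypothetical names, objects are plain strings, "Alu" and "rare_type" are just example repeat_type labels, and the copy-number test uses the strict ">" from the repeat restriction.

```python
from collections import defaultdict

MIN_COPY_NUMBER = 10  # mirrors "object: min_copy_number = 10"


class KnowledgeBase:
    """Hypothetical store of declared concepts and relation pairs."""

    def __init__(self):
        self.declared = defaultdict(set)  # object -> concepts asserted directly
        self.links = defaultdict(set)     # relation name -> {(left, right)}

    def declare(self, obj, concept):
        self.declared[obj].add(concept)

    def relate(self, relation, left, right):
        self.links[relation].add((left, right))

    def isa_transcribed_feature(self, obj):
        """True if obj is a feature with at least one transcript sub-feature."""
        if "feature" not in self.declared[obj]:
            return False
        children = {right for left, right in self.links["subFeature"] if left == obj}
        return any("transcript" in self.declared[child] for child in children)

    def isa_repeat(self, obj):
        """True if obj is a feature accepted by a repeat_type with enough copies."""
        if "feature" not in self.declared[obj]:
            return False
        pairs = self.links["feature_is_repeat"]
        for repeat_type in {right for left, right in pairs if left == obj}:
            copies = {left for left, right in pairs if right == repeat_type}
            if len(copies) > MIN_COPY_NUMBER:  # strictly greater, as in the restriction
                return True
        return False


kb = KnowledgeBase()
kb.declare("gene1", "feature")
kb.declare("tx1", "feature")
kb.declare("tx1", "transcript")
kb.relate("subFeature", "gene1", "tx1")

for i in range(11):  # 11 copies: above the threshold
    kb.declare(f"alu{i}", "feature")
    kb.relate("feature_is_repeat", f"alu{i}", "Alu")
for i in range(10):  # exactly 10 copies: not above the threshold
    kb.declare(f"rare{i}", "feature")
    kb.relate("feature_is_repeat", f"rare{i}", "rare_type")
```

Here gene1 is never declared to be a transcribed_feature. It becomes one only because a subFeature pair leads from it to a transcript, which is the point about membership following from relations.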