[DAS] discussion document for das/2
Matthew Pocock
mrp@sanger.ac.uk
Fri, 07 Dec 2001 12:17:30 +0000
Hi Lincoln,
DAML-OIL is quite cool, isn't it? There are some nasties out there to
get you (e.g. scoping of who can restrict a property), but at the very
least it gives us a common way to discuss the ideas of concepts, rather
like XML gives us a common way to discuss the ideas of data-structures.
I'm coming back to your Telomeric repeat class, as I think it
illustrates a point about our feature hierarchy that hasn't really been
clarified. You don't have to agree with these definitions; that doesn't
really matter for the sake of discussion. To me, a region is labelled a
repeat if it is homologous to many other regions (for suitably high
values of 'many'). The repeat instance is really a region of alignment to
the repeat archetype. We tend to compress this down to saying that it is
an Alu repeat, or a poly-AT. Telomeres are functional structures on the
ends of our chromosomes. Telomeric repeats are repeats that tend to be
associated with telomeres. Phew.
Now, in the feature ontologies we have discussed so far, there is only
one relationship - parent/child where the child is a subset of the
parent class. I think if we are to expose concepts like my Telomeric
repeat described above, we will need to consider different relationships
between features and feature types. For example, telomeric repeat could
be defined (in hokey semi-structured text) as:
'Alignment' tuple (Sequence, SequenceModel, Path)
'RepeatArchetype' subclassOf 'SequenceModel'
'Repeat' subclassOf 'LocatableFeature'
  where exists Alignment(Repeat.sequence, RepeatArchetype)
'Telomere' subclassOf 'LocatableFeature'
'Telomeric Repeat' subclassOf 'Repeat'
  where 'Telomeric Repeat'.RepeatArchetype subclassOf TelomericRepeatArchetype
'TelomericRepeatArchetype' subclassOf 'RepeatArchetype'
  where most R in Alignment(R, 'TelomericRepeatArchetype', *) overlaps Telomere
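For anyone who would rather read code than hokey text, here is a rough
Python sketch of the same definitions. It is only an illustration under
my own assumptions: the field names, the representation of Path as pairs
of positions, and the example coordinates are all made up for the
purpose and are not part of any DAS proposal.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class SequenceModel:
    name: str


class RepeatArchetype(SequenceModel):
    """'RepeatArchetype' subclassOf 'SequenceModel'."""


class TelomericRepeatArchetype(RepeatArchetype):
    """'TelomericRepeatArchetype' subclassOf 'RepeatArchetype'."""


@dataclass(frozen=True)
class Alignment:
    """'Alignment' tuple (Sequence, SequenceModel, Path)."""
    sequence: str                      # assumed: a sequence identifier
    model: SequenceModel
    path: Tuple[Tuple[int, int], ...]  # assumed: (sequence pos, model pos) pairs


@dataclass(frozen=True)
class LocatableFeature:
    sequence: str
    start: int
    end: int

    def overlaps(self, other: "LocatableFeature") -> bool:
        # Inclusive coordinates on the same sequence.
        return (self.sequence == other.sequence
                and self.start <= other.end
                and other.start <= self.end)


class Telomere(LocatableFeature):
    """'Telomere' subclassOf 'LocatableFeature'."""


@dataclass(frozen=True)
class Repeat(LocatableFeature):
    """A Repeat exists only where there is an Alignment to a RepeatArchetype."""
    alignment: Alignment

    def __post_init__(self) -> None:
        if self.alignment.sequence != self.sequence:
            raise ValueError("alignment must be on the repeat's own sequence")
        if not isinstance(self.alignment.model, RepeatArchetype):
            raise TypeError("a Repeat must align to a RepeatArchetype")


def is_telomeric_repeat(repeat: Repeat) -> bool:
    # Membership is decided by the archetype, not by where the repeat sits.
    return isinstance(repeat.alignment.model, TelomericRepeatArchetype)


# Illustrative instances only; the coordinates are invented.
ttaggg = TelomericRepeatArchetype("TTAGGG")
alu = RepeatArchetype("Alu")
telomere = Telomere("chr1", 1, 10000)
r1 = Repeat("chr1", 100, 105, Alignment("chr1", ttaggg, ((100, 0), (105, 5))))
r2 = Repeat("chr1", 500000, 500280, Alignment("chr1", alu, ((500000, 0),)))
```

Note that r1 counts as a telomeric repeat because of the archetype it
aligns to, whether or not it happens to overlap a telomere. The 'most R
overlaps Telomere' condition is a statement about the archetype as a
whole, so it is not checked per instance here.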
I know this is all a bit verbose, and on the surface we could just make
Telomeric Repeat inherit from both Telomere and Repeat, but really that's
throwing away lots of the expressive power to capture the knowledge
people have about what they are trying to model. And anyway, a telomeric
repeat isn't really the telomere itself, but most examples of telomeric
repeats are found within telomeres, and telomeres are often largely
constructed from telomeric repeats. This built-from relationship is
disjoint from the specialization-of relationship we have been
considering until now. We could define a general relationship type
'Usually associated with' that takes two feature classes and accepts the
pair if the second feature type usually overlaps regions of the first
type. You can, of course, infer specialization-of relationships from
this richer description, and even bridge from this into another
world-view by defining an equivalence relationship between my and your
Telomeric Repeat class, so at the end of the day, it really doesn't
matter if you agree with my classification scheme - we can all still
benefit from it.
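A minimal sketch of how 'Usually associated with' might be evaluated
over concrete features, again in Python. The Feature tuple, the function
names and especially the 0.5 cut-off for 'usually' are my own
assumptions for illustration; where the threshold should really sit is
an open question.

```python
from typing import NamedTuple, Sequence


class Feature(NamedTuple):
    seq_id: str
    start: int
    end: int


def overlaps(a: Feature, b: Feature) -> bool:
    # Inclusive coordinates on the same sequence.
    return a.seq_id == b.seq_id and a.start <= b.end and b.start <= a.end


def usually_associated_with(first: Sequence[Feature],
                            second: Sequence[Feature],
                            threshold: float = 0.5) -> bool:
    """Accept the pair if more than `threshold` of the second type's
    features overlap at least one feature of the first type.

    The 0.5 default is an assumed reading of 'usually'.
    """
    if not second:
        return False
    hits = sum(1 for s in second if any(overlaps(s, f) for f in first))
    return hits / len(second) > threshold


# Invented coordinates, purely for illustration.
telomeres = [Feature("chr1", 1, 10000), Feature("chr1", 990001, 1000000)]
telomeric_repeats = [
    Feature("chr1", 100, 400),
    Feature("chr1", 995000, 995600),
    Feature("chr1", 500000, 500300),   # an interstitial copy
]
alu_repeats = [
    Feature("chr1", 5000, 5300),
    Feature("chr1", 200000, 200300),
    Feature("chr1", 600000, 600300),
]

telomeric_ok = usually_associated_with(telomeres, telomeric_repeats)
alu_ok = usually_associated_with(telomeres, alu_repeats)
```

With these made-up features, two of the three telomeric repeats fall
inside a telomere, so the pair is accepted, while only one of the three
Alu copies does, so that pair is rejected. The relationship is
deliberately statistical: the interstitial copy is still a telomeric
repeat even though it overlaps no telomere.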
I seem to have lost my point somewhere - I think I was trying to say
that super/sub-type is just one way to look at feature types, and not
necessarily the one that naturally encapsulates the knowledge we want to
pass around or query by. That sounds nice and succinct.
Matthew