[DAS] discussion document for das/2

Matthew Pocock mrp@sanger.ac.uk
Fri, 07 Dec 2001 12:17:30 +0000

Hi Lincoln,

DAML-OIL is quite cool, isn't it? There are some nasties out there to 
get you (e.g. scoping of who can restrict a property), but at the very 
least it gives us a common way to discuss the ideas of concepts, rather 
like XML gives us a common way to discuss the ideas of data-structures.

I'm coming back to your Telomeric repeat class, as I think it
illustrates a point about our feature hierarchy that hasn't really been
clarified. You don't have to agree with these definitions; that doesn't
really matter for the sake of discussion. To me, a region is labelled a
repeat if it is homologous to many other regions (for suitably high
values of 'many'). The repeat instance is really a region of alignment to
the repeat archetype. We tend to compress this down to saying that it is
an Alu repeat, or a poly-AT. Telomeres are functional structures on the
ends of our chromosomes. Telomeric repeats are repeats that tend to be
associated with telomeres. Phew.

Now, in the feature ontologies we have discussed so far, there is only 
one relationship - parent/child where the child is a subset of the 
parent class. I think if we are to expose concepts like my Telomeric 
repeat described above, we will need to consider different relationships 
between features and feature types. For example, telomeric repeat could 
be defined (in hokey semi-structured text) as:

'Alignment' tuple (Sequence, SequenceModel, Path)

'RepeatArchetype' subclassOf 'SequenceModel'

'Repeat' subclassOf 'LocatableFeature'
where exists Alignment(Repeat.sequence, RepeatArchetype)

'Telomere' subclassOf 'LocatableFeature'

'Telomeric Repeat' subclassOf 'Repeat'
where 'Telomeric Repeat'.RepeatArchetype subclassOf TelomericRepeatArchetype

'TelomericRepeatArchetype' subclassOf 'RepeatArchetype'
where most R in Alignment(R, 'TelomericRepeatArchetype', *) overlaps 
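
To make the definitions above a bit more concrete, here is a rough
Python sketch of how they might be read. It is only an illustration,
not a proposal for the spec. The string sequence identifiers, the
encoding of the path, and the helper names is_repeat and
is_telomeric_repeat are all my assumptions. The 'most R ... overlaps'
condition is left out.

```python
from dataclasses import dataclass
from typing import Iterable, Tuple


@dataclass(frozen=True)
class SequenceModel:
    name: str


@dataclass(frozen=True)
class RepeatArchetype(SequenceModel):
    """'RepeatArchetype' subclassOf 'SequenceModel'."""


@dataclass(frozen=True)
class TelomericRepeatArchetype(RepeatArchetype):
    """'TelomericRepeatArchetype' subclassOf 'RepeatArchetype'."""


@dataclass(frozen=True)
class Alignment:
    """'Alignment' tuple (Sequence, SequenceModel, Path)."""
    sequence: str                      # assumed: a plain sequence identifier
    model: SequenceModel
    path: Tuple[Tuple[int, int], ...]  # assumed: (sequence pos, model pos) pairs


@dataclass(frozen=True)
class LocatableFeature:
    sequence: str  # assumed: identifier of the sequence the feature covers


def is_repeat(feature: LocatableFeature,
              alignments: Iterable[Alignment]) -> bool:
    """True if some alignment ties the feature's sequence to a repeat archetype."""
    return any(a.sequence == feature.sequence
               and isinstance(a.model, RepeatArchetype)
               for a in alignments)


def is_telomeric_repeat(feature: LocatableFeature,
                        alignments: Iterable[Alignment]) -> bool:
    """As is_repeat, but the archetype must be a telomeric repeat archetype."""
    return any(a.sequence == feature.sequence
               and isinstance(a.model, TelomericRepeatArchetype)
               for a in alignments)
```

The point of the sketch is that 'Repeat' is not a class you assign by
hand: membership falls out of the existence of an alignment to the
right kind of archetype.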

I know this is all a bit verbose, and on the surface we could just make
Telomeric Repeat inherit from both Telomere and Repeat, but really that's
throwing away lots of the expressive power to capture the knowledge
people have about what they are trying to model. And anyway, a telomeric
repeat isn't really the telomere itself, but most examples of telomeric
repeats are found within telomeres, and telomeres are often largely
constructed from telomeric repeats. This built-from relationship is
disjoint from the specialization-of relationship we have been
considering until now. We could define a general relationship type
'Usually associated with' that takes two feature classes and accepts the
pair if the second feature type usually overlaps regions of the first
type. You can, of course, infer specialization-of relationships from
this richer description, and even bridge from this into another
world-view by defining an equivalence relationship between my and your
Telomeric Repeat class, so at the end of the day, it really doesn't
matter if you agree with my classification scheme - we can all still
benefit from it.
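
As a rough illustration of the 'usually associated with' relationship,
here is a small Python sketch. It assumes features are half-open
(start, end) intervals on one shared sequence, and it reads 'usually'
as 'more than some threshold fraction'. Both the interval encoding and
the default threshold of 0.5 are my assumptions, as is the function
name.

```python
from typing import Sequence, Tuple

Interval = Tuple[int, int]  # assumed: half-open [start, end) on one sequence


def overlaps(a: Interval, b: Interval) -> bool:
    """True if the two half-open intervals share at least one position."""
    return a[0] < b[1] and b[0] < a[1]


def usually_associated_with(first: Sequence[Interval],
                            second: Sequence[Interval],
                            threshold: float = 0.5) -> bool:
    """Accept the pair if features of the second type usually overlap
    regions of the first type, i.e. the overlapping fraction of `second`
    exceeds `threshold` (an arbitrary cut-off chosen for illustration)."""
    if not second:
        return False  # no evidence either way, so do not accept the pair
    hits = sum(1 for s in second if any(overlaps(s, f) for f in first))
    return hits / len(second) > threshold
```

With telomeres at (0, 1000) and (9000, 10000) and telomeric repeats at
(10, 16), (500, 506), (9500, 9506) and (4000, 4006), three of the four
repeats overlap a telomere, so the pair is accepted at the default
threshold but rejected at a stricter threshold of 0.8.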

I seem to have lost my point somewhere - I think I was trying to say
that super/sub-type is just one way to look at feature types, and not
necessarily the one that naturally encapsulates the knowledge we want to
pass around or query by. That sounds nice and succinct.