[Bioperl-l] Bio::Ontology overhaul

Wed Feb 26 18:52:57 EST 2003

On Wednesday, February 26, 2003, at 07:07  PM, Chris Mungall wrote:

>
> On Wed, 26 Feb 2003, Hilmar Lapp wrote:
>
>> 3) Add a method ontology() to TermI that accepts and returns an object
>> implementing Bio::Ontology::OntologyI. Remove method category() (I
>> added an implementation to Term.pm that ensures backward
>> compatibility).
>>
>> This is a controversial thing to do because it almost inevitably
>> creates memory cycles (the term points to the ontology which points to
>> its terms). Calling OntologyI::close() is required to break the cycle.
>> I thought about this for the last couple days and finally decided that
>> for usability's sake this is probably the right thing to do
>> nevertheless. Here are my reasons.
>
> eek! scary

It is -- but then I thought about that old motto: do one thing every 
day that scares you :) (that's in the trust me on the sunscreen speech, 
isn't it?)

>
>>      - the only way to copy the PrimarySeq/Seq/SeqFeatureI pattern loses
>> half of its usefulness (you want name *and* query engine accessible
>> for it to be really useful)
>>
>>      - most if not all people are going to use only a few ontologies
>> during any given runtime, not hundreds of thousands like for features
>> and sequences, and those few ontologies you will want in memory anyway
>>
>>      - you can break all the cycles by a single call to one designated
>> method on an ontology, which IMHO is not asking for that much
>
> are you absolutely sure this will work?
>

With the current implementation it is relatively easy to make it
foolproof - essentially all that needs to be done is for the Ontology
object to dissociate itself from the query engine (undef'ing a single
variable in a single object). Breaking the cycle is the Ontology's job,
not the terms'.
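
To make the reference layout concrete, here is a minimal sketch of the
idea. The Sketch::* packages and the live-term counter are hypothetical
stand-ins made up for illustration, not the actual Bio::Ontology
classes; the only thing it is meant to show is that dropping the engine
reference in close() is enough to let everything be collected.

```perl
use strict;
use warnings;

# Hypothetical stand-ins, not the real Bio::Ontology classes.
package Sketch::Term;
my $live_terms = 0;                       # terms not yet garbage-collected
sub new {
    my ($class, %args) = @_;
    $live_terms++;
    return bless { name => $args{name}, ontology => $args{ontology} }, $class;
}
sub ontology { $_[0]{ontology} }
sub live     { $live_terms }
sub DESTROY  { $live_terms-- }

package Sketch::Engine;
sub new {
    my ($class) = @_;
    return bless { terms => [] }, $class;
}
sub add_term { push @{ $_[0]{terms} }, $_[1] }

package Sketch::Ontology;
sub new {
    my ($class, $name) = @_;
    return bless { name => $name, engine => Sketch::Engine->new }, $class;
}
sub add_term {
    my ($self, $name) = @_;
    # term -> ontology -> engine -> term: this is the cycle
    my $term = Sketch::Term->new(name => $name, ontology => $self);
    $self->{engine}->add_term($term);
    return $term;
}
# Break the cycle in one place: drop the engine, which holds the terms.
sub close { $_[0]{engine} = undef }

package main;

sub build {
    my ($n, $close) = @_;
    my $ont = Sketch::Ontology->new('demo');
    $ont->add_term("term$_") for 1 .. $n;
    $ont->close if $close;
    return;                               # $ont goes out of scope here
}

build(1000, 0);
my $leaked = Sketch::Term->live;          # cycle keeps these terms alive
build(1000, 1);
my $after_close = Sketch::Term->live - $leaked;   # terms surviving close()
print "without close: $leaked live, with close: $after_close extra\n";
```

Note that close() in this sketch never touches the individual terms,
so its cost does not depend on how many there are.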

> I think this will force a lot of work onto the API user - they will
> have to make sure that they no longer have other objects in memory
> with a reference path to the ontology being closed.
>
> it also means that someone who mistakenly does this:
>
> foreach (@entities) {
>   my $ont = $factory->create_ontology();
>   ...
> }
>
> instead of this:
>
> my $ont = $factory->create_ontology();
> foreach (@entities) {
>   ...
> }
>
> could easily be hosed

That's right - if you do stupid things with this you're going to be 
hosed.
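
For what it's worth, the difference between the two loops can be
sketched as follows. Sketch::Ontology and its instance counter are
hypothetical and stripped down to the bare reference layout (one term
pointing back at its ontology); this is an illustration under those
assumptions, not a measurement of the real classes.

```perl
use strict;
use warnings;

# Hypothetical minimal class; only the reference layout matters here.
package Sketch::Ontology;
my $live = 0;                             # ontologies currently in memory
sub new {
    my ($class) = @_;
    $live++;
    my $self = bless { engine => { terms => [] } }, $class;
    # one term pointing back at the ontology is enough to form a cycle
    push @{ $self->{engine}{terms} }, { name => 'root', ontology => $self };
    return $self;
}
sub close   { $_[0]{engine} = undef }
sub live    { $live }
sub DESTROY { $live-- }

package main;

my @entities = (1 .. 50);

# Mistake: a fresh ontology per entity, never closed.
foreach my $entity (@entities) {
    my $ont = Sketch::Ontology->new;
    # ... work with $ont ...
}
my $leaked = Sketch::Ontology->live;

# Intended usage: one ontology for the whole loop, closed when done.
my $ont = Sketch::Ontology->new;
foreach my $entity (@entities) {
    # ... work with $ont ...
}
$ont->close;
undef $ont;
my $extra = Sketch::Ontology->live - $leaked;

print "leaked by per-entity creation: $leaked, left by shared use: $extra\n";
```

In the first loop every iteration strands one ontology; calling close()
at the end of each iteration would avoid that too, at the price of
rebuilding the ontology every time.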

>
> we should also think ahead to the future - will your plan work if we
> decide to make our ontologies more OWL/OIL/Protege like and add
> instances?

Not sure what the implications would be. I'm no expert here ... please 
share your insight if you see a problem.

>  what about applications that do things such as generating cross
> products between terms, creating potentially huge numbers of terms?

Creating more terms is not a problem that is influenced by the presence
or absence of cycles, since cycles don't change the memory load imposed
by the number of terms, I think (maybe I'm missing something though).
Since you break the cycle on the ontology object, the number of terms
doesn't matter. The implementation does *not* loop over all term
objects and do something with every term - it just sets its query
engine to undef, and the query engine is the one holding pointers to
the terms. The terms point to the ontology, not the query engine -
hence no cycle once the ontology no longer references a query engine.

>  It may
> well be doable in a scalable way but I just think adding circular
> references makes it harder and potentially easier for us to get into
> a nasty situation
>
>>      - having the ontology() method just return a plain string or a
>> dumb namespace object is clunky and has very limited if any usability
>> outside of e.g. bioperl-db; in contrast, being able to get at the
>> full-featured ontology with all the query methods by calling a method
>> on any given term is potentially very useful
>
> i disagree - i say make the term and relationship objects dumb and put
> all the methods in the ontology class. makes it way simpler.

I guess it's a matter of taste. The biojava guys have the ontology 
accessible through the term - since cycles are a non-issue in java. So 
I guess if you can have it many people may want to have it.

But - if the majority of people agrees with you, then the problem is 
solved.

Guys/gals on the list, please vote: ontology accessible from a term, or 
the plain name of the ontology suffices?

>
>>      - as Matt pointed out to me correctly, it is in fact possible to come
>> up with a query engine implementation that avoids the cycles
>> altogether by constructing term and relationship objects on the fly
>> from raw hash or array refs when such objects are requested. Given the
>> design that I'm proposing, it is very easy to plug that in once
>> somebody writes it (call
>> $ontology->engine($my_engine_without_term_objects)).
>
> so if I ask for the same term 1000 times, there will actually be 1000
> objects in memory rather than 1?

In that case, and if you assign all 1000 to different variables that 
are still in scope or accumulate them in an array or hash, then yes, 
you'd have 1000 objects instead of 1. Scary?
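
A rough sketch of what such an engine could look like, and of when the
1000 objects actually pile up. Everything here (Sketch::RawEngine,
Sketch::Term, the get_term() signature, the 'GO:0001' record) is
hypothetical and invented for the example; nobody has written this
engine yet, so treat it as an illustration of the idea only.

```perl
use strict;
use warnings;

# Hypothetical classes sketching the idea; not an existing BioPerl engine.
package Sketch::Term;
my $live = 0;                             # term objects currently in memory
sub new {
    my ($class, $raw, $ontology) = @_;
    $live++;
    return bless { ontology => $ontology, %$raw }, $class;
}
sub name    { $_[0]{name} }
sub live    { $live }
sub DESTROY { $live-- }

package Sketch::RawEngine;
sub new {
    my ($class) = @_;
    # raw records only: nothing in here points back to an ontology
    return bless { raw => { 'GO:0001' => { name => 'example term' } } }, $class;
}
sub get_term {
    my ($self, $id, $ontology) = @_;
    # build a fresh term object from the raw record on every request
    return Sketch::Term->new($self->{raw}{$id}, $ontology);
}

package main;

my $engine   = Sketch::RawEngine->new;
my $ontology = { name => 'demo', engine => $engine };

# Ask 1000 times, keep nothing: each object is freed before the next request.
my $peak = 0;
for (1 .. 1000) {
    my $term = $engine->get_term('GO:0001', $ontology);
    $peak = Sketch::Term->live if Sketch::Term->live > $peak;
}
my $after_loop = Sketch::Term->live;

# Ask 1000 times and accumulate: now there really are 1000 objects.
my @kept = map { $engine->get_term('GO:0001', $ontology) } 1 .. 1000;
my $held = Sketch::Term->live;

print "peak $peak, after loop $after_loop, held $held\n";
```

Because the engine in this sketch stores only raw hash refs, a term can
point at the ontology without anything pointing back at the term, so
there is no cycle to break.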

	-hilmar
-- 
-------------------------------------------------------------
Hilmar Lapp                            email: lapp at gnf.org
GNF, San Diego, Ca. 92121              phone: +1-858-812-1757
-------------------------------------------------------------