[Bioperl-l] Bio::Ontology overhaul
Hilmar Lapp
hlapp at gnf.org
Thu Feb 27 19:07:11 EST 2003
On Thursday, February 27, 2003, at 03:09 PM, Aaron J Mackey wrote:
>
> Fantastic, code examples ... examining in more detail:
>
> I. obtain an ontology term (and its parent ontology) from a seqfeature:
>
>> my $seqin = Bio::SeqIO->new( -format => 'embl');
>> while (my $seq = $seqin->next_seq()) {
>> foreach my $sf ( $seq->get_all_SeqFeatures() ) {
>> # unclear whether OntologyTermI's have an as_string method
>> print "Ooooh. It has ontology term ",$sf->type->as_string," from
>> ontology ",$sf->type->ontology->name,"\n";
>> }
>> }
Ontology::TermIs have a name (which would be your "as_string"):
$sf->type->name()
>
> Great. From this perspective, clearly an OntologyTermI has to somehow
> be
> able to get to their parent OntologyI; Lincoln, how would a
> OntologyHandleI help get around a "backref" to OntologyI from
> OntologyTermI in this case?
I thought about this possibility too but decided against it because it
creates yet another class. Maybe that's not a problem to anyone ... The
way this would help is *not* by obviating the memory cycle, but by
breaking the cycle automatically if the ontology (as represented by the
"handle") goes out of scope. Since you write it such that the "handle"
is not part of the cycle, it can be garbage collected. You hook into
that with a DESTROY on the "handle" and break the cycle in the real
ontology. $term->ontology() creates the "handle" on the fly (it can't
keep a pointer because otherwise the handle is part of the cycle).
So there is still a memory cycle, but instead of requiring the user to
call $ontology->close(), you trigger that automatically in the handle's
DESTROY.
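To make the handle idea concrete, here is a minimal sketch. It uses toy packages (Toy::Ontology, Toy::Term, Toy::OntologyHandle), all of which are hypothetical names invented for illustration and not Bioperl classes. The assumption is that close() breaks the cycle by deleting the terms' back-references.

```perl
use strict;
use warnings;

# Toy stand-ins, not the Bioperl classes: every package and method name
# below is hypothetical.
package Toy::Ontology;
sub new {
    my ($class, $name) = @_;
    return bless { name => $name, terms => [] }, $class;
}
sub name { $_[0]{name} }
sub add_term {
    my ($self, $term) = @_;
    push @{ $self->{terms} }, $term;   # ontology -> term
    $term->{ontology} = $self;         # term -> ontology: the cycle
    return $term;
}
# Break the cycle by dropping the back-references held by the terms.
sub close {
    my $self = shift;
    delete $_->{ontology} for @{ $self->{terms} };
    $self->{terms} = [];
}

package Toy::Term;
sub new {
    my ($class, $name) = @_;
    return bless { name => $name }, $class;
}
sub name { $_[0]{name} }
# Build a fresh handle on every call; the term never stores it, so the
# handle stays outside the cycle.
sub ontology {
    my $self = shift;
    return Toy::OntologyHandle->new($self->{ontology});
}

package Toy::OntologyHandle;
sub new {
    my ($class, $ontology) = @_;
    return bless { ontology => $ontology }, $class;
}
sub name { $_[0]{ontology}->name }
# Runs when the last reference to the handle goes away.
sub DESTROY {
    my $self = shift;
    $self->{ontology}->close() if $self->{ontology};
}

package main;

my $term = Toy::Ontology->new('SO')->add_term(Toy::Term->new('gene'));
{
    my $handle = $term->ontology();
    print "term '", $term->name, "' is in ontology '", $handle->name, "'\n";
}   # the handle is destroyed here, so close() runs
print "back-reference still present? ",
      (exists $term->{ontology} ? "yes" : "no"), "\n";
```

Note that in this sketch the term loses its ontology as soon as the first handle goes out of scope, even though $term itself is still in use.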
What just occurs to me: what if someone never asks for the
ontology? Then the cycle will not be broken because no handle would go
out of scope. Also, what if a handle goes out of scope, but you're not
done yet with the ontology, i.e., you still have references to one or
more terms of the ontology? I'm confused now. Lincoln, what did I miss?
>
> II. Determine the "inheritance" of a term from another term (both
> obtained
> via seqfeatures)
>
>> if( $sf->type->is_child_of($anothersf->type) ) {
>> # do something
>> }
>
> Presumably, this gets passed off to the OntologyEngineI:
> sub OntologyTermI::is_child_of {
> my $self = shift;
> return $self->ontology->engine->is_child_of($self, @_)
> }
>
You can't issue ontology-based queries directly on the term. We made
the decision a while back to keep terms as lean as possible.
$term->ontology() is already quite a stretch from that.
So, you'd do $term->ontology->is_child_of($term, $reltype) (which would
indeed internally delegate to the engine), if that method existed. It
doesn't :)
What you'd do is
sub is_child_of {
    my ($ontology, $subject, $query, $reltype) = @_;
    # collect the direct children of $query and look for $subject by name
    my @match = grep { $_->name() eq $subject->name() }
        $ontology->get_child_terms($query, $reltype);
    return @match ? 1 : 0;
}
Now that people want to get more serious about ontologies, they may
want to revisit the present query capabilities defined in
OntologyEngineI (OntologyI just inherits those). We intentionally kept
them to a minimum, to allow specialized light-weight
implementations.
> Q. Does is_child_of do path traversal?
>
With depth 1, yes. If you're asking for any possible path, then
substitute get_child_terms() with get_descendant_terms in the above
code snippet.
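To illustrate the difference between depth 1 and any depth without needing a loaded ontology, here is a small sketch over a toy child-to-parent table. The table and the helpers child_terms() and descendant_terms() are made up for this example; they only mimic the roles of get_child_terms() and get_descendant_terms() and compare terms by plain name.

```perl
use strict;
use warnings;

# Hypothetical toy data: child => parent edges, not a Bioperl ontology.
my %parent_of = (
    exon       => 'transcript',
    transcript => 'gene',
);

# Direct children only (depth 1), in the role of get_child_terms().
sub child_terms {
    my ($term) = @_;
    my @kids = sort grep { $parent_of{$_} eq $term } keys %parent_of;
    return @kids;
}

# All descendants at any depth, in the role of get_descendant_terms().
sub descendant_terms {
    my ($term) = @_;
    my (@found, %seen);
    my @queue = child_terms($term);
    while (defined(my $t = shift @queue)) {
        next if $seen{$t}++;
        push @found, $t;
        push @queue, child_terms($t);
    }
    return @found;
}

# Same shape as the snippet above, with the lookup passed in as a code ref.
sub is_child_of {
    my ($subject, $query, $lookup) = @_;
    return (grep { $_ eq $subject } $lookup->($query)) ? 1 : 0;
}

my $direct    = is_child_of('exon', 'gene', \&child_terms);
my $any_depth = is_child_of('exon', 'gene', \&descendant_terms);
print "exon under gene, depth 1: $direct; any depth: $any_depth\n";
```

Here 'exon' is two steps below 'gene', so only the descendant lookup finds it.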
> Q. Can is_child_of be given the "predicate" type [ i.e.
> $termA->is_child_of($termB, "isa") ]
Yes.
>
> Q. If "predicate" isn't given, what is assumed?
>
Wildcard.
> Q. What about two terms that are related through differing "predicates"
> (i.e A isa B, B partof C, what is the relationship between A and C?)
>
Depends on the relationship between "isa" and "partof". If they are
disjoint, then there is no path satisfying an AND of the two
relationship types.
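One way to read this, sketched with toy data: with a wildcard predicate there is a path from A to C, but a query constrained to a single predicate finds none, because the path mixes "isa" and "partof". The triple list and is_descendant_of() are hypothetical and only model that reading; they say nothing about how an actual engine treats related predicates.

```perl
use strict;
use warnings;

# Hypothetical toy triples [subject, predicate, object]; not Bioperl data.
my @triples = (
    [ 'A', 'isa',    'B' ],
    [ 'B', 'partof', 'C' ],
);

# Is $subject reachable below $object by walking only edges whose
# predicate is listed in @$allowed? An empty list acts as a wildcard.
sub is_descendant_of {
    my ($subject, $object, $allowed) = @_;
    my %ok = map { $_ => 1 } @$allowed;
    my %seen;
    my @queue = ($object);
    while (defined(my $node = shift @queue)) {
        next if $seen{$node}++;
        for my $t (@triples) {
            my ($s, $p, $o) = @$t;
            next unless $o eq $node;        # edge must point at this node
            next if %ok && !$ok{$p};        # predicate filter, if any
            return 1 if $s eq $subject;
            push @queue, $s;
        }
    }
    return 0;
}

my $wildcard    = is_descendant_of('A', 'C', []);
my $isa_only    = is_descendant_of('A', 'C', ['isa']);
my $partof_only = is_descendant_of('A', 'C', ['partof']);
print "wildcard: $wildcard, isa only: $isa_only, partof only: $partof_only\n";
```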
> III. Same as II, but with an "anonymous" term:
>
>> # this might not be the right class for this
>> # Hilmar and Chris to agree
>> $ontology = Bio::Ontology::Factory->new( -ontology => 'SO');
>
> $gene = $ontology->get_term("gene");
>
> # ...
>
> if($sf->type->is_child_of($gene)) {
> # do something
> }
>
> BTW, why didn't this start as:
>
> $ontology = new Bio::Ontology 'SO';
So you want this to magically instantiate a fully populated ontology?
Could be added ("hardcoded" ontologies which are distributed with
Bioperl), but I'm not sure we should be doing this. Creates a
maintenance headache. I'd leave ontology distribution to ontology
maintainers ...
To populate ontologies you either have them in biosql or read them from
files:
$stream = Bio::OntologyIO->new(-format => 'so', -file => "so.file");
$ontology = $stream->next_ontology();
Note as an aside that -format 'so' is not implemented yet. There is
only 'go' and 'interpro'.
>
> or, if necessary:
>
> $ontology = new Bio::Ontology -name => 'SO',
> -factory => "MyFunkyOntologyFactory";
>
>
>
> Q. Should "is_child_of" be better as "is_subject_of" to reflect the
> subject/object/predicate ontology term paradigm?
>
> Q. Does is_subject_of need to know ontology namespace as well as
> applicable predicate term(s)?
>
The relationship type is-a Ontology::TermI and hence has an ontology()
method, so it's implicit if you constrain by relationship type. Now the
question is what do you do if relationship type matches by name but not
by namespace. This is why Matt and Thomas wanted the namespace also on
the relationship (I believe). You ask for the relationship between
"SOFA"::"isa" and "GO"::"isa". If it is an "isa" you decide whether you
trust it based on its namespace (and authority).
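As a toy illustration of name versus namespace matching, the sketch below models relationship types as plain hashes carrying a name and an ontology name. The hashes and the helpers same_name() and same_type() are hypothetical, not part of any Bioperl interface.

```perl
use strict;
use warnings;

# Hypothetical relationship types as plain hashes: name plus ontology.
my $sofa_isa = { name => 'isa', ontology => 'SOFA' };
my $go_isa   = { name => 'isa', ontology => 'GO' };

# Match on the relationship name alone.
sub same_name {
    my ($x, $y) = @_;
    return $x->{name} eq $y->{name} ? 1 : 0;
}

# Match on name and on the namespace (ontology) as well.
sub same_type {
    my ($x, $y) = @_;
    return (same_name($x, $y) && $x->{ontology} eq $y->{ontology}) ? 1 : 0;
}

my $by_name = same_name($sofa_isa, $go_isa);
my $by_both = same_type($sofa_isa, $go_isa);
print "match by name: $by_name, by name and namespace: $by_both\n";
```

Whether a name-only match should be trusted is then a policy decision for the caller.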
I apologize for my verbosity - I'm sitting in a cafe (named Wired, but
it's not wired).
-hilmar
--
-------------------------------------------------------------
Hilmar Lapp email: lapp at gnf.org
GNF, San Diego, Ca. 92121 phone: +1-858-812-1757
-------------------------------------------------------------