[Bioperl-l] common ancestor for ontology terms

Hilmar Lapp hlapp at gnf.org
Wed Apr 2 09:54:07 EST 2003


On Friday, March 28, 2003, at 11:19  AM, Chris Mungall wrote:

>
> what if there is a tie for the closest common ancestor? e.g. a subset 
> of a
> family tree with two siblings and their two parents. I think you really
> want to return a list of path-pairs.

Right. I noticed that but tabled resolving it.

>
> also, how do you define the 'shorter' path when what you have is
> path-pairs? do you just sum the two paths? this should be made 
> explicit.

Yes and yes. I did think of the sum, and it should indeed be made 
explicit.

> Or maybe if you could pass in a subroutine for calculating the 
> metric...

Would be nice, true.

>
> I don't think closures are necessarily required here (unless you're 
> doing
> it in SQL). I think it's much faster than O(V^2). You just zip up to 
> the
> root from both terms and then find the minimum distance term that is in
> both paths. Even in the most perverse graphs imaginable this doesn't 
> get
> above O(N), where N is the number of nodes in the graph.
>

Probably you're right. This would be the set-oriented approach. My 
brain was stuck in a double loop.

> Will there also be a method common_descendent_path()? What about
> specifying disjunctions of predicates? These would be less commonly 
> used,
> I admit. Allowing for these would make the interface rather complex.

What do you mean by disjunctions of predicates? What would be the 
context or kind of query where you would want to use this?

>
> What about the common ancestor of N terms? This could be useful,

Right, absolutely.

>  eg for
> trying to make sense of a blast hit to a GO sequence file or for mining
> clusters of co-expressed genes. Of course, this is bleeding over into
> bioinformatics-applications territory. Maybe its better to keep *all*
> these methods (including common_ancestor_path) out of the classes, and
> keep the object model focused on modeling the data. I'm not really sure
> what encapsulation buys you here. The only use I can think of is if you
> wanted to hide an in-memory/on-disk closure and use that in your
> calculation behind the scenes.

That was part of the idea, i.e., that an implementation may be able to 
most efficiently implement it, overriding the decorating implementation 
that only uses API methods.

>  But I don't think that's such a good idea.
> I guess what I'm suggesting is contrary to the spirit of bioperl and 
> OO in
> general, but I would like to see more seperation of data from 
> behaviour.
>

Actually, I'm all for this. This is in fact the direction I had in mind 
when separating ontology (Bio::Ontology::OntologyI) from ontology query 
engine (Bio::Ontology::OntologyEngineI). The former inherits from the 
latter, but that is for simplicity of use for the end user; the 
implementation (Bio::Ontology::Ontology) does use composition.

However, here the engine hosts the contents of the ontology (terms and 
rel.ships) too, instead of just operating on it. I do think though 
that's helpful at the end of the day to be able to provide efficient 
implementations. I may be wrong though.

	-hilmar

>
> On Thu, 27 Mar 2003, Hilmar Lapp wrote:
>
>> For the transitive closure computation we'll need to answer questions
>> like for two predicates A and B encountered along a path, is there a
>> common ancestor predicate that covers both. Per our definition, the
>> path would then receive the ancestor as predicate, and the path would
>> be void otherwise.
>>
>> This query also comes up in other contexts; e.g. for two feature types
>> A and B, what is the super-type.
>>
>> I'd be interested whether anyone (ChrisM? Biojava folks?) has had any
>> experience with working with and implementing this kind of query.
>> Generally speaking it appears to me it's a O(V^2) problem; using sets
>> and a transitive closure it could actually be solved pretty easily and
>> elegant (that's why in SQL it's easy to come up with a query that
>> answers this).
>>
>> I wrote a POD section (no implementation yet) and whether this makes
>> sense to people.
>>
>> 	-hilmar
>>
>> =head2 common_ancestor_path
>>
>>   Title   : common_ancestor_path
>>   Usage   :
>>   Function: Get the paths from two terms A and B to term C, such that
>>             there is no other term D to which A and B would have a
>> shorter
>>             path, provided there is a term C to which both A and B are
>>             connected by a path.
>>
>>             Note that the path to the common ancestor between A and A
>>             exists, has distance zero, and predicate "identity".
>>
>>             The search for the common ancestor C can be further
>>             constrained by supplying a predicate term. If supplied, 
>> the
>>             predicates of the two paths (A,C) and (B,C) must have a
>>             common ancestor identical to the predicate, or that has a
>>             path to the predicate.
>>
>>   Example :
>>   Returns : The path of the first term to the common ancestor in 
>> scalar
>>             context, and both paths in list context. Paths are
>>             Bio::Ontology::PathI compliant objects.
>>   Args    : The two terms (Bio::Ontology::TermI objects), and 
>> optionally
>>             a constraining common predicate (Bio::Ontology::TermI
>> object).
>>             The latter may also be given as a scalar, in which case it
>>             is treated as a boolean that, if TRUE, means that the two
>> paths
>>             must have identical predicates in order to be returned.
>>
>>
>> =cut
>>
>>
>
>
-- 
-------------------------------------------------------------
Hilmar Lapp                            email: lapp at gnf.org
GNF, San Diego, Ca. 92121              phone: +1-858-812-1757
-------------------------------------------------------------



More information about the Bioperl-l mailing list