[Bioperl-l] common ancestor for ontology terms
Chris Mungall
cjm at fruitfly.org
Fri Mar 28 11:19:33 EST 2003
what if there is a tie for the closest common ancestor? e.g. a subset of a
family tree with two siblings and their two parents. I think you really
want to return a list of path-pairs.
also, how do you define the 'shorter' path when what you have is
path-pairs? do you just sum the two paths? this should be made explicit.
Or maybe if you could pass in a subroutine for calculating the metric...
I don't think closures are necessarily required here (unless you're doing
it in SQL). I think it's much faster than O(V^2). You just zip up to the
root from both terms and then find the minimum distance term that is in
both paths. Even in the most perverse graphs imaginable this doesn't get
above O(N), where N is the number of nodes in the graph.
Will there also be a method common_descendent_path()? What about
specifying disjunctions of predicates? These would be less commonly used,
I admit. Allowing for these would make the interface rather complex.
What about the common ancestor of N terms? This could be useful, eg for
trying to make sense of a blast hit to a GO sequence file or for mining
clusters of co-expressed genes. Of course, this is bleeding over into
bioinformatics-applications territory. Maybe its better to keep *all*
these methods (including common_ancestor_path) out of the classes, and
keep the object model focused on modeling the data. I'm not really sure
what encapsulation buys you here. The only use I can think of is if you
wanted to hide an in-memory/on-disk closure and use that in your
calculation behind the scenes. But I don't think that's such a good idea.
I guess what I'm suggesting is contrary to the spirit of bioperl and OO in
general, but I would like to see more seperation of data from behaviour.
On Thu, 27 Mar 2003, Hilmar Lapp wrote:
> For the transitive closure computation we'll need to answer questions
> like for two predicates A and B encountered along a path, is there a
> common ancestor predicate that covers both. Per our definition, the
> path would then receive the ancestor as predicate, and the path would
> be void otherwise.
>
> This query also comes up in other contexts; e.g. for two feature types
> A and B, what is the super-type.
>
> I'd be interested whether anyone (ChrisM? Biojava folks?) has had any
> experience with working with and implementing this kind of query.
> Generally speaking it appears to me it's a O(V^2) problem; using sets
> and a transitive closure it could actually be solved pretty easily and
> elegant (that's why in SQL it's easy to come up with a query that
> answers this).
>
> I wrote a POD section (no implementation yet) and whether this makes
> sense to people.
>
> -hilmar
>
> =head2 common_ancestor_path
>
> Title : common_ancestor_path
> Usage :
> Function: Get the paths from two terms A and B to term C, such that
> there is no other term D to which A and B would have a
> shorter
> path, provided there is a term C to which both A and B are
> connected by a path.
>
> Note that the path to the common ancestor between A and A
> exists, has distance zero, and predicate "identity".
>
> The search for the common ancestor C can be further
> constrained by supplying a predicate term. If supplied, the
> predicates of the two paths (A,C) and (B,C) must have a
> common ancestor identical to the predicate, or that has a
> path to the predicate.
>
> Example :
> Returns : The path of the first term to the common ancestor in scalar
> context, and both paths in list context. Paths are
> Bio::Ontology::PathI compliant objects.
> Args : The two terms (Bio::Ontology::TermI objects), and optionally
> a constraining common predicate (Bio::Ontology::TermI
> object).
> The latter may also be given as a scalar, in which case it
> is treated as a boolean that, if TRUE, means that the two
> paths
> must have identical predicates in order to be returned.
>
>
> =cut
>
>
More information about the Bioperl-l
mailing list