[BioSQL-l] ontology for transitive closure table

Aaron J Mackey ajm6q at virginia.edu
Thu Mar 27 15:38:47 EST 2003


On Thu, 27 Mar 2003, Hilmar Lapp wrote:

> >> my $dbigraph = new DBIx::Graph @params;
> >> my $graph = $dbigraph->graph;
> >>
> >> my $tc = $graph->transitive_closure;
> >
> > So this bit now works for me (including being able to access all
> > various
> > "paths" described by a given transitive closure edge, useful for
> > determining predicate type of the new edge).
>
> what is $tc? a reference to an array?

$tc is a Graph::Directed object.

> Do you have some place where one can checkout the code?

No, not yet (dbix-graph is on sourceforge, but is currently empty)

> Also, Graph.pm
> has
>
>                 $G->has_path($u, $v, ...)
>
>         Return true if the graph $G has the cycle defined by the
>         vertices $u, $v, ..., false otherwise.
>
> Didn't this help? Maybe it didn't because you need to enumerate all
> vertices along the path then? (I haven't used that method)

No, that doesn't help with the speed issue.

> How do you deal with paths that have heterogeneous predicates, i.e.,
> both is-a and part-of? In Singapore we defined that the predicate of
> the path is the GCD, but there is no common denominator defined yet for
> those two predicates. Did you define one behind the scenes (like
> related-to)?

The changes I've made to Graph::Directed are to simply store the set of
vertices that were traversed (i.e. the actual "path") to generate a new
edge in the transitive closure; of course, there may be multiple paths,
and they're all available to you.  You may now do whatever introspection
you like to "label" the edges as you wish.  This was basic step #1 to get
to what you're talking about (which may involve looking at all possible
paths between A and B, and deciding a) whether the edge should exist at
all and b) which "labels" the edge should receive).  So a
Bio::Whatever::RelationshipI manager might use DBIx::Graph and
Graph::Directed to a) fetch the vertices and edges from a database, and b)
compute the "raw" transitive closure and would then go off and
"post-process" the tc graph before saying "I'm done, save this away".

> >> Slightly OT, but if I were to implement our own Graph.pm-like object
> >> via
> >> an Inline::C-wrapped graph library, would people be keen on that?

> The problem is its more a hassle for you than for people, because if
> someone can't compile the code on his specific platform, or hits a
> dynlink error, or hits a segfault, then most likely it is going to be
> you chasing down subtle allocation/de-allocation bugs, or stupid
> OS/perl version differences or whatever.

I'll remind you of what I said:

> >   But, computing the tc via
> > Graph::Directed on GO isa/partof is extraordinarily slow (I've waited
> > more
> > than 4 hours now; by comparison, an SQL-only solution finishes in less
> > than 20 minutes).

I think people are going to have bigger problems if we promise them a
solution that only works for graphs that don't include GO.  It may compile
just fine on WeirdOS, but it certainly won't finish.

What I'm envisioning is a drop-in replacement for the pure-perl Graph.pm
that only implements Transitive_Closure_Floyd_Warshall, but in C.  You can
go grab it (and try to compile it) if you need it, but you get to start
out by using Graph.pm

Put another way, I *have* to write this thing for my own use, and will go
to the extra trouble of packaging a wrapper-class to switch between the
two options if people foresee their own needs as similar.

Your arguments about maintaining a C package are equally troublesome for
perl packages (people on foreign machines with different library versions
having problems that you need to address), so I don't really see any
difference there.

-Aaron

-- 
 Aaron J Mackey
 Pearson Laboratory
 University of Virginia
 (434) 924-2821
 amackey at virginia.edu




More information about the BioSQL-l mailing list