[Bioperl-l] OntologyTermI
Hilmar Lapp
hlapp@gnf.org
Wed, 28 Aug 2002 09:53:55 -0700
(Sorry if you get this twice. Somewhere in the chain of smtp exchanges, this disappeared.)
Hi all,
we're going to need an Ontology interface and parsers for different
formats pretty soon as we want to bring GO and other ontologies into
Biosql. Ewan even put Ontology support on the road map for 1.2, so
it may be the right time to join forces here.
Our preliminary picture here so far is that we are going to need a
basic interface describing an ontology entry conceptually, which is
then realized by different implementations. To give it a name, say
Bio::OntologyTermI, with implementations living in Bio::Ontology:
    Bio::Ontology::OntologyTerm   # base implementation,
                                  # is-a Bio::OntologyTermI
    Bio::Ontology::GOTerm         # is-a Bio::Ontology::OntologyTerm
    ... etc
We are looking at InterPro as in fact being another ontology, so in
this scheme there would also be
    Bio::Ontology::InterPro
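To make the proposed hierarchy a bit more concrete, here is a minimal sketch of how the interface and two of the implementations could fit together. Only the package names come from the proposal above; the accessor names (identifier, name, definition), the category() method on GOTerm, and the -key constructor arguments are assumptions made purely for illustration, not an agreed API.

```perl
use strict;
use warnings;

# Interface: declares what every ontology term has to provide.
# The method names are hypothetical placeholders.
package Bio::OntologyTermI;

sub identifier { die "identifier() not implemented by " . ref(shift) . "\n" }
sub name       { die "name() not implemented by " . ref(shift) . "\n" }
sub definition { die "definition() not implemented by " . ref(shift) . "\n" }

# Base implementation, is-a Bio::OntologyTermI
package Bio::Ontology::OntologyTerm;
our @ISA = ('Bio::OntologyTermI');

sub new {
    my ($class, %args) = @_;
    my $self = bless {}, $class;
    # copy the recognized -key arguments into the object
    $self->{$_} = $args{"-$_"} for qw(identifier name definition);
    return $self;
}

# combined getter/setter accessors
sub identifier { my $s = shift; $s->{identifier} = shift if @_; return $s->{identifier} }
sub name       { my $s = shift; $s->{name}       = shift if @_; return $s->{name} }
sub definition { my $s = shift; $s->{definition} = shift if @_; return $s->{definition} }

# GO-specific term, is-a Bio::Ontology::OntologyTerm
package Bio::Ontology::GOTerm;
our @ISA = ('Bio::Ontology::OntologyTerm');

# hypothetical GO-only attribute
sub category { my $s = shift; $s->{category} = shift if @_; return $s->{category} }

package main;

my $term = Bio::Ontology::GOTerm->new(
    -identifier => 'GO:0008150',
    -name       => 'biological_process',
);
printf "%s (%s)\n", $term->name, $term->identifier;
```

Code written against Bio::OntologyTermI would then work unchanged for GO terms, InterPro entries, or any other implementation.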
Now this sketchy picture doesn't pay a lot of attention to
ontologies being graphs, and looks at them from the use-case point
of view rather than from the viewpoint of computer science abstraction.
The GO perl API in GO::Model::* in contrast lays out and implements
the graph model. (Cool!)
Does the simple sketch above make any sense? Is it going to be
useful and appropriate? Would copying all methods from
GO::Model::Term into OntologyTermI provide for a good start?
To me it seems porting over GO::Model to Bioperl should be a pretty
straightforward process. Or should we prefer not to port it over
but instead keep an external dependency on the GO perl API?
We'll also need a streaming IO. Again, the GO parser already exists
(for the XML version of the dump too?). Peter on our end is going to
add one for InterPro unless someone can point us at something we can
steal for that purpose (which would be great). The interface I'd
suggest should resemble the other streaming interfaces in Bioperl,
e.g.
    package Bio::OntologyIO;

    # returns the next Bio::OntologyTermI object from the stream
    sub next_term {
    }

    # serializes one or more Bio::OntologyTermI objects
    sub write_term {
    }
and drivers in Bio::OntologyIO::*.
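As a rough illustration of the base-class-plus-driver layout, the sketch below uses a made-up driver, Bio::OntologyIO::tab, that reads and writes one "identifier<TAB>name" pair per line. That format, the driver name, the -fh argument, and the use of plain hashes in place of Bio::OntologyTermI objects are all assumptions chosen to keep the example self-contained; none of this is the GO flat-file format or an existing parser.

```perl
use strict;
use warnings;

# Base class: holds the filehandle and defines the streaming contract.
package Bio::OntologyIO;

sub new {
    my ($class, %args) = @_;
    return bless { fh => $args{-fh} }, $class;
}

sub next_term  { die "next_term() must be implemented by a driver\n" }
sub write_term { die "write_term() must be implemented by a driver\n" }

# Hypothetical driver for a tab-delimited "identifier<TAB>name" format.
package Bio::OntologyIO::tab;
our @ISA = ('Bio::OntologyIO');

# Returns the next term (a hash ref standing in for a
# Bio::OntologyTermI object), or undef at end of input.
sub next_term {
    my $self = shift;
    my $fh   = $self->{fh};
    while (defined(my $line = <$fh>)) {
        chomp $line;
        next if $line =~ /^\s*(?:#|$)/;    # skip comments and blank lines
        my ($id, $name) = split /\t/, $line, 2;
        return { identifier => $id, name => $name };
    }
    return;
}

# Serializes one or more terms; returns the number written.
sub write_term {
    my ($self, @terms) = @_;
    my $fh = $self->{fh};
    print {$fh} join("\t", $_->{identifier}, $_->{name}), "\n" for @terms;
    return scalar @terms;
}

package main;

my $data = "# id\tname\n"
         . "GO:0008150\tbiological_process\n"
         . "GO:0003674\tmolecular_function\n";
open my $in, '<', \$data or die "cannot open in-memory handle: $!";

my $io = Bio::OntologyIO::tab->new(-fh => $in);
my @seen;
while (my $term = $io->next_term) {
    push @seen, $term->{identifier};
}
close $in;
```

The calling loop never sees more than one term at a time, which is the property the other Bioperl streaming interfaces share.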
Again, does this make any sense? I'm unsure how compatible the input
being a graph is with a streaming next_XXX() kind of thing. I'm also
wondering how a streaming interface can be plugged into the current
GO::Parser/GO::Builder framework, without reading the entire file
up-front. Would flatfile parsing need the XS extension in C as
stated in GO::AppHandle? Any advice from the experts much
appreciated ...
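To make the graph-versus-stream question concrete: one conceivable arrangement, assuming each term record carries the identifiers of its parents, is to let the stream deliver terms one at a time and have the caller link them up by identifier as they arrive. The record layout, the T:n identifiers, and the descendants() helper below are invented for illustration; whether the real GO formats permit this without reading the whole file first is exactly the open question.

```perl
use strict;
use warnings;

# Hypothetical records, in the order a streaming parser might emit them.
# Note that a child may arrive before its parent.
my @stream = (
    { identifier => 'T:3', name => 'leaf',  parents => ['T:2'] },
    { identifier => 'T:2', name => 'inner', parents => ['T:1'] },
    { identifier => 'T:1', name => 'root',  parents => []      },
);

my (%term_by_id, %children_of);

# This loop stands in for: while (my $term = $io->next_term) { ... }
for my $term (@stream) {
    $term_by_id{ $term->{identifier} } = $term;
    # record edges by identifier only, so arrival order does not matter
    push @{ $children_of{$_} }, $term->{identifier}
        for @{ $term->{parents} };
}

# Depth-first walk over the assembled graph (assumes no cycles).
sub descendants {
    my ($id) = @_;
    my @found;
    for my $child (@{ $children_of{$id} || [] }) {
        push @found, $child, descendants($child);
    }
    return @found;
}

my @below_root = descendants('T:1');
print join(', ', @below_root), "\n";
```

In this arrangement the parser itself stays streaming, and only callers that need the full graph pay for holding it in memory.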
Ideally the groundwork for this can be steered by someone other than
us, as we are clearly only beginners in this field. Chris? We'll
just need something working pretty soon ...
-hilmar
--
-------------------------------------------------------------
Hilmar Lapp email: lapp at gnf.org
GNF, San Diego, Ca. 92121 phone: +1-858-812-1757
-------------------------------------------------------------