[BioRuby] [GSoC][NeXML and RDF API] RDF API
anurag08priyam at gmail.com
Thu Jul 15 14:57:45 UTC 2010
I have worked out an initial set of specs for the RDF API. The code is in
the 'rdf' branch: http://github.com/yeban/bioruby/tree/rdf
I am providing an overview here:
To start with, I have put the specs in the bioruby/spec directory. I took
the liberty of adding a rake task to execute all the specs. Most of the
specs will fail as of now and some are pending. Running
rake spec SPEC_OPTS="--format nested"
should give a rough overview of the specs.
The lib itself (currently only bare class definitions) resides in the
bioruby/lib/bio/rdf directory and uses the Bio::RDF namespace.
At the core are the Literal, Node and URI classes, whose instances form the
subject, predicate, object and context of any RDF statement. An RDF
statement can be created as an instance of the Statement class. A collection
of Statements forms a Graph. An RDF graph can be queried for statements with
a given subject, predicate or object. We can define new vocabularies with
the Vocabulary class. I explain the Vocabulary class in more detail below.
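To make the statement/graph relationship concrete, here is a minimal sketch of how a graph could store statements and answer queries by subject, predicate or object. Everything in it is an assumption made for illustration: the GraphSketch module, the Struct-based Statement, the method names and the example.org URIs are hypothetical and are not the code in the 'rdf' branch. Plain strings stand in for URI and Literal objects, and the context is left out.

```ruby
# Hypothetical sketch: names and signatures are assumptions, not the Bio::RDF API.
module GraphSketch
  # A statement is a subject-predicate-object triple (context omitted for brevity).
  Statement = Struct.new(:subject, :predicate, :object)

  class Graph
    include Enumerable

    def initialize
      @statements = []
    end

    # Accepts a Statement or a [subject, predicate, object] array.
    def <<(statement)
      statement = Statement.new(*statement) if statement.is_a?(Array)
      @statements << statement
      self
    end

    def each(&block)
      @statements.each(&block)
    end

    # Returns the statements whose components equal every value in the
    # pattern, e.g. query(:predicate => some_uri). An empty pattern
    # matches all statements.
    def query(pattern = {})
      select do |statement|
        pattern.all? { |part, value| statement[part] == value }
      end
    end
  end
end

graph = GraphSketch::Graph.new
graph << ['http://example.org/seq1', 'http://example.org/length', 42]
graph << ['http://example.org/seq2', 'http://example.org/length', 17]
matches = graph.query(:subject => 'http://example.org/seq1')
```

A linear scan like this is enough to pin down the behaviour in the specs; indexing by subject or predicate could be added later without changing the interface.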
RDF vocabularies are defined on a namespace URI. Take, for example, the XSD
vocabulary that defines datatypes for literals. XSD is defined on the
"http://www.w3.org/2001/XMLSchema#" namespace with the 'xsd' prefix. So the
actual URI for the CURIE "xsd:double" is
"http://www.w3.org/2001/XMLSchema#double". The rationale is to have such
URIs and CURIEs generated automatically:
xsd = Vocabulary.new "http://www.w3.org/2001/XMLSchema#"
I was thinking of having commonly used vocabularies defined in the lib so
someone could use them out of the box, like: XSD[:double] or CDAO[:foo].
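As a rough illustration of the lookup described above, a vocabulary object could build the full URI by appending the term to its namespace. This is only a sketch under assumptions: the VocabSketch module, the optional prefix argument and the curie method are hypothetical, and the lookup returns a plain String where the real lib would presumably return a URI object.

```ruby
# Hypothetical sketch: not the Vocabulary class from the 'rdf' branch.
module VocabSketch
  class Vocabulary
    attr_reader :namespace, :prefix

    # prefix is an assumed optional argument, used only to build CURIEs.
    def initialize(namespace, prefix = nil)
      @namespace = namespace
      @prefix = prefix
    end

    # Full URI for a term, e.g. XSD[:double].
    def [](term)
      "#{namespace}#{term}"
    end

    # Compact form of the same term, e.g. "xsd:double".
    def curie(term)
      raise ArgumentError, 'no prefix set' unless prefix
      "#{prefix}:#{term}"
    end
  end

  XSD = Vocabulary.new('http://www.w3.org/2001/XMLSchema#', 'xsd')
end

VocabSketch::XSD[:double]       # => "http://www.w3.org/2001/XMLSchema#double"
VocabSketch::XSD.curie(:double) # => "xsd:double"
```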
Any component of BioRuby can use the rdf lib by using its own objects as
the subject or object of an RDF statement. However, a cleaner solution would
be to have an Annotatable module mixed into the classes that are likely to
use the rdf lib. Annotatable would just provide a thin wrapper over the core
rdf lib. To begin with, I have added two methods, 'annotate' and
'annotation', which respectively create and return an RDF graph for that
object. The example for these methods is pending in the specs.
However, I was thinking of something like:
seq = Bio::Sequence.new
seq.annotate do |graph|
  graph << [self, CDAO[:foo], 'moo']
end
seq.annotation.query :predicate => CDAO[:foo]
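One possible shape for such a mixin is sketched below. All of it is assumed for illustration: the AnnotatableSketch module and its Sequence class are stand-ins, a plain Array takes the place of the Graph class, and a String takes the place of CDAO[:foo]. The sketch passes seq explicitly because, in an ordinary Ruby block, self refers to the caller and not to the annotated object; making self work as in the example above would need something like instance_exec.

```ruby
# Hypothetical sketch: not the Annotatable module from the 'rdf' branch.
module AnnotatableSketch
  module Annotatable
    # Returns the graph for this object, creating it on first use.
    # A plain Array stands in for the Graph class here.
    def annotation
      @annotation ||= []
    end

    # Yields the graph so the caller can add statements to it.
    def annotate
      yield annotation
      self
    end
  end

  # Stand-in for a BioRuby class that mixes in the module.
  class Sequence
    include Annotatable
  end
end

seq = AnnotatableSketch::Sequence.new
seq.annotate do |graph|
  graph << [seq, 'http://example.org/cdao#foo', 'moo']
end

# With an Array as the graph, a query is just a select on the predicate.
found = seq.annotation.select do |_, predicate, _|
  predicate == 'http://example.org/cdao#foo'
end
```

Because the annotated class only depends on the two mixin methods, the graph implementation behind them can change without touching the class.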
I think with this design we can maintain loose coupling between the rdf lib
and BioRuby components. I have just begun creating the classes to realize
the specs, so the design can still be modified completely if I am going in
the wrong direction.
In thinking out the rdf lib, I have mostly referred to the RDF Primer and
Wikipedia. I might have gone wrong on some RDF concepts too. Please correct
me.
2nd Year Undergraduate,
Department of Mechanical Engineering,