[BioRuby] RDF Triples in BioRuby, a funding proposal to Google SoC

Rutger Vos rutgeraldo at gmail.com
Wed Mar 10 13:22:48 UTC 2010


Dear BioRuby-ites,

my apologies that my first email to this list is so long and
tangential. I am trying to find out how to express RDF triples in
BioRuby. In this email I'm explaining why I care enough to try to get
funding for someone to work on this. If you don't care about any of
this, you can stop reading now.

The National Evolutionary Synthesis Center (NESCent.org) is planning
to be a mentoring organization for the Google Summer of Code 2010. I
have submitted a project idea for it: to develop NeXML I/O and,
probably more importantly for you, RDF capabilities for BioRuby. If
funded, a student/coder will work on this full time over the summer,
under the shared supervision of Jan Aerts and myself. Here is the
link: http://tinyurl.com/biorubynexml

NeXML is a data format for phylogenetic data that can be read and
written in Perl, Python, Java and (to some extent) C++ and JavaScript.
RDF is the cool "new" thing (as per BioHackathon2010), but as far as I
can tell BioRuby isn't completely up to speed on it yet.

(As an aside: you might ask yourself why there is something like NeXML
when there is PhyloXML for BioRuby. The answer is that NeXML solves a
different problem. PhyloXML started essentially as a next generation
of the New Hampshire eXtended (NHX) format, to meet the annotation
needs of comparative genomics, such as gene duplications and other
molecular evolution events on phylogenetic trees. NeXML started as a
complete XML representation of the NEXUS format, providing other
comparative data types such as categorical and continuous character
state matrices, restriction site matrices, and so on, in addition to
trees, taxa, and sequence alignments. There is obviously some overlap
between the formats, but I guess that is not unique in bioinformatics
:))

NeXML has a semantic annotation facility that uses RDFa. This allows
us to attach additional metadata to a fundamental phylogenetic data
object (a tree, taxon, character, etc.) to form a "triple": the
fundamental data object is the triple's Subject, and the Predicate and
Object are added as RDFa attributes. Since NeXML can be transformed to
RDF/XML using a standard XSL stylesheet, we can express a limitless
number of statements about phylogenetics. However, this means that any
NeXML I/O library needs to be able to represent RDF triples. I have
studied the BioRuby API as best I could (though I don't know Ruby)
and couldn't identify how to do this.
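
To make the structure concrete, here is a minimal sketch in plain Ruby
of a data object that carries such triples. It is only an illustration
under stated assumptions: all names in it (Triple, Annotatable, Taxon,
annotate, to_ntriples, the example.org URI) are hypothetical and are
not part of BioRuby or of any NeXML library, and the output is only
N-Triples-like, with no escaping of literal values.

```ruby
# Hypothetical sketch: none of these names exist in BioRuby.

# A triple is just three values: subject, predicate, object.
Triple = Struct.new(:subject, :predicate, :object) do
  # Render as one N-Triples-like line. Objects that look like
  # http(s) URIs are written as resources, anything else as a
  # plain literal (no escaping is attempted here).
  def to_ntriples
    obj = object.start_with?("http://", "https://") ? "<#{object}>" : "\"#{object}\""
    "<#{subject}> <#{predicate}> #{obj} ."
  end
end

# Mixin that lets any object with a `uri` method act as the
# subject of triples, mirroring the annotation idea above.
module Annotatable
  def annotations
    @annotations ||= []
  end

  def annotate(predicate, object)
    triple = Triple.new(uri, predicate, object)
    annotations << triple
    triple
  end
end

# Stand-in for a fundamental data object (a taxon).
class Taxon
  include Annotatable
  attr_reader :uri, :label

  def initialize(uri, label)
    @uri = uri
    @label = label
  end
end

taxon = Taxon.new("http://example.org/taxa/t1", "Homo sapiens")
taxon.annotate("http://purl.org/dc/elements/1.1/title", "Homo sapiens")
puts taxon.annotations.first.to_ntriples
```

Writing the annotation logic as a mixin means the same mechanism could
be shared by trees, characters, sequences and so on, rather than being
tied to NeXML objects alone.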

My questions to you:

* is there a way to express triples in BioRuby?
* if there is not, what would be a good design to express triples in
BioRuby so that this would be more useful than just for NeXML?

Thank you!

Rutger

-- 
Dr. Rutger A. Vos
School of Biological Sciences
Philip Lyle Building, Level 4
University of Reading
Reading
RG6 6BX
United Kingdom
Tel: +44 (0) 118 378 7535
http://www.nexml.org
http://rutgervos.blogspot.com


