[BioRuby] RDF Triples in BioRuby, a funding proposal to Google SoC

Rutger Vos rutgeraldo at gmail.com
Thu Mar 11 10:22:04 UTC 2010


Hi Toshiaki,

Great to hear there has already been a lot of discussion of this.
(Well, I'd be surprised if there hadn't been :))

It looks to me like some fairly major bookkeeping would need to be
implemented high up in the inheritance tree if *all* BioRuby objects
are to be serialized into RDF. It would also require all of BioRuby to
be ontologized in one fell swoop.

It is perhaps more likely that subdomains are going to be ontologized
more or less independently from one another (as you mention,
references->RDF, or in my case phylogenetics->RDF) based implicitly on
intermediate data formats (pubmed records and nexml, respectively).

That is probably OK; we do things as needs arise.

But it would be handy if the API were at least general enough to be
extensible, so that we can make additional statements *about* objects
when we serialize them to RDF. For example, in your PubMed Turtle
file, the subject is always
<http://togows.dbcls.jp/entry/ncbi-pubmed/16381885>. Is there a way to
programmatically add further statements about
<http://togows.dbcls.jp/entry/ncbi-pubmed/16381885>?
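
To make the question concrete, a minimal sketch of such an interface
might look like the following. Everything in it is an assumption made
for illustration: StatementStore, Resource and Triple are hypothetical
names, not part of BioRuby or TogoWS, and the two predicates
(dcterms:subject and rdfs:seeAlso) are arbitrary examples. It only
shows the idea of attaching extra statements to an existing subject
URI and writing them out as N-Triples.

```ruby
# Hypothetical sketch: none of these classes exist in BioRuby.
Triple   = Struct.new(:subject, :predicate, :object)
Resource = Struct.new(:uri) # marks an object that is a URI, not a literal

class StatementStore
  def initialize
    @triples = []
  end

  # Record one statement; returns self so that calls can be chained.
  def add(subject, predicate, object)
    @triples << Triple.new(subject, predicate, object)
    self
  end

  # All statements whose subject is the given URI.
  def about(subject)
    @triples.select { |t| t.subject == subject }
  end

  # Serialize every statement as one N-Triples line.
  def to_ntriples
    @triples.map do |t|
      "<#{t.subject}> <#{t.predicate}> #{object_term(t.object)} ."
    end.join("\n")
  end

  private

  # URIs go in angle brackets; anything else becomes a quoted literal
  # with backslashes, double quotes and newlines escaped.
  def object_term(object)
    return "<#{object.uri}>" if object.is_a?(Resource)

    escaped = object.to_s
                    .gsub('\\') { '\\\\' }
                    .gsub('"')  { '\\"' }
                    .gsub("\n") { '\\n' }
    "\"#{escaped}\""
  end
end

pubmed = "http://togows.dbcls.jp/entry/ncbi-pubmed/16381885"

store = StatementStore.new
store.add(pubmed, "http://purl.org/dc/terms/subject", "phylogenetics")
store.add(pubmed, "http://www.w3.org/2000/01/rdf-schema#seeAlso",
          Resource.new("http://www.nexml.org"))

puts store.to_ntriples
```

The point of keeping the store separate from the domain objects is
that a phylogenetics serializer and a references serializer could
both add statements to it without knowing about each other.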

Rutger

On Wed, Mar 10, 2010 at 2:21 PM, Toshiaki Katayama <ktym at hgc.jp> wrote:
> Hi Rutger,
>
> Thank you for your inputs on GSoC 2010!
>
>> * is there a way to express triples in BioRuby?
>> * if there is not, what would be a good design to express triples in
>> BioRuby so that this would be more useful than just for NeXML?
>
> This is what we discussed during the pre-BioHackathon 2010.
>
> http://hackathon3.dbcls.jp/wiki/BioRuby
>
> My first idea was to make all BioRuby objects have a common output
> method to render the object contents in various formats
> (such as RDF/XML, Turtle, HTML, GFF, FASTA etc. if appropriate).
>
> Then, we tried to separate view from logic using erb, but as you
> see in the above page, it still looks ugly. It is mainly because
> view formatting itself requires some additional code, specific
> to each format.
>
> Therefore, we don't have a solid conclusion on this yet, unfortunately.
>
> Anyway, we already have a PubMed to RDF converter written in Ruby as
> part of the TogoWS REST API (http://togows.dbcls.jp/site/en/rest.html) at
>
> http://togows.dbcls.jp/entry/pubmed/16381885
> --> http://togows.dbcls.jp/entry/pubmed/16381885.ttl
>
> and we are also trying to support KEGG to RDF conversion in this
> framework. I think we can put the code in BioRuby when we have finished.
>
> Your suggestions are welcome. :)
>
> Regards,
> Toshiaki
>
> On 2010/03/10, at 22:22, Rutger Vos wrote:
>
>> Dear BioRuby-ites,
>>
>> my apologies that my first email to this list is so long and
>> tangential. I am trying to find out how to express RDF triples in
>> BioRuby. In this email I'm explaining why I care enough to try to get
>> funding for someone to work on this. If you don't care about any of
>> this, you can stop reading now.
>>
>> The National Evolutionary Synthesis Center (NESCent.org) is planning
>> to be a mentoring organization for the Google Summer of Code 2010. I
>> have submitted a project idea to this: to develop NeXML I/O and -
>> probably more importantly for you - RDF capabilities for BioRuby. If
>> funded, a student/coder will work on this full time over the summer,
>> under the shared supervision of Jan Aerts and myself. Here is the
>> link: http://tinyurl.com/biorubynexml
>>
>> NeXML is a data format for phylogenetic data that can be read and
>> written in Perl, Python, Java and (to some extent) C++ and JavaScript.
>> RDF is the cool "new" thing (as per BioHackathon2010), but as far as I
>> can tell BioRuby isn't completely up to speed for it, yet.
>>
>> (As an aside: you might ask yourself why there is something like NeXML
>> when there is PhyloXML for BioRuby. The answer is that NeXML solves a
>> different problem: PhyloXML started essentially as a next generation
>> of New Hampshire eXtended (NHX) to meet the annotation needs of
>> comparative genomics, things such as gene duplications and other
>> molecular evolution events, on phylogenetic trees; NeXML started as a
>> complete XML representation of the NEXUS format, providing other
>> comparative data types such as categorical and continuous character
>> state matrices, restriction site matrices, and so on, in addition to
>> trees, taxa, sequence alignments. There is obviously some overlap
>> between the formats, but I guess that is not unique in bioinformatics
>> :))
>>
>> NeXML has a semantic annotation facility that uses RDFa. This allows
>> us to add additional metadata to a fundamental phylogenetic data
>> object (a tree, taxon, character, etc.) to form a "triple": the
>> fundamental data object is the triple Subject, and the Predicate and
>> Object are added as RDFa attributes. Since NeXML can be transformed
>> using a standard XSL stylesheet to RDF/XML, we can express a limitless
>> number of statements about phylogenetics. However, this means that any
>> NeXML I/O library needs to be able to represent RDF triples. I have
>> studied the BioRuby API as best as I could (but: I don't know Ruby)
>> and couldn't identify how to do this.
>>
>> My questions to you:
>>
>> * is there a way to express triples in BioRuby?
>> * if there is not, what would be a good design to express triples in
>> BioRuby so that this would be more useful than just for NeXML?
>>
>> Thank you!
>>
>> Rutger
>>
>> --
>> Dr. Rutger A. Vos
>> School of Biological Sciences
>> Philip Lyle Building, Level 4
>> University of Reading
>> Reading
>> RG6 6BX
>> United Kingdom
>> Tel: +44 (0) 118 378 7535
>> http://www.nexml.org
>> http://rutgervos.blogspot.com
>
>





