[BioRuby] RDF Triples in BioRuby, a funding proposal to Google SoC

Rutger Vos rutgeraldo at gmail.com
Mon Apr 12 11:25:00 UTC 2010


Hi all,

here's a brief followup: we have received three student applications
for this GSoC project. All three look fairly strong. Hopefully we will
get funding!

Rutger

On Mon, Mar 15, 2010 at 1:27 PM, Rutger Vos <rutgeraldo at gmail.com> wrote:
> To follow up along more practical lines, I've had to deal with similar
> design issues in Bio::Phylo (perl), TreeBASE and Mesquite (both java).
> I've learned it makes sense to have:
>
> - a simple "annotation" object, with getters and setters for the
> predicate namespace uri, the predicate string, and the value object
> (either a literal or a uri),
>
> - a get_annotations method for all (fundamental) data objects in the
> toolkit that returns a collection of these annotation objects.
>
> this way, when you serialize any bioruby object into rdf, you can add
> as many other statements about that object as you want.
>
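The two pieces listed above could be sketched in Ruby roughly as follows. This is only an illustration of the idea, not existing BioRuby code: the names Annotation, Annotatable, add_annotation, predicate_uri and ExampleTaxon are all made up for this example, and the Dublin Core predicate is just a convenient real vocabulary to demonstrate with.

```ruby
require "uri"

# Hypothetical annotation object: predicate namespace URI, predicate
# string, and a value that is either a literal or a URI.
class Annotation
  attr_accessor :namespace_uri, :predicate, :value

  def initialize(namespace_uri, predicate, value)
    @namespace_uri = namespace_uri
    @predicate = predicate
    @value = value
  end

  # Full predicate URI: the namespace followed by the local name.
  def predicate_uri
    "#{namespace_uri}#{predicate}"
  end

  # True when the value is a URI object rather than a literal.
  def uri_value?
    value.is_a?(URI::Generic)
  end
end

# Hypothetical mixin that gives any data object a get_annotations method.
module Annotatable
  def get_annotations
    @annotations ||= []
  end

  def add_annotation(namespace_uri, predicate, value)
    annotation = Annotation.new(namespace_uri, predicate, value)
    get_annotations << annotation
    annotation
  end
end

# Stand-in for a fundamental data object such as a tree or a taxon.
class ExampleTaxon
  include Annotatable
  attr_reader :name

  def initialize(name)
    @name = name
  end
end

taxon = ExampleTaxon.new("Homo sapiens")
taxon.add_annotation("http://purl.org/dc/terms/", "description", "an example taxon")
taxon.add_annotation("http://purl.org/dc/terms/", "source", URI.parse("http://www.nexml.org"))
taxon.get_annotations.each do |a|
  puts "#{a.predicate_uri} -> #{a.value}"
end
```

A mixin is used here so that existing classes would not need a common superclass; whether that suits BioRuby's class hierarchy is an open question.
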
> Would a refactoring along those lines have a chance of being
> acceptable to the bioruby community (of course subsequent to a more
> detailed RFC, testing, discussion, proof of concept, etc.)?
>
> On Thursday, March 11, 2010, Rutger Vos <rutgeraldo at gmail.com> wrote:
>> Hi Toshiaki,
>>
>> great to hear there's already been a lot of discussion over this.
>> (Well, I'd be surprised if there hadn't been :))
>>
>> It looks to me like some fairly major bookkeeping would need to be
>> implemented high up in the inheritance tree if *all* bioruby objects
>> are to be serialized into RDF. It also would require all of bioruby to
>> be ontologized in one fell swoop.
>>
>> It is perhaps more likely that subdomains are going to be ontologized
>> more or less independently from one another (as you mention,
>> references->RDF, or in my case phylogenetics->RDF) based implicitly on
>> intermediate data formats (pubmed records and nexml, respectively).
>>
>> That is probably OK, we do things as needs arise.
>>
>> But it would be handy if the API were at least general enough to be
>> extensible, so that we can make additional statements *about*
>> objects when we serialize them to RDF. For example, in your pubmed
>> turtle file, the subject is always
>> <http://togows.dbcls.jp/entry/ncbi-pubmed/16381885>. Is there a way
>> to programmatically add further statements about
>> <http://togows.dbcls.jp/entry/ncbi-pubmed/16381885>?
>>
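To make the question above concrete, here is a minimal sketch of what "adding statements about a subject" might look like. StatementSet is a hypothetical helper invented for this example; it is not part of BioRuby or TogoWS. It writes N-Triples rather than Turtle only to keep the code short, and the rdfs:comment and rdfs:seeAlso predicates are chosen purely for illustration.

```ruby
# Hypothetical helper that collects extra statements about one subject
# URI and writes them out as N-Triples lines.
class StatementSet
  def initialize(subject_uri)
    @subject_uri = subject_uri
    @statements = []
  end

  # The value is written as a URI when uri: true, else as a plain literal.
  def add(predicate_uri, value, uri: false)
    @statements << [predicate_uri, value, uri]
    self
  end

  def to_ntriples
    @statements.map do |predicate_uri, value, uri|
      object = uri ? "<#{value}>" : "\"#{escape(value)}\""
      "<#{@subject_uri}> <#{predicate_uri}> #{object} ."
    end.join("\n")
  end

  private

  # Escape backslashes, double quotes and newlines inside a literal.
  def escape(text)
    text.to_s.gsub("\\") { "\\\\" }.gsub('"') { '\\"' }.gsub("\n") { "\\n" }
  end
end

set = StatementSet.new("http://togows.dbcls.jp/entry/ncbi-pubmed/16381885")
set.add("http://www.w3.org/2000/01/rdf-schema#comment", "cited in my notes")
set.add("http://www.w3.org/2000/01/rdf-schema#seeAlso", "http://www.nexml.org", uri: true)
puts set.to_ntriples
```
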
>> Rutger
>>
>> On Wed, Mar 10, 2010 at 2:21 PM, Toshiaki Katayama <ktym at hgc.jp> wrote:
>>> Hi Rutger,
>>>
>>> Thank you for your inputs on GSoC 2010!
>>>
>>>> * is there a way to express triples in BioRuby?
>>>> * if there is not, what would be a good design to express triples in
>>>> BioRuby so that this would be more useful than just for NeXML?
>>>
>>> This is what we discussed during the pre-BioHackathon 2010.
>>>
>>> http://hackathon3.dbcls.jp/wiki/BioRuby
>>>
>>> My first idea was to make all BioRuby objects have a common output
>>> method to render the object contents in various formats
>>> (such as RDF/XML, Turtle, HTML, GFF, FASTA etc. if appropriate).
>>>
>>> Then, we tried to separate view from logic using erb, but as you
>>> see in the above page, it still looks ugly. It is mainly because
>>> view formatting itself requires some additional code, specific
>>> to each format.
>>>
>>> Therefore, we don't have a solid conclusion on this yet, unfortunately.
>>>
>>> Anyway, we already have a PubMed to RDF converter written in Ruby as
>>> the TogoWS REST API (http://togows.dbcls.jp/site/en/rest.html) at
>>>
>>> http://togows.dbcls.jp/entry/pubmed/16381885
>>> --> http://togows.dbcls.jp/entry/pubmed/16381885.ttl
>>>
>>> and we are also trying to support KEGG to RDF conversion in this
>>> framework as well. I think we can put the code in BioRuby when we
>>> have finished.
>>>
>>> Your suggestions are welcome. :)
>>>
>>> Regards,
>>> Toshiaki
>>>
>>> On 2010/03/10, at 22:22, Rutger Vos wrote:
>>>
>>>> Dear BioRuby-ites,
>>>>
>>>> my apologies that my first email to this list is so long and
>>>> tangential. I am trying to find out how to express RDF triples in
>>>> BioRuby. In this email I'm explaining why I care enough to try to get
>>>> funding for someone to work on this. If you don't care about any of
>>>> this, you can stop reading now.
>>>>
>>>> The National Evolutionary Synthesis Center (NESCent.org) is planning
>>>> to be a mentoring organization for the Google Summer of Code 2010. I
>>>> have submitted a project idea to this: to develop NeXML I/O and -
>>>> probably more importantly for you - RDF capabilities for BioRuby. If
>>>> funded, a student/coder will work on this full time over the summer,
>>>> under the shared supervision of Jan Aerts and myself. Here is the
>>>> link: http://tinyurl.com/biorubynexml
>>>>
>>>> NeXML is a data format for phylogenetic data that can be read and
>>>> written in perl, python, java and (to some extent) c++ and javascript.
>>>> RDF is the cool "new" thing (as per BioHackathon2010), but as far as I
>>>> can tell BioRuby isn't completely up to speed for it, yet.
>>>>
>>>> (As an aside: you might ask yourself why there is something like NeXML
>>>> when there is PhyloXML for BioRuby. The answer is that NeXML solves a
>>>> different problem: PhyloXML started essentially as a next generation
>>>> of New Hampshire eXtended (NHX) to meet the annotation needs of
>>>> comparative genomics, things such as gene duplications and other
>>>> molecular evolution events, on phylogenetic trees; NeXML started as a
>>>> complete XML representation of the NEXUS format, providing other
>>>> comparative data types such as categorical and continuous character
>>>> state matrices, restriction site matrices, and so on, in addition to
>>>> trees, taxa, and sequence alignments. There is obviously some overlap
>>>> between the formats, but I guess that is not unique in bioinformatics
>>>> :))
>>>>
>>>> NeXML has a semantic annotation facility that uses RDFa. This allows
>>>> us to add additional metadata to a fundamental phylogenetic data
>>>> object (a tree, taxon, character, etc.) to form a "triple": the
>>>> fundamental data object is the triple Subject, and the Predicate and
>>>> Object are added as RDFa attributes. Since NeXML can be transformed
>>>> using a standard XSL stylesheet to RDF/XML, we can express a limitless
>>>> number of statements about phylogenetics. H
>
> --
> Dr. Rutger A. Vos
> School of Biological Sciences
> Philip Lyle Building, Level 4
> University of Reading
> Reading
> RG6 6BX
> United Kingdom
> Tel: +44 (0) 118 378 7535
> http://www.nexml.org
> http://rutgervos.blogspot.com
>
