[BioRuby] TogoWS (Re: BioRuby on github + lighthouse)

Toshiaki Katayama ktym at hgc.jp
Thu Jun 12 20:32:46 UTC 2008


Dear all,

Sorry for my long absence since the BioHackathon held this February.
I'm afraid I still can't spare enough time to organize
your request for a while yet.

Instead, I need to wrap up the outputs from the BioHackathon first.

The BioRuby team focused on the generalized sequence model, and
by completing that work I can provide a (hopefully) pretty nice feature
-- parsing any sequence database entry through a REST-like web service API.

I hope all of you like the following idea and will help me finish the task
by integrating GenBank, EMBL, UniProt and BioSQL with the new Bio::Sequence model,
as we discussed during the Hackathon.

A sample implementation (TogoWS) is now available at

  http://togows.dbcls.jp/site/rest.html

where you can find links to retrieve database entries with Rails-like "pretty URLs"
(sorry for the Japanese text, I'll provide an English version some time).

For example, the plain GenBank entry HUMIGHAF is available at

  http://togows.dbcls.jp/entry/genbank/HUMIGHAF

and you can obtain

* XML version by

  http://togows.dbcls.jp/entry/genbank/HUMIGHAF.xml

* FASTA version by

  http://togows.dbcls.jp/entry/genbank/HUMIGHAF.fasta

and, as this service is built on top of the BioRuby library,
you can also parse the entry to obtain a specific field by appending
the name of any BioRuby method in the Bio::GenBank class after a slash.

* DEFINITION field

  http://togows.dbcls.jp/entry/genbank/HUMIGHAF/definition
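
A minimal Ruby sketch of the URL scheme shown above. The helper name
togows_entry_url and its keyword arguments are hypothetical, not part of
BioRuby or TogoWS. The examples do not show a field combined with a format,
so the sketch rejects that combination.

```ruby
# Base URL taken from the examples above.
TOGOWS_BASE = "http://togows.dbcls.jp"

# Hypothetical helper (not part of BioRuby or TogoWS): builds an entry URL.
#   format: appended as an extension, e.g. "xml" or "fasta"
#   field:  appended as a path segment, e.g. "definition"
def togows_entry_url(database, entry_id, format: nil, field: nil)
  # Assumption: the examples never combine a field with a format.
  raise ArgumentError, "give format or field, not both" if format && field

  url = "#{TOGOWS_BASE}/entry/#{database}/#{entry_id}"
  url += "/#{field}" if field
  url += ".#{format}" if format
  url
end

puts togows_entry_url("genbank", "HUMIGHAF")
puts togows_entry_url("genbank", "HUMIGHAF", format: "fasta")
puts togows_entry_url("genbank", "HUMIGHAF", field: "definition")
```

The resulting string could then be fetched with Net::HTTP.get(URI(url))
from Ruby's standard library.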

However, the methods for fetching a specific field vary from database to database,
because of the different implementations in the corresponding classes.

Fortunately, it would be pretty easy to solve this.
We just need to convert the GenBank, EMBL, UniProt and BioSQL data models
to the generic Bio::Sequence class and use the methods of the generic class.
This is the same plan that we agreed on during the Hackathon.
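
The idea can be sketched roughly as follows. Every class, method and field
name here is a hypothetical stand-in chosen for illustration; none of it is
the real Bio::GenBank, Bio::EMBL or Bio::Sequence API, and the entry data
are placeholders.

```ruby
# Hypothetical generic model that every database record is converted to.
GenericSequence = Struct.new(:entry_id, :definition, :seq)

# Two database-specific records exposing the same data under different names.
GenBankLike = Struct.new(:locus, :definition, :origin) do
  def to_generic
    GenericSequence.new(locus, definition, origin)
  end
end

EmblLike = Struct.new(:id_line, :description, :sequence) do
  def to_generic
    GenericSequence.new(id_line, description, sequence)
  end
end

# Only these generic fields may be requested (e.g. via a URL path segment).
ALLOWED_FIELDS = %w[entry_id definition seq].freeze

# Converts the entry to the generic model, then calls the requested method.
def fetch_field(entry, field)
  raise ArgumentError, "unknown field: #{field}" unless ALLOWED_FIELDS.include?(field)

  entry.to_generic.public_send(field)
end

gb = GenBankLike.new("EXAMPLE1", "example definition", "atgc")
embl = EmblLike.new("EXAMPLE1", "example definition", "atgc")
puts fetch_field(gb, "definition")
puts fetch_field(embl, "definition")
```

With the conversion in place, the same field name works for both records,
and the list of allowed fields keeps arbitrary method names out of the URL.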

Along with this, we need to define a set of generic methods to access the
internal structure, and also a set of standard output formats
(for features, references, cross refs, dates etc.) - the slightly tough part.

For example, it would be great if I could extract the feature table
in a reusable standard format like GFF (or [protein] DAS)
instead of a YAML/XML dump of the array of Bio::Feature objects.

(The following are not yet implemented, but should return the same result.)

  http://togows.dbcls.jp/entry/genbank/J00231.gff
  http://togows.dbcls.jp/entry/genbank/J00231/features
  http://togows.dbcls.jp/entry/embl/J00231.gff
  http://togows.dbcls.jp/entry/embl/J00231/features
   :

All we need is to list the method names and return values (formats)
that are commonly usable with any sequence database entry.
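
As a rough illustration of a feature table in GFF rather than a YAML/XML
dump, here is a minimal Ruby sketch. SimpleFeature and feature_to_gff are
hypothetical names, not the real Bio::Feature API, the sample features are
placeholders, and the output follows the nine-column GFF3 layout without
the attribute escaping a real writer would need.

```ruby
# Hypothetical stand-in for a feature; not the real Bio::Feature class.
SimpleFeature = Struct.new(:type, :start, :stop, :strand, :qualifiers)

# Formats one feature as a nine-column, tab-separated GFF3-style line.
# Score and phase are unknown here, so they are written as ".".
def feature_to_gff(seqid, source, feature)
  attributes = feature.qualifiers.map { |key, value| "#{key}=#{value}" }.join(";")
  attributes = "." if attributes.empty?
  [seqid, source, feature.type, feature.start, feature.stop,
   ".", feature.strand, ".", attributes].join("\t")
end

features = [
  SimpleFeature.new("gene", 1, 120, "+", { "ID" => "gene1" }),
  SimpleFeature.new("CDS", 10, 100, "+", { "ID" => "cds1", "Parent" => "gene1" })
]

puts "##gff-version 3"
features.each { |f| puts feature_to_gff("EXAMPLE1", "example", f) }
```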

Pj, you may also want to have something like

  http://togows.dbcls.jp/entry/pubmed/16381885.bibtex
  http://togows.dbcls.jp/entry/pubmed/16381885.endnote
  http://togows.dbcls.jp/entry/pubmed/16381885/url

and these are trivial to implement: just add the appropriate methods
to the Bio::Reference class. For this purpose, I wouldn't hesitate to
change the internal logic/APIs, as you did, as long as the change is reasonable.

I'm also planning to provide a search interface and converters in a similar way.
Converters will include BLAST output to GFF (maybe by using BioPerl :), etc.

The outcomes of the BioHackathon 2008 were fairly diverse, but I think
this approach is one direction in which to evolve the basic infrastructure
of bioinformatics resources towards useful integration.

Actually, the real problem is that I'm still busy with other tasks and
can't devote 100% of my effort to these...

Regards,
Toshiaki Katayama

On 2008/06/12, at 18:02, Pjotr Prins wrote:

> Hi All,
>
> In view of my recent bibtex commit fiasco - which I thought an
> improvement, but probably was a regression as N. pointed out and
> rolled back - I favour moving the sources to a non-centralized
> repository. This will allow individual development where the main
> maintainers can cherry-pick individual patches for inclusion in the
> stable and development trees.
>
> Toshiaki, both Jan and I want to ask you to check out this technology
> and take the lead by moving a 'blessed' branch into github. The
> alternative is that I do the same thing - both cases will allow you
> to continue as before, but some development will be on git branches.
>
> Technology does not solve problems - like the problem of lack of general
> action in the source tree - but at least git will allow people to have
> a sense of freedom. And it is up to the central maintainers what to
> include and what not. Much like the role Linus plays in kernel
> development.
>
> As ever, with respect,
>
> Pj.
> _______________________________________________
> BioRuby mailing list
> BioRuby at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioruby




