[BioRuby] BioRuby/Core, BioRuby/CoolProject7, BioRuby/ThisThingInC, BioRuby/FrumpyExtensionForMyThesis

Sat Oct 13 18:59:22 UTC 2007

Re: "bioruby docs" thread - Everyone has made important, and very
valid, points. Here's how I'm seeing the sum of the matters at hand:

1) The documentation could be better

Whenever I need to use an aspect of the library I'm unfamiliar with, the
only documentation I turn to is pulling apart the code and examining the
tests. This isn't how I want to do it of course, but there are several
barriers to anything better.

The RDoc could use some attention. I made this point a little while back
but didn't get around to doing much about it. (RDoc as a documentation
format is a little rough around the edges.) Here's an overview of
several ways that a module with syntactically poor RDoc comments can
screw up a project's documentation:
 http://ninecoldwinters.com/ferro/rdoc-comment-block-examples/

The library's interface is inconsistent from module to module. I'm okay
with this since each problem a module attempts to solve would likely
feel awkward trying to share the same interface with another module.
Divergence is okay since each module has a different problem domain.

Even Ruby's biggest poster-child project, Rails, has very vocal - and
accurate - complaints regarding its lack of documentation. When I need
to really know what's going on I have to look at its code too, either
because the RDoc is lacking, or I'm missing a piece of the big picture.
And they have books.

The let's-make-a-wiki approach doesn't work; it's a poor substitute
for good documentation.

2) Direction

I struggle to really nail down the overall goal for BioRuby. Currently
(and feel free to add and correct) it seems to be:

  * Sequence Manipulator - Strings of text become sequence data objects
with common convenience methods.

  * File Parser - Take data files and turn them into objects to manage
the contained information.

  * Reference - Factual information provided as a convenience (molecular
weight information for instance).

  * Web Service API - There are several web services available to use as
a data source or a data manipulator; this essentially provides
convenience methods to interact with those.

  * Analyzer - Tools that tell you information about your data.

  * Shell - Interactive interface

So what are BioRuby's strengths and what are its weaknesses for these?
Perhaps in several areas we can say that nothing needs any more
development - the goal has been achieved. Other objectives perhaps are
secondary and don't need to be developed further. What needs to be
added?

I very much like the idea of BioRuby being a sort of "core" library
that has one goal and does it very well (and something that we could
document exhaustively). There are many BioRuby related tools that could
be built on top of that and offered as separate modules.

The Sequence Manipulator, File Parser, and Reference roles are all that
I'm concerned about seeing in BioRuby/Core. Everything else could be
provided as extensions that have their own project maintainers with
separate gem release cycles. It should be easy for someone to add to
and extend BioRuby simply by 'require'-ing it and by adhering to the
basic BioRuby module/class structure.
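
To make the 'require' and module/class structure idea concrete, here is
a rough sketch of what such an extension might look like. Everything in
it is an assumption for illustration: the tiny Bio::Sequence below is a
stand-in so the example runs on its own (a real extension would just
'require "bio"'), and Bio::GCContent and the gem name in the comment
are hypothetical, not existing BioRuby code.

```ruby
# Stand-in for BioRuby/Core so this sketch is self-contained.
# A real extension would start with: require 'bio'
module Bio
  class Sequence
    attr_reader :seq

    def initialize(seq)
      @seq = seq.to_s.downcase
    end

    def length
      @seq.length
    end
  end
end

# Hypothetical extension, released as its own gem (say "bio-gccontent").
# It reopens the Bio namespace and adds one class without touching core.
module Bio
  class GCContent
    def initialize(sequence)
      @sequence = sequence
    end

    # Fraction of bases that are g or c; 0.0 for an empty sequence.
    def ratio
      return 0.0 if @sequence.length.zero?
      @sequence.seq.count('gc').to_f / @sequence.length
    end
  end
end

seq = Bio::Sequence.new('ATGCGC')
puts Bio::GCContent.new(seq).ratio
```

The point is only that the extension lives in the same namespace and
depends on core, while core never needs to know the extension exists.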

There could be dozens of RubyForge projects that utilize and add to
'bio', advertised on bioruby.org's front page, where someone could
install each of them if they wished, or just BioRuby/Core for the
basics. At present BioRuby/Core seems destined for feature bloat.
BioRuby/Annex was an excellent idea of course, I'd just like that
strategy to become how BioRuby is consistently extended in the future.
In theory for BioRuby 1.2 I could rip out several of my modules and
provide them as separate gems, but if that were the case I'd recommend
the same for several other modules that currently compose BioRuby.

3) Documentation, again

Once we know exactly what we all care about in common, then it's easy to
build some common momentum. It's "easy" to provide an extremely good,
lengthy tutorial on, say, parsing Fasta files. But providing exhaustive
documentation / tutorials on the lesser-used projects, and therefore all
of BioRuby at present, is understandably uninteresting.

The world is changing too fast for BioRuby to keep up if we carry on
doing business the way we have been. Science changes, Web Services get
replaced with REST, file format slop becomes RDF with real ontologies,
files become SPARQL endpoints, etc.

So I suppose the question I have is what do we want BioRuby to be? What
do we want it to excel at?