[Bioperl-l] Bioperl partitioning (was Re: SVN and ...Re: Perltidy)

Tue Jun 19 07:41:57 UTC 2007

Steve Chervitz wrote:
> Might this be a good opportunity to investigate partitioning
> bioperl-live into sub-repositories? There has been talk in the past of
> defining a set of "core" modules separate from other functionally
> related groups of modules that would be viewed as optional extensions.
> The goal being to help manage growth and simplify releases. There are
> currently 892 modules under Bio/.
> 
> In addition to simplifying the migration to SVN, it would also have
> other benefits. Say some new functionality or a slew of fixes were
> added to Bio::Graphics. We could turn around a new Bio::Graphics
> release quickly without having to work on getting various other parts
> up to snuff that aren't related to graphics (Biblio, DB, PopGen,
> Search etc.). Maintenance and releases of the various extensions would
> be more parallelizable, orchestrated by separate ring leaders.
> 
> Over time, as a set of functionality matures, it would see fewer
> updates and there would be less of a need for users to
> download/install/test it. This could make bioperl easier to customize,
> extend, and grok in general.
> 
> Long term, it should ease development and release cycles

I actually take the opposite view. Breaking things up makes testing and 
releases more difficult.

If one person acts as pumpkin for all the sub-parts, his work-load 
increases almost linearly with the number of sub-parts. If each sub-part 
gets its own pumpkin, where do all these pumpkins come from? It seems to 
me that authors frequently write modules, but inevitably their 
circumstances change and they can no longer devote the time to look 
after them. Having a single pumpkin and 'forcing' him to make sure 
everything works (regardless of his personal interest in the module) 
seems more reliable than hoping there will be a person interested enough 
in each sub-part to handle its release.

Since all sub-parts will at the least interact with the 'true' core set 
of Bioperl modules, they need to be tested and potentially re-released 
every time the true core is updated. And since some sub-parts will 
interact with other sub-parts, there will need to be coordinated 
joint-testing and release of multiple sub-parts.

What happens when users report problems? We ask them what version 
they're running. Right now '1.5.2' means a specific thing, and it's 
trivial for someone to confirm the same problem by installing 1.5.2. 
What happens when users have to list out all the versions of all the 
sub-parts they have? Who is going to consistently recreate a user's 
hodge-podge of versions in order to confirm a bug? Won't the advice 
instead be: "update all versions to the latest and get back to us"?
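
To put rough numbers on the version hodge-podge, here is a small sketch. 
The sub-part names and release counts are purely hypothetical (they are 
not actual Bioperl figures), and it assumes every sub-part is installed 
and that versions can be mixed freely:

```python
from math import prod

# Hypothetical figures, chosen only for illustration: the number of
# releases published so far for each imagined sub-part.
releases_per_subpart = {
    "core": 3,
    "graphics": 4,
    "db": 2,
    "popgen": 2,
    "search": 3,
}


def possible_installs(releases):
    """Count the distinct version mixes a user could have installed,
    assuming every sub-part is present and versions mix freely."""
    return prod(releases.values())


# With a single distribution, each release is one reproducible install.
monolithic = 3  # hypothetical: three releases of the whole bundle
partitioned = possible_installs(releases_per_subpart)

print(f"single distribution: {monolithic} installs to reproduce")
print(f"partitioned:         {partitioned} installs to reproduce")
```

Under those assumptions, the number of setups someone might have to 
recreate grows as the product of the per-sub-part release counts, rather 
than one setup per release.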

So, as I see it, all sub-parts would best be tested and released with a 
single new version number every time one sub-part is updated 
(significantly). In which case, why have sub-parts at all? Keeping 
things the way they are now means ease of release for the pumpkin and 
ease of installation for end-users (only one install command to issue to 
CPAN). Having 'true' sub-parts (each with its own pumpkin), in my 
fatalistic view, is just going to lead to some useful sub-parts being 
abandoned and never updated, even where updates may be desirable.

Each and every Bio:: module could have been released separately by its 
respective author. As I see it, one of the main values of 'Bioperl' is 
that it's one (reasonably) consistent collection of modules that lowers 
the barrier to entry for new bioinformaticians, giving them extremely 
easy access to a whole host of functionality with a single install.