[Bioperl-l] Splits again
Nathan S. Haigh
n.haigh at sheffield.ac.uk
Thu Jun 28 08:27:54 UTC 2007
Sendu Bala wrote:
> Chris Fields wrote:
>> On Jun 27, 2007, at 5:43 PM, Sendu Bala wrote:
>>> What advantage is there of these defined splits instead of
>>> individual modules? As I see it you lose some of the potential
>>> benefits of breaking Bioperl up completely, whilst also suffering
>>> the maintenance problems I outlined in my objection to Steve's post.
>>>
>>> Being able to work on all Bioperl from a single cvs (ne svn) check
>>> out/archive, whilst distributing it as individual modules on CPAN
>>> seems like the best of both worlds to me. What am I missing?
>>
>> Okay, forewarned, but here's my long-winded reasoning. The short and
>> sweet version: I (very) respectfully don't agree with you, at least
>> re: the idea we should commit all modules to CPAN independently. It
>> doesn't make any sense to me, but maybe you can elaborate more?
>> Maybe I'm misinterpreting what you mean?
>
> The short and sweet version: my proposal has all the benefits of yours,
> but none of the disadvantages. What's not to like?
>
>
>> Finally, all of this should wait until later. Much later, like after
>> a decent release, after svn, etc kind of 'later'. I think we can
>> agree on that.
>
> Hmm, not really. If it can be implemented by a change in just Build.PL
> and ModuleBuildBioperl, it's really independent of everything else.
> That's the beauty of it: the only thing that changes is how things are
> uploaded to and downloaded from CPAN. The only person that normally
> deals with that issue is the pumpkin for a release, and he only cares
> about it at release time.
>
> In fact, if we're going to do it at all it makes sense to try it out on
> a minor release like 1.5.3. We've already got experience of doing it
> split-style from 1.5.2. (And let me tell you: splits at the code-base
> level suck.)
>
>
>> Individual CPAN modules:
>>
>> CPAN is not our personal versioning system; it may be if a
>> distribution consists of only a few modules, but not when it's one of
>> the largest distros present. If someone wants to update an
>> individual bioperl module for a quick bug fix they are more than
>> welcome to download it via cvs, svn, or even using a web browser, and
>> replace the one they have.
>
> And where is the harm in letting them do it via CPAN as well? In fact,
> there are significant benefits:
>
>
>> I'm trying to reason how one could break up the individual SeqIO/
>> SearchIO/otherIO modules into single module distributions. They are
>> intrinsically tied together (SeqIO::genbank won't work w/o SeqIO,
>> which relies on the various interfaces, RootIO, and on down). How
>> would tests be run off CPAN when the modules are distributed
>> independently?
>
> Bio::SeqIO::genbank would have a dependency on the latest version of
> Bio::SeqIO (etc.), and Bio::SeqIO would have its own dependencies.
>
> So when a user wants to get the latest version of Bio::SeqIO::genbank,
> they no longer have to worry about what other modules in its dependency
> hierarchy they should also install.
>
> Instead they just request Bio::SeqIO::genbank which itself ensures you
> have the latest version of all its dependencies before installing itself
> and running its tests.
>
> When a dev makes a major bugfix to Bio::SeqIO::genbank that all genbank
> users should have, he could just call './Build dist Bio::SeqIO::genbank'
> which would generate a new package for Bio::SeqIO::genbank suitable for
> uploading to CPAN. No more long release cycles and having to constantly
> tell people to 'use CVS' to get working Bioperl code.
>
>
>> Would they also be individually distributed? What would you use to
>> tie all the individual modules together? How would you explain to
>> the CPAN maintainers that you want to split bioperl into 990
>> individual modules, all updated independently, but intend on bundling
>> them afterwards anyway?
>
> They would be tied together by a CPAN bundle. You don't have to
> 'explain' anything to the CPAN maintainers because you're not doing
> anything wrong. In fact, you're using it the way you're supposed to.
>
Task, the successor to Bundles, may prove interesting:
http://search.cpan.org/~adamk/Task-1.01/lib/Task.pm
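For anyone who hasn't seen one, a classic CPAN bundle is just a small Perl module in the Bundle:: namespace whose POD contains a CONTENTS section listing one module per entry (optionally followed by a minimum version and a "- description"). The sketch below is hypothetical: the bundle name and the grouping are made up, and it assumes each Bio:: module were released as its own CPAN package, as proposed above.

```perl
package Bundle::BioPerl::SeqIO;

# Hypothetical bundle: the name and module list are illustrative only and
# assume BioPerl were distributed one module per CPAN package.
$VERSION = '0.01';

1;

__END__

=head1 NAME

Bundle::BioPerl::SeqIO - hypothetical bundle of sequence-format modules

=head1 SYNOPSIS

  perl -MCPAN -e 'install Bundle::BioPerl::SeqIO'

=head1 CONTENTS

Bio::SeqIO - generic sequence I/O front end

Bio::SeqIO::genbank - GenBank reader/writer

Bio::SeqIO::fasta - FASTA reader/writer

=head1 DESCRIPTION

Installing this bundle asks CPAN.pm to install each module listed under
CONTENTS; each of those brings in its own declared prerequisites.

=cut
```

Changing or adding a functional grouping then means editing or adding one file like this, not restructuring the code base.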
>
>> Splitting up core:
>>
>> Here are the advantages of a defined split as Steve and I see it
>> (off the top of my head). Some of this probably reiterates
>> my previous points, as well as Steve's, so apologies in advance.
>
> Below I answer with how it would be with my single-module approach
> compared to the defined splits.
>
>
>> - A lean, mean, focused set of bioperl base modules (core) w/o or
>> with very few external deps, minimal installation issues, etc. The
>> very basic stuff to get up and running.
>
> Even leaner, even more focused.
>
>
>> - BioPerl bundled modules (Nathan's 'cliques') with defined, focused
>> functionality, code, and tests, which add a bit more 'sugar' to the
>> base functionality of the core. If you only care about parsing BLAST
>> reports, get SearchIO, which requires core and optionally other
>> modules (XML::SAX). If you want additional DB functionality apart
>> from the very basic ones in core, install DB (with its additional
>> requirements, including core, DBI, and so on). Same with Graphics,
>> Tools, Tree/Phylo, etc. We just need to define and limit the number
>> of splits.
>
> The same can be achieved with CPAN bundles for each kind of functional
> grouping you can think of. And since it's just a single text file that
> defines such a grouping, it's easy to change or add new ones as you feel
> like it, as opposed to the rather more permanent and substantial effort
> of creating one of your splits on the code-base level.
>
> Also, the world doesn't have to rely on /our/ ideas of what a useful
> functional split is. If someone just wants to parse Blast results, they
> can just use CPAN to install Bio::SearchIO::blast_pull instead of having
> to install all of SearchIO.
>
>
>> - Easier to add additional bundled modules. For instance, I could
>> focus all of my RNA work into a discrete set of modules (say, bioperl-
>> rna) which I maintain, I ensure works with the latest core code, I
>> ensure also plays well with the other children =) , and I distribute
>> via CPAN. Same with EUtilities, which could go into a separated DB-
>> related set or stay in core.
>
> And if you lose interest in them? They eventually die because they no
> longer have someone looking after them by default (the pumpkin and other
> devs). Alternatively you could just make a CPAN bundle. One text file!
> Easy! No duplication of modules in CPAN, no new hassle for you or the
> Bioperl 'core' pumpkin to ensure that the latest versions of each work
> with each other and other splits.
>
>
>> - If we want a full-fledged 'install everything', the CPAN Bundle
>> system is available. I think it's easier to use a Bundle for 4-5,
>> even 10 groups of modules as opposed to over 900.
>
> No, it isn't any easier. It's /equally/ easy to install a bundle of 900
> packages of 900 modules as it is to install 5 packages of 900 modules.
>
> When not installing absolutely everything, but perhaps 'most' things,
> there's the additional benefit that it would be easier to skip a
> particular Bio::module because you didn't want to install its external
> dependencies and weren't that interested in it anyway.
>
>
>> - A Bundle or a build file where discrete distributions are listed
>> (Bio::SearchIO, etc) wouldn't need to be updated every time a new
>> module is added to a distribution. I suppose this could be
>> automated, but why have the additional headache?
>
> Yes, it would be automated, and no, it wouldn't at all be any kind of
> additional headache. I'm proposing a fully-automated system that the
> pumpkin wouldn't even have to think about. Much /less/ of a headache
> than dealing with splits. Orders of magnitude easier to deal with.
>
>
>> - A chance to cut out some cruft. We all know that particular areas
>> need work or a complete overhaul (Restriction, Structure, maybe a few
>> others). Smaller, concentrated sets of modules I believe would be
>> easier to maintain, and those that don't get used will eventually fall
>> out of favor and may be lost or replaced from the more maintained
>> group of modules. Survival of the fittest.
>
> And the smallest, most concentrated set of modules is the individual
> module.
>
>
>> - We already have had practice; bioperl-db, bioperl-run, bioperl-
>> network, and others. Those that have been routinely maintained and
>> enjoy wide use (db, run, network) have survived; others not so much
>> (corba-related stuff, microarray, ext, etc., though the code is still
>> available if someone else wants to take it up and revive it!).
>
> The reason some of these existing splits (microarray, ext) have fallen by
> the way-side? /Because/ they're splits. If they had been part of
> bioperl-live all along, they'd have been kept in a working, compatible
> state and would have been released along with everything else in 1.5.2.
>
>
>> Disadvantages of a defined split:
>>
>> - The initial headache of identifying which groups go where,
>> coordinating with those who rely on bioperl (GMOD, etc) on how this
>> will be set up, so on...
>
> No need to worry about this with individual modules.
>
>
>> - Separate groups of modules require testing together to ensure
>> functionality is consistent and maintained (something I think you
>> pointed out previously).
>
> No need to worry.
>
>
>> - I think there's an increased possibility of branching.
>>
>> - Extra headaches for devs, who have to keep track of the various
>> critical distributions and make sure they work well together.
>
> No headaches.