[Bioperl-l] Splits again

Thu Jun 28 08:27:54 UTC 2007

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Sendu Bala wrote:
> Chris Fields wrote:
>> On Jun 27, 2007, at 5:43 PM, Sendu Bala wrote:
>>> What advantage is there of these defined splits instead of 
>>> individual modules? As I see it you lose some of the potential 
>>> benefits of breaking Bioperl up completely, whilst also suffering 
>>> the maintenance problems I outlined in my objection to Steve's post.
>>>
>>> Being able to work on all Bioperl from a single cvs (ne svn) check 
>>> out/ archive, whilst distributing it as individual modules on CPAN 
>>> seems like the best of both worlds to me. What am I missing?
>>
>> Okay, forewarned, but here's my long-winded reasoning.  The short and 
>> sweet version: I (very) respectfully don't agree with you, at least 
>> re: the idea we should commit all modules to CPAN independently. It 
>> doesn't make any sense to me, but maybe you can elaborate more?  
>> Maybe I'm misinterpreting what you mean?
> 
> The short and sweet version: my proposal has all the benefits of yours,
> but none of the disadvantages. What's not to like?
> 
> 
>> Finally, all of this should wait until later.  Much later, like after 
>> a decent release, after svn, etc kind of 'later'.  I think we can 
>> agree on that.
> 
> Hmm, not really. If it can be implemented by a change in just Build.PL
> and ModuleBuildBioperl, its really independent of everything else.
> That's the beauty of it: the only thing that changes is how things are
> uploaded to and downloaded from CPAN. The only person that normally
> deals with that issue is the pumpkin for a release, and he only cares
> about it at release time.
> 
> In fact, if we're going to do it at all it makes sense to try it out on
> a minor release like 1.5.3. We've already got experience of doing it
> split-style from 1.5.2. (And let me tell you: splits at the code-base
> level suck.)
> 
> 
>> Individual CPAN modules:
>>
>> CPAN is not our personal versioning system; it may be if a 
>> distribution consists of only a few modules, but not when it's one of 
>> the largest distros present.  If someone wants to update an 
>> individual bioperl module for a quick bug fix they are more than 
>> welcome to download it via cvs, svn, or even using a web browser, and 
>> replace the one they have.
> 
> And where is the harm in letting them do it via CPAN as well? In fact,
> there are significant benefits:
> 
> 
>> I'm trying to reason how one could break up the individual SeqIO/
>> SearchIO/otherIO modules into single module distributions.  They are 
>> intrinsically tied together (SeqIO::genbank won't work w/o SeqIO, 
>> which relies on the various interfaces, RootIO, and on down).  How 
>> would tests be run off CPAN when the modules are distributed 
>> independently?
> 
> Bio::SeqIO::genbank would have a dependency on the latest version of
> Bio::SeqIO (etc.), and Bio::SeqIO would have its own dependencies.
> 
> So when a user wants to get the latest version of Bio::SeqIO::genbank,
> they no longer have to worry about what other modules in its dependency
> hierarchy they should also install.
> 
> Instead they just request Bio::SeqIO::genbank which itself ensures you
> have the latest version of all its dependencies before installing itself
> and running its tests.
> 
> When a dev makes a major bugfix to Bio::SeqIO::genbank that all genbank
> users should have, he could just call './Build dist Bio::SeqIO::genbank'
> which would generate a new package for Bio::SeqIO::genbank suitable for
> uploading to CPAN. No more long release cycles and having to constantly
> tell people to 'use CVS' to get working Bioperl code.
> 
> 
>> Would they also be individually distributed?  What  would you use to
>> tie all the individual modules together?  How would  you explain to
>> the CPAN maintainers that you want to split bioperl  into 990
>> individual modules, all updated independently, but intend on  bundling
>> them afterwards anyway?
> 
> They would be tied together by a CPAN bundle. You don't have to
> 'explain' anything to the CPAN maintainers because you're not doing
> anything wrong. In fact, you're using it the way you're supposed to.
> 


The successor to Bundles - may prove interesting:
http://search.cpan.org/~adamk/Task-1.01/lib/Task.pm


> 
>> Splitting up core:
>>
>> As I see it, here are the advantages of a defined split as Steve and 
>> I see it (off the top of my head).  Some of this probably reiterates 
>> my previous points, as well as Steve's, so apologies in advance.
> 
> Below I answer with how it would be with my single-module approach
> compared to the defined splits.
> 
> 
>> - A lean, mean, focused set of bioperl base modules (core) w/o or 
>> with very few external deps, minimal installation issues, etc.  The 
>> very basic stuff to get up and running.
> 
> Even leaner, even more focused.
> 
> 
>> - BioPerl bundled modules (Nathan's 'cliques') with defined, focused 
>> functionality, code, and tests, which add a bit more 'sugar' to the 
>> base functionality of the core.  If you only care about parsing BLAST 
>> reports, get SearchIO, which requires core and optionally other 
>> modules (XML::SAX).  If you want additional DB functionality apart 
>> from the very basic ones in core, install DB (with it's additional 
>> requirements, including core, DBI, and so on).  Same with Graphics, 
>> Tools, Tree/Phylo, etc.  We just need to define and limit the number 
>> of splits.
> 
> The same can be achieved with CPAN bundles for each kind of functional
> grouping you can think of. And since its just a single text file that
> defines such a grouping, its easy to change or add new ones as you feel
> like it, as opposed to the rather more permanent and substantial effort
> of creating one of your splits on the code-base level.
> 
> Also, the world doesn't have to rely on /our/ ideas of what a useful
> functional split is. If someone just wants to parse Blast results, they
> can just use CPAN to install Bio::SearchIO::blast_pull instead of having
> to install all of SearchIO.
> 
> 
>> - Easier to add additional bundled modules.  For instance, I could 
>> focus all of my RNA work into a discrete set of modules (say, bioperl-
>> rna) which I maintain, I ensure works with the latest core code, I 
>> ensure also plays well with the other children =) , and I distribute 
>> via CPAN.  Same with EUtilities, which could go into a separated DB-
>> related set or stay in core.
> 
> And if you lose interest in them? They eventually die because they no
> longer have someone looking after them by default (the pumpkin and other
> devs). Alternatively you could just make a CPAN bundle. One text file!
> Easy! No duplication of modules in CPAN, no new hassle for you or the
> Bioperl 'core' pumpkin to ensure that the latest version of each work
> with each other and other splits.
> 
> 
>> - If we want a full-fledged 'install everything', the CPAN Bundle 
>> system is available.  I think it's easier to use a Bundle for 4-5, 
>> even 10 groups of modules as opposed to over 900.
> 
> No, it isn't any easier. Its /equally/ easy to install a bundle of 900
> packages of 900 modules as it is to install 5 packages of 900 modules.
> 
> When not installing absolutely everything, but perhaps 'most' things,
> there's the additional benefit that it would be easier to skip a
> particular Bio::module because you didn't want to install its external
> dependencies and weren't that interested in it anyway.
> 
> 
>> - A Bundle or a build file where discrete distributions are listed 
>> (Bio::SearchIO, etc) wouldn't need to be updated every time a new 
>> module is added to a distribution.  I suppose this could be 
>> automated, but why have the additional headache?
> 
> Yes, it would be automated, and no, it wouldn't at all be any kind of
> additional headache. I'm proposing a fully-automated system that the
> pumpkin wouldn't even have to think about it. Much /less/ of a headache
> than dealing with splits. Orders of magnitude easier to deal with.
> 
> 
>> - A chance to cut out some cruft.  We all know that particular areas 
>> need work or a complete overhaul (Restriction, Structure, maybe a few 
>> others).  Smaller, concentrated sets of modules I believe would be 
>> easier to maintain, and those that don't get use will eventually fall 
>> out of favor and may be lost or replaced from the more maintained 
>> group of modules.  Survival of the fittest.
> 
> And the smallest, most concentrated set of modules is the individual
> module.
> 
> 
>> - We already have had practice; bioperl-db, bioperl-run, bioperl-
>> network, and others.  Those that have been routinely maintained and 
>> enjoy wide use (db, run, network) have survived; others not so much 
>> (corba-related stuff, microarray, ext, etc., though the code is still 
>> available if someone else wants to take it up and revive it!).
> 
> The reason some of these existing splits (micoarray, ext) have fallen by
> the way-side? /Because/ they're splits. If they had been part of
> bioperl-live all along, they'd have been kept in a working, compatible
> state and would have been released along with everything else in 1.5.2
> 
> 
>> Disadvantages of a defined split:
>>
>> - The initial headache of identifying which groups go where, 
>> coordinating with those who rely on bioperl (GMOD, etc) on how this 
>> will be set up, so on...
> 
> No need to worry about this with individual modules.
> 
> 
>> - Separate groups of modules require testing together to ensure 
>> functionality is consistent and maintained (something I think you 
>> pointed out previously).
> 
> No need to worry.
> 
> 
>> - I think an increased possibility of branching is possible.
>>
>> - Extra headaches for devs, who have to keep track of the various 
>> critical distributions and make sure they work well together.
> 
> No headaches.

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.6 (GNU/Linux)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org

iD8DBQFGg3EKczuW2jkwy2gRAriiAJ47Qz9jTshEXuaG0XMYrUTI0hHqAwCeL45r
r/BykCKbM9lqJM0khARuEms=
=NB4B
-----END PGP SIGNATURE-----