[Bioperl-l] GSoC/BioPerl Reorganization Project

Thu Apr 28 23:19:51 UTC 2011

I think you guys are on the right track.  Here are some slightly more
detailed plans.  I'll use Chris's subject numbering.

1,2,3,5.) I envision the splitting algorithm going like this:

      no strict; # this is pseudocode!

      my $split_count = 0;
      for $subsystem (qw( Bio::Root Bio::Das Bio::Event ... )) {

          - take $subsystem modules and tests out of bioperl-live

            (my $new_dist_name = $subsystem) =~ s/::/-/g;
          - extract $subsystem modules into new dist called
            $new_dist_name.  Make sure all its tests pass, and write
            some more tests if necessary.

          - add dep on $subsystem to bioperl-live/Build.PL

          - push $new_dist_name and bioperl-live to CPAN.
            $new_dist_name has version '2.000', and bioperl-live has
            version "1.7.$split_count".

          $split_count++;
      }

      and then, at the end of this loop, bioperl-live will be
      nothing but a Build.PL and a couple of other things
      for backcompat, like Bio::Root::Version, Bio::Perl, etc.

      Important things to notice about this algorithm are that, at each
      step in the loop:

         a.) For users that install bioperl with CPAN,
             doing cpan 'Bio::Perl' or cpan 'Bio::Root::Version' will
             get you the same set of modules as before the split
             started, with the split-off modules at 2.000 versions, and
             the non-split-off ones at 1.7.x versions.

         b.) For users (not developers) that are git cloning
             bioperl-live, even though they are naughty (wink), they
             can do 'perl Build.PL; ./Build installdeps' to get the
             split-off modules, downloaded like any other CPAN
             dependency.  There may be some lag before a newly split-off
             dist is downloadable from CPAN.

         c.) For BioPerl developers, unless they are working on a
             certain module, they should install the split-off modules
             from CPAN like everybody else, and git clone only the piece
             they are working on.

         d.) The version of bioperl-live keeps increasing with each
             split (1.7.$split_count).  The systems that are split off
             have a 2.x version number, each slightly different
             depending on when it was split off.  After this point,
             their release schedules and version numbers are independent
             of each other and of bioperl-live.  For Bio::Perl and
             Bio::Root::Version, the
             things that stay in bioperl-live, installing the latest
             version will get you all the split-off modules.
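
To make the bookkeeping in that loop concrete, here is a rough runnable
sketch of just the naming and versioning part.  It does not touch git or
CPAN.  The helper names (dist_name_for, plan_splits) are made up for
illustration, and I'm assuming the counter starts at 0 and is bumped
after each release, so the first bioperl-live release is 1.7.0.

```perl
use strict;
use warnings;

# Hypothetical helper: map a namespace such as Bio::Root to a dist name.
sub dist_name_for {
    my ($subsystem) = @_;
    (my $name = $subsystem) =~ s/::/-/g;
    return $name;
}

# Hypothetical helper: for each subsystem, record the new dist's name and
# version, the bioperl-live version released alongside it, and the
# dependencies bioperl-live's Build.PL would have accumulated by then.
sub plan_splits {
    my @subsystems = @_;

    my $split_count = 0;    # assumption: first release is 1.7.0
    my %requires;           # split-off namespace => minimum version
    my @plan;

    for my $subsystem (@subsystems) {
        $requires{$subsystem} = '2.000';
        push @plan, {
            dist                 => dist_name_for($subsystem),
            dist_version         => '2.000',
            bioperl_live_version => "1.7.$split_count",
            requires             => {%requires},    # copy, not a reference
        };
        $split_count++;
    }
    return @plan;
}

my @plan = plan_splits(qw( Bio::Root Bio::Das Bio::Event ));
for my $step (@plan) {
    printf "%-10s %s   bioperl-live %s   (requires: %s)\n",
        $step->{dist},
        $step->{dist_version},
        $step->{bioperl_live_version},
        join( ', ', sort keys %{ $step->{requires} } );
}
```

The point of carrying the requires hash along is (a) above: at every
step, installing bioperl-live still pulls in everything that has been
split off so far.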

6.) (thorny circular dependencies and stuff)  Those will become quickly 
apparent as this process proceeds.  They'll take some finesse and/or 
ruthlessness and/or hacking to get around.  We'll burn those bridges as 
we come to them.
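
If we want to see those coming a bit earlier, a depth-first walk over
the dependency map will find them.  This is only a sketch: the %deps
data below is invented for illustration (the Bio::Species / Bio::Tree
loop is my guess at the kind of thing Chris mentions, not the output of
a real scan), and find_cycle is a hypothetical helper.

```perl
use strict;
use warnings;

# Invented example data: subsystem => [ subsystems it uses ].
my %deps = (
    'Bio::Root'     => [],
    'Bio::Location' => ['Bio::Root'],
    'Bio::Species'  => [ 'Bio::Root', 'Bio::Tree' ],
    'Bio::Tree'     => [ 'Bio::Root', 'Bio::Species' ],
);

# Return one dependency cycle as a list of names (first name repeated at
# the end), or an empty list if the map has no cycles.
sub find_cycle {
    my ($deps) = @_;

    my %state;    # 0/undef = unvisited, 1 = on current path, 2 = done
    my @path;     # the chain of subsystems currently being explored

    my $visit;
    $visit = sub {
        my ($node) = @_;
        $state{$node} = 1;
        push @path, $node;

        for my $dep ( @{ $deps->{$node} || [] } ) {
            my $seen = $state{$dep} || 0;
            if ( $seen == 1 ) {
                # $dep is already on the path, so we have looped back.
                my ($start) = grep { $path[$_] eq $dep } 0 .. $#path;
                return ( @path[ $start .. $#path ], $dep );
            }
            if ( $seen == 0 ) {
                my @cycle = $visit->($dep);
                return @cycle if @cycle;
            }
        }

        pop @path;
        $state{$node} = 2;
        return;
    };

    for my $start ( sort keys %$deps ) {
        next if $state{$start};
        my @cycle = $visit->($start);
        return @cycle if @cycle;
    }
    return;
}

my @cycle  = find_cycle( \%deps );
my $report = @cycle ? join( ' -> ', @cycle ) : 'no cycles';
print "$report\n";
```

Anything it reports is a candidate for the finesse/ruthlessness
treatment before we try to split either side off.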

7.) (git submodules) Git submodules probably won't be necessary, since 
at each step in the process BioPerl devs can use ./Build installdeps or 
cpanm --installdeps .  to install whatever the dependencies are for the 
piece they are working on, whether it's bioperl-live (in the case of a 
module that has not yet been split off), or one of the distributions 
that has already been split off (in which case their improvements will 
probably be releasable to CPAN immediately!).

Lots of detail there.  I tried to make it structured and easy to skim 
though.  Thoughts?

Rob

On 04/28/2011 02:04 PM, Chris Fields wrote:
> Sounds fine; I think (as you indicate) we can deal with issues along the way.  Rob, anything to add?
>
> chris
>
> On Apr 28, 2011, at 2:53 PM, Sheena Scroggins wrote:
>
>> Chris,
>>
>> We haven't talked much about the versioning yet, but it will be on the list to figure out asap.
>>
>> So far, the plan is to split out Bio::Root first, followed by a couple modules that depend only on Bio::Root. The plan I proposed was Bio::Das, Bio::Event then Bio::Location. Depending on how much time is remaining for the GSoC project, the next to split out would be Bio::Factory and Bio::Coordinate, because they depend on Bio::Root and Bio::Location. I plan to still help with the reorganization after the internship is over, but I obviously have to have a stopping point for the GSoC project.
>>
>> Rob provided me with a really nice script to list dependencies of the modules, so I plan to make a roadmap towards the end of the summer that will help guide the rest of the reorganization. At that point, we'll have to deal with the circular dependencies carefully.
>>
>> This is a huge project, much bigger than I can do in one summer. But I plan to get it started in a way that makes it easy for others to contribute.
>>
>> Sheena
>>
>>
>> On Wed, Apr 27, 2011 at 12:35 PM, Chris Fields <cjfields at illinois.edu> wrote:
>> Sheena,
>>
>> Congrats on being accepted! We've talked about doing this over the years, but it's not an easy task and it needs a dedicated project to get the ball rolling, so to speak.  Hopefully this isn't tl;dr.  I'll start off with a few of my questions/thoughts (Rob could probably chime in as well, but I think his general thoughts on the project parallel mine):
>>
>> 1) The current BioPerl CPAN could just be a simple install script, acting like a 'Task' or 'Bundle' module, installing the actual Bio-specific distributions.  Doing it this way would allow you to iteratively split off additional code but retain the original Task/Bundle-based approach to installation.  For instance, the first pass could split out Root, then have a dependency-light and 'extras' distribution, 2nd round split further based on function, and so on:
>>
>>   1st round (v 1.9)   :  BioPerl (just an installer) ->  installs root, min-deps, extra-deps
>>   2nd round (v 1.901) :  BioPerl (just an installer) ->  root, seq/feature, other-min-deps, extra-deps
>>   ...
>>   Xth round (v 1.99)  :  BioPerl (just an installer) ->  root, tools, seq, tree, align, coord, map, everything-else
>>   ...
>>
>> Also, one could potentially install modules in various ways: interactively, in predetermined groups, using a user-defined list, etc (one could effectively create custom BioPerl installs for GBrowse or other tools for instance).  Of course I would only pick the easiest route to start, but maybe that gives some ideas.  Regardless, if the dependency tree is set up correctly any reliance on other Bio* modules would be defined in the various Build.PL/Makefile.PL and then installed via CPAN (as is any dependency).
>>
>> 2) The Bio::Root modules are probably the true core modules and are the most stable with regards to changes, so those could be moved to something like BioPerl-Core.  Beyond that, what are the proposed splits?  (we've discussed this on-list before, but it's appropriate to bring this up again)
>>
>> 3) How do we want to handle versioning?  We can't (and probably shouldn't) release everything on a synchronized versioning scheme (via Bio::Root::Version, for instance); that'll quickly fall apart.  Personally I can foresee each split-off dist having its own version, with the BioPerl network of modules being in effect its own mini-CPAN.
>>
>> 5) Related to versioning, in my opinion we should maybe aim at eventually calling this BioPerl v2.0 and starting with a simpler X.Y versioning scheme.  Lincoln has already done something like this with Bio::Graphics, which was originally part of BioPerl but split off prior to v 1.6.0.
>>
>> 6) In some cases I can see particularly thorny problems, such as circular dependencies.  I can think of a few ways to address that (creating a simple lightweight Bio::Species class as a fallback if Bio::Tree code isn't present, for instance), but any additional thoughts on this would be helpful.
>>
>> 7) Do we want to set up something like 'git submodule' for the devs to pull down all BioPerl-relevant code?
>>
>> Other thoughts?
>>
>> chris
>>
>> On Apr 27, 2011, at 12:17 AM, Sheena Scroggins wrote:
>>
>>> Hey everyone,
>>>
>>> I wanted to take a minute to introduce myself as one of the Google Summer of
>>> Code interns. I was the lucky one chosen to work on the BioPerl
>>> Reorganization (*crowd cheers*). I am a grad student in bioinformatics, and
>>> somewhat new to this level of programming so bear with me as I learn the
>>> technical jargon. Luckily I have both Rob and Chris to mentor me this
>>> summer!
>>>
>>> Reading through the mailing list archives, I see there have been many
>>> discussions and differing opinions about tackling this project. Given the
>>> time frame for GSoC and my limited experience, there is no way I will
>>> complete this project on my own but I will at least be able to start it,
>>> which will hopefully motivate others to pitch in. So far, the plan for the
>>> GSoC project is to start by breaking out Bio::Root, followed by a couple
>>> other modules based on their dependencies and the time allowed. Each will be
>>> published to CPAN independently. You can follow the project (once it starts)
>>> on github at https://github.com/sheenams.
>>>
>>> I look forward to collaborating with many of you on the reorganization (hint
>>> hint)!
>>>
>>> Sheena