[Bioperl-l] GSoC/BioPerl Reorganization Project

Thu Apr 28 21:04:51 UTC 2011

Sounds fine; I think (as you indicate) we can deal with issues along the way.  Rob, anything to add?

chris

On Apr 28, 2011, at 2:53 PM, Sheena Scroggins wrote:

> Chris,
> 
> We haven't talked much about the versioning yet, but it will be on the list to figure out asap. 
> 
> So far, the plan is to split out Bio::Root first, followed by a couple of modules that depend only on Bio::Root. The plan I proposed was Bio::Das, Bio::Event, then Bio::Location. Depending on how much time remains for the GSoC project, the next to split out would be Bio::Factory and Bio::Coordinate, because they depend on Bio::Root and Bio::Location. I plan to keep helping with the reorganization after the internship is over, but I obviously have to have a stopping point for the GSoC project. 
> 
> Rob provided me with a really nice script to list the dependencies of the modules, so I plan to make a roadmap towards the end of the summer that will help guide the rest of the reorganization. At that point, we'll have to deal with the circular dependencies carefully.
> 
> This is a huge project, much bigger than I can do in one summer. But I plan to get it started in a way that makes it easy for others to contribute. 
> 
> Sheena 
> 
> 
> On Wed, Apr 27, 2011 at 12:35 PM, Chris Fields <cjfields at illinois.edu> wrote:
> Sheena,
> 
> Congrats on being accepted! We've talked about doing this over the years, but it's not an easy task and it needs a dedicated project to get the ball rolling, so to speak.  Hopefully this isn't tl;dr.  I'll start off with a few of my questions/thoughts (Rob could probably chime in as well, but I think his general thoughts on the project parallel mine):
> 
> 1) The current BioPerl CPAN could just be a simple install script, acting like a 'Task' or 'Bundle' module, installing the actual Bio-specific distributions.  Doing it this way would allow you to iteratively split off additional code but retain the original Task/Bundle-based approach to installation.  For instance, the first pass could split out Root, then have a dependency-light and 'extras' distribution, 2nd round split further based on function, and so on:
> 
>  1st round (v 1.9)   :  BioPerl (just an installer) -> installs root, min-deps, extra-deps
>  2nd round (v 1.901) :  BioPerl (just an installer) -> root, seq/feature, other-min-deps, extra-deps
>  ...
>  Xth round (v 1.99)  :  BioPerl (just an installer) -> root, tools, seq, tree, align, coord, map, everything-else
>  ...
> 
> Also, one could potentially install modules in various ways: interactively, in predetermined groups, using a user-defined list, etc. (one could effectively create custom BioPerl installs for GBrowse or other tools, for instance).  Of course I would only pick the easiest route to start, but maybe that gives some ideas.  Regardless, if the dependency tree is set up correctly, any reliance on other Bio* modules would be defined in the various Build.PL/Makefile.PL and then installed via CPAN (as is any dependency).
> 
> 2) The Bio::Root modules are probably the true core modules and are the most stable with regard to changes, so those could be moved to something like BioPerl-Core.  Beyond that, what are the proposed splits?  (We've discussed this on-list before, but it's appropriate to bring this up again.)
> 
> 3) How do we want to handle versioning?  We can't (and probably shouldn't) release everything on a synchronized versioning scheme (via Bio::Root::Version, for instance); that'll quickly fall apart.  Personally I can foresee each split-off dist having its own version, with the BioPerl network of modules being in effect its own mini-CPAN.
> 
> 5) Related to versioning, in my opinion we should maybe aim to eventually call this BioPerl v2.0 and start with a simpler X.Y versioning scheme.  Lincoln has already done something like this with Bio::Graphics, which was originally part of BioPerl but split off prior to v 1.6.0.
> 
> 6) In some cases I can see particularly thorny problems, such as circular dependencies.  I can think of a few ways to address that (creating a simple lightweight Bio::Species class as a fallback if Bio::Tree code isn't present, for instance), but any additional thoughts on this would be helpful.
> 
> 7) Do we want to set up something like 'git submodule' for the devs to pull down all BioPerl-relevant code?
> 
> Other thoughts?
> 
> chris
> 
> On Apr 27, 2011, at 12:17 AM, Sheena Scroggins wrote:
> 
> > Hey everyone,
> >
> > I wanted to take a minute to introduce myself as one of the Google Summer of
> > Code interns. I was the lucky one chosen to work on the BioPerl
> > Reorganization (*crowd cheers*). I am a grad student in bioinformatics, and
> > somewhat new to this level of programming so bear with me as I learn the
> > technical jargon. Luckily I have both Rob and Chris to mentor me this
> > summer!
> >
> > Reading through the mailing list archives, I see there have been many
> > discussions and differing opinions about tackling this project. Given the
> > time frame for GSoC and my limited experience, there is no way I will
> > complete this project on my own but I will at least be able to start it,
> > which will hopefully motivate others to pitch in. So far, the plan for the
> > GSoC project is to start by breaking out Bio::Root, followed by a couple
> > other modules based on their dependencies and the time allowed. Each will be
> > published to CPAN independently. You can follow the project (once it starts)
> > on GitHub at https://github.com/sheenams.
> >
> > I look forward to collaborating with many of you on the reorganization (hint
> > hint)!
> >
> > Sheena
> > _______________________________________________
> > Bioperl-l mailing list
> > Bioperl-l at lists.open-bio.org
> > http://lists.open-bio.org/mailman/listinfo/bioperl-l
> 
>
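
The split order discussed in this thread (Bio::Root first, then modules that depend only on Bio::Root, then modules that also need Bio::Location) is essentially a topological ordering of the dependency graph. The following is only a rough sketch in Python of how such a roadmap could be derived and how circular dependencies could be flagged. It is not the dependency-listing script mentioned above. The DEPS table encodes only the dependencies stated in the thread, and the function names split_rounds and find_cycle, as well as the Bio::Species/Bio::Tree cycle in the usage note, are hypothetical illustrations.

```python
from graphlib import CycleError, TopologicalSorter  # Python 3.9+

# Dependencies as stated in the thread: each key depends on the listed modules.
DEPS = {
    "Bio::Root": set(),
    "Bio::Das": {"Bio::Root"},
    "Bio::Event": {"Bio::Root"},
    "Bio::Location": {"Bio::Root"},
    "Bio::Factory": {"Bio::Root", "Bio::Location"},
    "Bio::Coordinate": {"Bio::Root", "Bio::Location"},
}


def split_rounds(deps):
    """Group modules into rounds; each round depends only on earlier rounds."""
    sorter = TopologicalSorter(deps)
    sorter.prepare()  # raises CycleError if the graph contains a cycle
    rounds = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # everything splittable right now
        rounds.append(ready)
        sorter.done(*ready)  # mark this round as split out
    return rounds


def find_cycle(deps):
    """Return one dependency cycle as a list of modules, or None if acyclic."""
    try:
        TopologicalSorter(deps).prepare()
    except CycleError as err:
        # graphlib reports the cycle as the second element of err.args,
        # with the first and last node being the same module.
        return list(err.args[1])
    return None
```

Calling split_rounds(DEPS) groups the modules into three rounds that match the order proposed in the thread. For a graph with a loop, such as a hypothetical {"Bio::Species": {"Bio::Tree"}, "Bio::Tree": {"Bio::Species"}}, find_cycle returns the modules involved, which identifies the places where something like a lightweight fallback class would be needed before the split can proceed.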