[Bioperl-l] GSoC/BioPerl Reorganization Project

Sheena Scroggins sheena.scroggins at gmail.com
Thu Apr 28 19:53:49 UTC 2011


We haven't talked much about the versioning yet, but it will be on the list
to figure out asap.

So far, the plan is to split out Bio::Root first, followed by a couple
modules that depend only on Bio::Root. The plan I proposed was Bio::Das,
Bio::Event then Bio::Location. Depending on how much time is remaining for
the GSoC project, the next to split out would be Bio::Factory and
Bio::Coordinate, because they depend on Bio::Root and Bio::Location. I plan
to still help with the reorganization after the internship is over, but I
obviously have to have a stopping point for the GSoC project.
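
To make that ordering concrete, here is a minimal sketch (in Python for
brevity, not part of the actual project) that groups namespaces into rounds so
that each round depends only on earlier ones. The dependency map simply
restates what is described above and is assumed to be complete for
illustration; the name split_rounds is hypothetical.

```python
# Dependencies as described in this message. Assumed complete for the sake of
# the example; the real namespaces may depend on more than this.
DEPENDS_ON = {
    "Bio::Root": set(),
    "Bio::Das": {"Bio::Root"},
    "Bio::Event": {"Bio::Root"},
    "Bio::Location": {"Bio::Root"},
    "Bio::Factory": {"Bio::Root", "Bio::Location"},
    "Bio::Coordinate": {"Bio::Root", "Bio::Location"},
}


def split_rounds(depends_on):
    """Group names into rounds; each round depends only on earlier rounds."""
    # Ignore dependencies that are not in the map (e.g. non-Bio CPAN modules).
    remaining = {
        name: {dep for dep in deps if dep in depends_on}
        for name, deps in depends_on.items()
    }
    rounds = []
    while remaining:
        # Everything whose dependencies are already split out is ready now.
        ready = sorted(name for name, deps in remaining.items() if not deps)
        if not ready:
            raise ValueError(
                "circular dependency among: " + ", ".join(sorted(remaining))
            )
        rounds.append(ready)
        for name in ready:
            del remaining[name]
        for deps in remaining.values():
            deps.difference_update(ready)
    return rounds


if __name__ == "__main__":
    for number, names in enumerate(split_rounds(DEPENDS_ON), start=1):
        print("round %d: %s" % (number, ", ".join(names)))
```

With the map above, Bio::Root comes out alone in the first round, the three
Root-only namespaces in the second, and Bio::Factory and Bio::Coordinate in
the third.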

Rob provided me with a really nice script to list dependencies of the modules,
so I plan to make a roadmap toward the end of the summer that will help
guide the rest of the reorganization. At that point, we'll have to deal with
the circular dependencies carefully.
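
As a rough idea of how circular dependencies could be spotted once a
dependency list exists, the sketch below (again Python, purely illustrative)
does a depth-first search and reports the first cycle it finds. It is not
Rob's script. The function name find_cycle is hypothetical, and the example
edges between Bio::Species and Bio::Tree are an assumption based on the case
Chris mentions in his message below, not measured output.

```python
def find_cycle(depends_on):
    """Return one dependency cycle as a list of names, or None if none exists."""
    UNSEEN, IN_PROGRESS, DONE = 0, 1, 2
    state = {name: UNSEEN for name in depends_on}
    path = []  # names on the current depth-first search path

    def visit(name):
        state[name] = IN_PROGRESS
        path.append(name)
        for dep in sorted(depends_on.get(name, ())):
            if state.get(dep, UNSEEN) == IN_PROGRESS:
                # dep is already on the current path, so we closed a loop.
                return path[path.index(dep):] + [dep]
            if state.get(dep, UNSEEN) == UNSEEN:
                found = visit(dep)
                if found:
                    return found
        path.pop()
        state[name] = DONE
        return None

    for name in sorted(depends_on):
        if state[name] == UNSEEN:
            found = visit(name)
            if found:
                return found
    return None


# Hypothetical edges for illustration only.
EXAMPLE = {
    "Bio::Root": set(),
    "Bio::Species": {"Bio::Root", "Bio::Tree"},
    "Bio::Tree": {"Bio::Root", "Bio::Species"},
}

if __name__ == "__main__":
    cycle = find_cycle(EXAMPLE)
    print(" -> ".join(cycle) if cycle else "no circular dependencies")
```

Any namespaces reported this way would need to be split out together, or have
the cycle broken first, before they can go into separate distributions.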

This is a huge project, much bigger than I can do in one summer. But I plan
to get it started in a way that makes it easy for others to contribute.


On Wed, Apr 27, 2011 at 12:35 PM, Chris Fields <cjfields at illinois.edu> wrote:

> Sheena,
> Congrats on being accepted! We've talked about doing this over the years,
> but it's not an easy task and it needs a dedicated project to get the ball
> rolling, so to speak.  Hopefully this isn't tl;dr.  I'll start off with a
> few of my questions/thoughts (Rob could probably chime in as well, but I
> think his general thoughts on the project parallel mine):
> 1) The current BioPerl CPAN could just be a simple install script, acting
> like a 'Task' or 'Bundle' module, installing the actual Bio-specific
> distributions.  Doing it this way would allow you to iteratively split off
> additional code but retain the original Task/Bundle-based approach to
> installation.  For instance, the first pass could split out Root, then have
> a dependency-light and 'extras' distribution, 2nd round split further based
> on function, and so on:
>  1st round (v 1.9)   :  BioPerl (just an installer) -> installs root,
> min-deps, extra-deps
>  2nd round (v 1.901) :  BioPerl (just an installer) -> root, seq/feature,
> other-min-deps, extra-deps
>  ...
>  Xth round (v 1.99)  :  BioPerl (just an installer) -> root, tools, seq,
> tree, align, coord, map, everything-else
>  ...
> Also, one could potentially install modules in various ways: interactively,
> in predetermined groups, using a user-defined list, etc (one could
> effectively create custom BioPerl installs for GBrowse or other tools for
> instance).  Of course I would only pick the easiest route to start, but
> maybe that gives some ideas.  Regardless, if the dependency tree is set up
> correctly any reliance on other Bio* modules would be defined in the various
> Build.PL/Makefile.PL and then installed via CPAN (as is any dependency).
> 2) The Bio::Root modules are probably the true core modules and are the
> most stable with regards to changes, so those could be moved to something
> like BioPerl-Core.  Beyond that, what are the proposed splits?  (we've
> discussed this on-list before, but it's appropriate to bring this up again)
> 3) How do we want to handle versioning?  We can't (and probably shouldn't)
> release everything on a synchronized versioning scheme (via
> Bio::Root::Version, for instance); that'll quickly fall apart.  Personally I
> can foresee each split-off dist having its own version, with the BioPerl
> network of modules being in effect its own mini-CPAN.
> 5) Related to versioning, in my opinion we should maybe aim on eventually
> calling this BioPerl v2.0 and starting with a simpler X.Y versioning scheme.
>  Lincoln has already done something like this with Bio::Graphics, which was
> originally part of BioPerl but split off prior to v 1.6.0.
> 6) In some cases I can see particularly thorny problems, such as circular
> dependencies.  I can think of a few ways to address that (creating a simple
> lightweight Bio::Species class as a fallback if Bio::Tree code isn't
> present, for instance), but any additional thoughts on this would be
> helpful.
> 7) Do we want to set up something like 'git submodule' for the devs to pull
> down all BioPerl-relevant code?
> Other thoughts?
> chris
> On Apr 27, 2011, at 12:17 AM, Sheena Scroggins wrote:
> > Hey everyone,
> >
> > I wanted to take a minute to introduce myself as one of the Google Summer of
> > Code interns. I was the lucky one chosen to work on the BioPerl
> > Reorganization (*crowd cheers*). I am a grad student in bioinformatics, and
> > somewhat new to this level of programming so bear with me as I learn the
> > technical jargon. Luckily I have both Rob and Chris to mentor me this
> > summer!
> >
> > Reading through the mailing list archives, I see there have been many
> > discussions and differing opinions about tackling this project. Given the
> > time frame for GSoC and my limited experience, there is no way I will
> > complete this project on my own but I will at least be able to start it,
> > which will hopefully motivate others to pitch in. So far, the plan for the
> > GSoC project is to start by breaking out Bio::Root, followed by a couple
> > other modules based on their dependencies and the time allowed. Each will be
> > published to CPAN independently. You can follow the project (once it starts)
> > on GitHub at https://github.com/sheenams.
> >
> > I look forward to collaborating with many of you on the reorganization (hint
> > hint)!
> >
> > Sheena
> > _______________________________________________
> > Bioperl-l mailing list
> > Bioperl-l at lists.open-bio.org
> > http://lists.open-bio.org/mailman/listinfo/bioperl-l
