[Bioperl-l] bioperl reorganization

Fri Jul 24 13:28:15 UTC 2009

Sorry I'm joining this thread so late, but I've been taking a break from
development work.

Since the Bio::Graphics split is being used as an exemplar, I'd like to
share with you how the process went. Overall, it was pretty painless. The
main issue that I encountered was that there was a high-performance
Bio::SeqFeatureI object called Bio::Graphics::Feature that was used both by
Bio::Graphics and by Bio::DB::SeqFeature. This caused a cross-dependency
between Bio::Graphics and BioPerl. When I realized this problem, I proposed
to the mailing list to create a replacement for Bio::Graphics::Feature
called Bio::SeqFeature::Lite that would live in the BioPerl distribution.
When this idea was OK'd, I replaced the Bio::Graphics::Feature module in
Bio::Graphics with a shell class that inherited from Bio::SeqFeature::Lite.
With this dependency removed, I was able to lift Bio/Graphics out of BioPerl
and put it in its own repository. Creating the Build.PL and the regression
tests was then very straightforward.
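
For anyone unfamiliar with the pattern, a shell class of this kind can be
sketched roughly as below. This is only an illustration, not the actual
module source: the package names My::SeqFeature::Lite and
My::Graphics::Feature are hypothetical stand-ins (playing the roles of
Bio::SeqFeature::Lite and Bio::Graphics::Feature), and the constructor
arguments are assumptions chosen so the example runs without BioPerl
installed.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical stand-in for the feature class that lives in the core
# distribution (the role played by Bio::SeqFeature::Lite).
package My::SeqFeature::Lite;

sub new {
    my ($class, %args) = @_;
    my $self = { start => $args{-start}, end => $args{-end} };
    return bless $self, $class;
}

sub start { return $_[0]{start} }
sub end   { return $_[0]{end} }

# Hypothetical shell class kept in the split-off distribution. It adds
# no code of its own; it only inherits, so existing callers that use
# the old class name keep working.
package My::Graphics::Feature;
our @ISA = ('My::SeqFeature::Lite');

package main;

my $f = My::Graphics::Feature->new(-start => 10, -end => 20);

# The object is blessed into the shell class but behaves like the parent.
print ref($f), ' spans ', $f->end() - $f->start() + 1, " bases\n";
```

The point of the design is that all of the implementation lives on one side
of the split, so the dependency runs in one direction only.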

I think the whole process took about six hours, spread across two days.

It made my life a whole lot easier to be able to release new versions of
Bio::Graphics independently of the BioPerl distribution. I think many (but not
all) of the BioPerl modules could be handled this way, and that the easiest
approach is to extract them one at a time in a careful, step-by-step fashion
rather than to try to reorganize everything at once.

Lincoln

On Fri, Jul 17, 2009 at 1:01 PM, Jason Stajich <jason at bioperl.org> wrote:

> Will try to weigh in more, a little bit of stream of consciousness to let
> you know I'm thinking about it.  Tough summer to focus much on this.
>
> It's too bad we are apparently the laughing stock of Perl gurus, but it
> would be great to see how to modernize aspects of the development.
>
> I'm curious how it will work when we have dozens of separate distros. Won't
> we have a hard time keeping track of what directory things are in?
> Will there have to be a master list of what version and what modules are in
> what distro now?
>
> When I do an SVN (or git) checkout do I need to check out each of these in
> its own directory?  Or will there be a master packaging script that makes
> the necessary zip files for CPAN submission?  If they are in separate
> directories are we organizing by conceptual topic (phylogenetics, alignment,
> database search) or by namespace of the modules? Do all the 'database'
> modules live together - probably not  - so do we name bioperl-db-remote
> bioperl-db-local-index, bioperl-db-local-sql, etc?  really bioperl-db is
> somewhat focused on sequences and features, but what about things that
> integrate multiple data types - like biosql?
>
> If they are in separate directories, what about all the test data that
> might be shared, is this replicated among all the sub-directories - how do
> we do a good job keeping that up to date, could we have a test-data distro
> instead with symlinks within SVN?
>
> For some other obvious modules that can be split off and self-contained,
> each of these could be a package.  I would estimate more than 20 packages
> depending on how Bio::Tools are carved up.
> - I think Bio::DB::SeqFeature needs to be split off for sure; this is a nice
> logical peeling off.  Could be another test case since it is a Gbrowse
> dependency.
> -  Bio::DB::GFF as well for the same reasons.
> -  Bio::PopGen - self contained for the most part, but depends on Bio::Tree
> and Bio::Align objects
> -  Bio::Variation
> -  Bio::Map and Bio::MapIO
> -  Bio::Cluster and Bio::ClusterIO
> -  Bio::Assembly
> - Bio::Coordinate
>
> My nightmare is that we're going to have to manage a lot of 'use XX 1.01'
> statements to enforce version requirements when dealing with the dependencies
> on the interface classes, and have to keep these all up to date.  The version
> was implicit when they were all part of the same big distro.
>
> Also the splits need not only include one namespace if need be I guess but
> we have generally grouped things by namespace.
>
> What do you want to do about bioperl-run?  Do we make a set of parallel
> splits from all of these?  I think at the outset we need to coordinate the
> applications supported here in some sort of loose ontology - the namespaces
> were not consistently applied so we have some alignment tools in different
> directories, etc.  So the namespace sort of classifies them but it could be
> better.  One of the challenges of multiple developers without a totally
> shared vision on how it should be done.
>
> I'm not convinced that the Bio::Graphics splitoff has been painless so we
> should take stock of how that is working.
>
> It seems like this split off would be a way to better streamline things in
> bioperl so that modern versions of bioperl might be able to better interface
> with things like Ensembl again too.
>
> How much of this effort is worth spending on triaging the current code, which
> appears to scare off so many users (and potential developers), versus the
> effort we want to put into a cleaner, simpler bioperl system?
>
> Okay I rambled, hope that was helpful.
>
> -jason
> --
> Jason Stajich
> jason at bioperl.org
>
> On Jul 17, 2009, at 2:08 AM, Robert Buels wrote:
>
>  Chris Fields wrote:
>>
>>> Yes, I agree.  However a large set of modules in bioperl were effectively
>>> donated by the author, so they will fall to the core devs to maintain by
>>> sheer property of legacy.
>>>
>>
>> This is a very sticky point.  The only way I can think of would be to have
>> each distro have a "principal maintainer", that is the go-to guy for issues
>> related to keeping it running, but can beg and cajole others to help.  At
>> least there will be fewer problems per distribution, since they would be
>> smaller.  If a maintainer has to stop, he has to find somebody else to do
>> it, or the package sits there and bit rot sets in. That's just how it goes.
>>  If it's important enough (like if it's depended on by a dist that IS
>> maintained), somebody will pick it up.
>>
>>  On bugs:
>>>
>> <snip>
>>
>>> On API and the 'chicken-or-egg' issue:
>>>
>> <snip>
>>
>>> What I would like is have the various breakaway Bio::* either fall back
>>> to Module::Build if Bio::Root::Build isn't present, or just use
>>> Module::Build.  My suggestion is to just use Module::Build directly, but we
>>> could scale down Bio::Root::Build to respect the Module::Build API (thus
>>> allowing it as a fallback).
>>>
>> I'm not sure about this, I'm not an expert on the ins and outs of
>> subclassing Module::Build.
>>
>> One idea I do have, however, is that we might think about using an xt/
>> directory for intensive and network-based tests that are not meant to be run
>> by automated installers, which could help simplify the test and build code.
>>  I've heard that this is a pretty common practice in other projects.
>>
>> =====================
>>
>> Anyway, let's develop some concrete plans. I would say that the plan at
>> http://www.bioperl.org/wiki/Proposed_core_modules_changes is a
>> half-measure, in light of the successful (painless?) Bio::Graphics
>> extraction.
>>
>> Here's a new proposal:
>>
>> 1.) renew/construct the Bundle/Task::Bioperl, get it pulling in all the
>> current Bioperl modules as dependencies (or however it works)
>>
>> 2.) start repeating the same extraction procedure used with Bio::Graphics:
>>  * identify a candidate set of modules in bioperl-live to be extracted
>> into their own distribution, propose the extraction on the mailing list, get
>> some kind of agreement
>>  * make a new component in the svn repository (alongside the bioperl-live
>> and other dirs) named something like Bio-Something-Something, with trunk/,
>> branches/, and tags/ subdirs.
>>  * svn cp modules into the new trunk/lib/, tests into trunk/t, scripts
>> into trunk/scripts, and write a Build.PL just like the one Lincoln wrote for
>> Bio::Graphics.
>>  * when the extracted copy looks good, use svn merge to port any changes
>> that happened in trunk to the new extracted modules if necessary and test.
>>  * delete the old copy from bioperl-live/trunk.
>>  * identify a new candidate set of modules, propose on the mailing list,
>> and repeat
>>
>> 2.5) continue releasing 1.6.X bugfix releases while this is going on.
>>
>> 3.) when bioperl-live is down to a truly reasonable core set, (fewer than
>> 10 modules might be a good target), rename it to Bio-Perl-Core, go through a
>> round of testing, and push them all to CPAN at once. Task::BioPerl will have
>> dependencies on the module names, I think, so it will continue to install
>> the same from users' perspectives, it will just be downloading different
>> dists.
>>
>> 4.) repeat steps 1-3 with bioperl-run, and maybe others.
>>
>> Thoughts?  If people like it, I or somebody else could put it on the wiki.
>>
>> And of course, I volunteer to put in a lot of work on this.  I'll try to
>> see if I can identify some other likely extraction candidates as a
>> preliminary step and report back to the list.
>>
>> Also we need some more people besides just me and Chris talking and
>> thinking about this, these are large reshufflings being proposed.
>>
>> Rob
>> _______________________________________________
>> Bioperl-l mailing list
>> Bioperl-l at lists.open-bio.org
>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>>
>
>

-- 
Lincoln D. Stein
Director, Informatics and Biocomputing Platform
Ontario Institute for Cancer Research
101 College St., Suite 800
Toronto, ON, Canada M5G0A3
416 673-8514
Assistant: Renata Musa <Renata.Musa at oicr.on.ca>