[Bioperl-l] bioperl reorganization

Fri Jul 17 17:01:14 UTC 2009

Will try to weigh in more, a little bit of stream of consciousness to  
let you know I'm thinking about it.  Tough summer to focus much on this.

It's too bad we are apparently the laughing stock of Perl gurus, but  
it would be great to see how to modernize aspects of the development.

I'm curious how this will work: with dozens of separate distros, won't
we have a hard time keeping track of which directory things are in?
Will there have to be a master list of which modules, at which version,
are in which distro?

When I do an SVN (or git) checkout, do I need to check out each of
these in its own directory?  Or will there be a master packaging script
that makes the necessary zip files for CPAN submission?  If they are in
separate directories, are we organizing by conceptual topic
(phylogenetics, alignment, database search) or by namespace of the
modules? Do all the 'database' modules live together - probably not -
so do we name them bioperl-db-remote, bioperl-db-local-index,
bioperl-db-local-sql, etc.?  Really, bioperl-db is somewhat focused on
sequences and features, but what about things that integrate multiple
data types - like biosql?

If they are in separate directories, what about all the test data that
might be shared? Is it replicated among all the sub-directories? How
do we do a good job keeping that up to date? Could we instead have a
test-data distro, with symlinks within SVN?

For some other obvious modules that can be split off and are
self-contained, each of these could be a package.  I would estimate
more than 20 packages, depending on how Bio::Tools is carved up.
- I think Bio::DB::SeqFeature needs to be split off for sure; this is a
  nice logical peeling off.  It could be another test case since it is
  a Gbrowse dependency.
- Bio::DB::GFF as well, for the same reasons.
- Bio::PopGen - self-contained for the most part, but depends on
  Bio::Tree and Bio::Align objects
- Bio::Variation
- Bio::Map and Bio::MapIO
- Bio::Cluster and Bio::ClusterIO
- Bio::Assembly
- Bio::Coordinate

My nightmare is that we're going to have to manage a lot of 'use XX
1.01' statements enforcing version requirements when dealing with the
dependencies on the interface classes, and keep all of these up to
date.  The version was implicit when they were all part of the same big
distro.
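
To make the versioning worry concrete, here is a minimal sketch of how
Perl enforces a minimum module version. My::InterfaceI is a
hypothetical stand-in for one of our interface classes, defined inline
so the example runs without bioperl installed, and meets_version is
just a helper name I made up. A line like 'use My::InterfaceI 1.01'
loads the module and then calls its VERSION method, which dies if
$VERSION is lower than requested; the sketch calls that method
directly.

```perl
use strict;
use warnings;

# Hypothetical interface class (an assumption for illustration only).
package My::InterfaceI;
our $VERSION = '1.01';

package main;

# Returns 1 if $class is at least version $min, 0 otherwise.
# UNIVERSAL::VERSION dies when the installed version is too low,
# which is the same check 'use Module VERSION' triggers.
sub meets_version {
    my ($class, $min) = @_;
    return eval { $class->VERSION($min); 1 } ? 1 : 0;
}

print "1.01 required: ",
    (meets_version('My::InterfaceI', '1.01') ? "ok" : "too old"), "\n";
print "1.02 required: ",
    (meets_version('My::InterfaceI', '1.02') ? "ok" : "too old"), "\n";
```

Every split-off distro that depends on an interface class would need a
check like this (or the equivalent entry in its build prerequisites),
and each one would have to be bumped whenever the interface changes.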

Also, the splits need not be limited to one namespace if need be, I
guess, but we have generally grouped things by namespace.

What do you want to do about bioperl-run?  Do we make a set of
parallel splits from all of these?  I think at the outset we need to
coordinate the applications supported here in some sort of loose
ontology - the namespaces were not consistently applied, so we have
some alignment tools in different directories, etc.  So the namespace
sort of classifies them, but it could be better.  That is one of the
challenges of having multiple developers without a totally shared
vision of how it should be done.

I'm not convinced that the Bio::Graphics split-off has been painless,
so we should take stock of how that is working.

It seems like this split off would be a way to better streamline  
things in bioperl so that modern versions of bioperl might be able to  
better interface with things like Ensembl again too.

How much effort is worth spending on triaging the current code, which
appears to scare off so many users (and potential developers), versus
the effort we want to put into a cleaner, simpler bioperl system?

Okay I rambled, hope that was helpful.

-jason
--
Jason Stajich
jason at bioperl.org
On Jul 17, 2009, at 2:08 AM, Robert Buels wrote:

> Chris Fields wrote:
>> Yes, I agree.  However a large set of modules in bioperl were  
>> effectively donated by the author, so they will fall to the core  
>> devs to maintain by sheer property of legacy.
>
> This is a very sticky point.  The only way I can think of would be  
> to have each distro have a "principal maintainer", that is the go-to  
> guy for issues related to keeping it running, but can beg and cajole  
> others to help.  At least there will be fewer problems per  
> distribution, since they would be smaller.  If a maintainer has to  
> stop, he has to find somebody else to do it, or the package sits  
> there and bit rot sets in. That's just how it goes.  If it's  
> important enough (like if it's depended on by a dist that IS  
> maintained), somebody will pick it up.
>
>> On bugs:
> <snip>
>> On API and the 'chicken-or-egg' issue:
> <snip>
>> What I would like is have the various breakaway Bio::* either fall  
>> back to Module::Build if Bio::Root::Build isn't present, or just  
>> use Module::Build.  My suggestion is to just use Module::Build  
>> directly, but we could scale down Bio::Root::Build to respect the  
>> Module::Build API (thus allowing it as a fallback).
> I'm not sure about this, I'm not an expert on the ins and outs  of  
> subclassing Module::Build.
>
> One idea I do have, however, is that we might think about using an  
> xt/ directory for intensive and network-based tests that are not  
> meant to be run by automated installers, which could help simplify  
> the test and build code.  I've heard that this is a pretty common  
> practice in other projects.
>
> =====================
>
> Anyway, let's develop some concrete plans. I would say that the plan  
> at http://www.bioperl.org/wiki/Proposed_core_modules_changes is a  
> half-measure, in light of the successful (painless?) Bio::Graphics  
> extraction.
>
> Here's a new proposal:
>
> 1.) renew/construct the Bundle/Task::Bioperl, get it pulling in all  
> the current Bioperl modules as dependencies (or however it works)
>
> 2.) start repeating the same extraction procedure used with  
> Bio::Graphics:
>  * identify a candidate set of modules in bioperl-live to be  
> extracted into their own distribution, propose the extraction on the  
> mailing list, get some kind of agreement
>  * make a new component in the svn repository (alongside the
> bioperl-live and other dirs) named something like
> Bio-Something-Something, with trunk/, branches/, and tags/ subdirs.
>  * svn cp modules into the new trunk/lib/, tests into trunk/t,  
> scripts into trunk/scripts, and write a Build.PL just like the one  
> Lincoln wrote for Bio::Graphics.
>  * when the extracted copy looks good, use svn merge to port any  
> changes that happened in trunk to the new extracted modules if  
> necessary and test.
>  * delete the old copy from bioperl-live/trunk.
>  * identify a new candidate set of modules, propose on the mailing  
> list, and repeat
>
> 2.5) continue releasing 1.6.X bugfix releases while this is going on.
>
> 3.) when bioperl-live is down to a truly reasonable core set, (fewer  
> than 10 modules might be a good target), rename it to Bio-Perl-Core,  
> go through a round of testing, and push them all to CPAN at once.  
> Task::BioPerl will have dependencies on the module names, I think,  
> so it will continue to install the same from users' perspectives, it  
> will just be downloading different dists.
>
> 4.) repeat steps 1-3 with bioperl-run, and maybe others.
>
> Thoughts?  If people like it, I or somebody else could put it on the  
> wiki.
>
> And of course, I volunteer to put in a lot of work on this.  I'll  
> try to see if I can identify some other likely extraction candidates  
> as a preliminary step and report back to the list.
>
> Also we need some more people besides just me and Chris talking and  
> thinking about this, these are large reshufflings being proposed.
>
> Rob
