[Bioperl-l] BP split progress and rationale

Wed Jun 1 05:06:01 UTC 2016

All,

I've made some significant progress towards a BP split. I know there 
have been several tries, but I'm willing to take this one to an 
actionable endpoint with YAPC::NA 2016 as a goal date for action.

I have built a graph of all the module dependencies (parent-child and 
horizontal) in Neo4j, and have been using this to design module 
groupings that encompass functional areas and also have hierarchical 
group dependencies such that the dependencies between groups are 
minimized. I'm calling the groupings "packages".
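
To make "dependencies between groups are minimized" concrete, here is a 
minimal sketch in Python of how cross-package edges could be counted once 
every module has been assigned to a package. The edge list and the 
module-to-package assignment below are illustrative assumptions, not 
data exported from the actual Neo4j graph, and the function name is 
hypothetical.

```python
from collections import Counter

# Illustrative module-level edges: (dependent module, module it depends on).
# These are assumptions for the sketch, not the real BioPerl graph.
MODULE_DEPS = [
    ("Bio::Seq", "Bio::Root::Root"),
    ("Bio::SeqIO", "Bio::Seq"),
    ("Bio::SeqIO::fasta", "Bio::SeqIO"),
    ("Bio::SimpleAlign", "Bio::Seq"),
    ("Bio::AlignIO", "Bio::SimpleAlign"),
    ("Bio::AlignIO", "Bio::Root::Root"),
]

# Assumed assignment of modules to packages.
PACKAGE_OF = {
    "Bio::Root::Root": "BioPerl::Base",
    "Bio::Seq": "BioPerl::Sequence",
    "Bio::SeqIO": "BioPerl::Sequence",
    "Bio::SeqIO::fasta": "BioPerl::Sequence",
    "Bio::SimpleAlign": "BioPerl::Alignment",
    "Bio::AlignIO": "BioPerl::Alignment",
}


def cross_package_edges(module_deps, package_of):
    """Count module edges whose two endpoints live in different packages."""
    counts = Counter()
    for dependent, dependency in module_deps:
        src = package_of[dependent]
        dst = package_of[dependency]
        if src != dst:  # edges inside one package cost nothing
            counts[(src, dst)] += 1
    return counts


EDGE_COUNTS = cross_package_edges(MODULE_DEPS, PACKAGE_OF)
for (src, dst), n in sorted(EDGE_COUNTS.items()):
    print(f"{src} -> {dst}: {n} module edge(s)")
```

Moving a module from one package to another and re-running the count is 
one way to compare candidate groupings; a lower total means fewer 
dependencies between packages.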

I am using the loose convention that "monophyletic" packages (groups of 
modules that fall within a namespace) are named after the namespace, and 
"polyphyletic" packages are named "BioPerl::<functional name>". The 
following packages are currently pretty solid. The descriptions indicate 
mainly what is encompassed by the contained modules, not rules for 
membership.

BioPerl::Base - Bio::Root::*, general design pattern helpers (e.g., 
many Bio::*I, Bio::Factory::*, Build helper classes)

BioPerl::Sequence - Bio::Seq, Bio::SeqIO, and SeqIO drivers that can do 
without annotations (e.g., fasta)

BioPerl::Alignment - alignment objects and parsers

BioPerl::Annotation - most annotation modules

BioPerl::SeqFeature - most SeqFeature modules

BioPerl::Tree - most Tree-related modules

BioPerl::DB - most Bio::DB::*, Bio::Das interfaces

BioPerl::Search - BLAST parsing and tiling

There are quite a few more. Examples of the logic: BioPerl::Base is 
self-contained, i.e., all of its BP dependencies are inside the package. 
BioPerl::Sequence requires only BioPerl::Base to satisfy all its BP 
dependencies. BioPerl::Alignment requires BioPerl::Base and 
BioPerl::Sequence. BioPerl::Search requires BioPerl::Base, 
BioPerl::Sequence, and BioPerl::SeqFeature. And so on.
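
The layering above can be written down as a small requirements table and 
checked mechanically. The following is a rough sketch in Python, assuming 
Python 3.9+ for the standard-library graphlib module. The requirements 
for Base, Sequence, Alignment, and Search are the ones stated above; the 
requirements for BioPerl::SeqFeature are an assumption made only so the 
example is complete, and the function names are hypothetical.

```python
from graphlib import CycleError, TopologicalSorter

# Package -> packages it requires.
REQUIRES = {
    "BioPerl::Base": set(),
    "BioPerl::Sequence": {"BioPerl::Base"},
    "BioPerl::Alignment": {"BioPerl::Base", "BioPerl::Sequence"},
    # ASSUMPTION: the SeqFeature requirements are not given in this post.
    "BioPerl::SeqFeature": {"BioPerl::Base", "BioPerl::Sequence"},
    "BioPerl::Search": {
        "BioPerl::Base",
        "BioPerl::Sequence",
        "BioPerl::SeqFeature",
    },
}


def install_order(requires):
    """Return packages with prerequisites first.

    Raises graphlib.CycleError if the packages depend on each other in a
    cycle, which would mean the grouping is not hierarchical.
    """
    return list(TopologicalSorter(requires).static_order())


def packages_needed(package, requires):
    """Return the package plus everything it requires, transitively."""
    needed = set()
    stack = [package]
    while stack:
        current = stack.pop()
        if current not in needed:
            needed.add(current)
            stack.extend(requires[current])
    return needed


ORDER = install_order(REQUIRES)
print("install order:", ORDER)
print("Alignment needs:", sorted(packages_needed("BioPerl::Alignment", REQUIRES)))
```

If a proposed grouping ever introduced a circular requirement, 
install_order would raise CycleError instead of returning a list, which 
makes it a cheap sanity check on any new package definition.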

With a structure like this, a user who just needs Bio::PrimarySeq and 
Bio::SeqIO to read some FASTA files can get away with installing 
BioPerl::Base and BioPerl::Sequence, about 141 modules, as opposed to 
the full 805 modules, including that broadly useful one 
"Bio::DB::HIV::HIVQueryHelper".

Once finished, I'll propose setting many of the namespaces free as 
separate CPAN packages - Bio::Restriction, Bio::DB::HIV, and others. 
These can be packaged with their appropriate BioPerl::* prerequisites in 
the metadata. I expect this will allow natural selection to operate much 
more efficiently on the obsolete modules.

I will set up CPAN::Meta-compliant metadata for everything.
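
As a rough illustration of what that metadata might carry for a 
split-out namespace, here is a sketch in Python that builds a 
META.json-style structure following the CPAN::Meta::Spec version 2 
layout, with prerequisites under prereqs/runtime/requires. Everything 
specific in it is an assumption: the distribution name, the abstract, 
the version number, the choice of prerequisites for Bio::Restriction, 
and the idea that each package exposes a module named after itself that 
can be listed as a prerequisite. The make_meta helper is hypothetical.

```python
import json


def make_meta(dist_name, abstract, version, runtime_requires):
    """Build a dict shaped like a CPAN::Meta::Spec v2 META.json file."""
    return {
        "name": dist_name,
        "abstract": abstract,
        "version": version,
        "author": ["unknown"],  # placeholder value
        "license": ["perl_5"],
        "dynamic_config": 0,
        "generated_by": "hand-written sketch",
        "release_status": "stable",
        "meta-spec": {"version": 2},
        "prereqs": {
            # Keys are module names; "0" means any version is acceptable.
            "runtime": {"requires": dict(runtime_requires)},
        },
    }


# ASSUMPTION: which BioPerl::* packages Bio::Restriction would need is
# not stated in this post; these two are placeholders.
META = make_meta(
    dist_name="Bio-Restriction",
    abstract="Restriction enzyme modules split out of BioPerl",
    version="0.001",
    runtime_requires={"BioPerl::Base": "0", "BioPerl::Sequence": "0"},
)

META_JSON = json.dumps(META, indent=2, sort_keys=True)
print(META_JSON)
```

In practice the file would be produced by the Perl build tooling rather 
than by hand; the sketch is only meant to show where the BioPerl::* 
prerequisites would sit in the metadata.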

I have more thoughts but this is already too long.

MAJ