[Bioperl-l] BP split progress and rationale

Mark A. Jensen maj at fortinbras.us
Wed Jun 1 13:49:35 UTC 2016


Wow, Brian,
"Generally people install BioPerl to get IO and basic functionality"? 
Generally, people (or, I) wouldn't think of installing BioPerl for basic 
functionality, because people (or I) get 805 modules, most obsolete, in 
order to use 3, after waiting 15-20 min for the tests to complete. At 
least, I've sensed significant frustration in many of the posts relating 
to installation on this list. I agree, everything should be geared 
toward simplicity and efficiency, but for the user.

The base set would always be installed. The installation of the 
sequence set would pull in the base set. There is no need to divide the 
repos; this can all be driven by metadata in CPAN::Meta format, so 
that any CPAN distribution tool could actually pick out what is 
necessary for a particular user's needs and install them. The bloat is 
managed by managing the groupings, not the repositories. Sure, there 
would be maintenance and documentation, same as in living projects. Maybe 
new people would get interested if the work could be divided among many 
functional units. And maybe the unused hundreds of modules would wither 
as they should. Or, maybe you're right, time for BioPerl to ride into 
Valhalla.
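
To make the metadata idea concrete, here is a minimal sketch of how an 
install tool could work out what to pull in for a requested set. It is 
in Python rather than Perl, and it is not CPAN::Meta or any existing 
tool: the names PACKAGE_REQUIRES and install_order are made up for 
illustration. The dependency edges are the ones listed in my earlier 
post quoted below; SeqFeature's own prerequisites were not listed 
there, so its entry is left empty as a placeholder.

```python
# Hypothetical grouping metadata. The edges for Sequence, Alignment and
# Search follow the proposal in this thread; nothing here is real
# CPAN::Meta output.
PACKAGE_REQUIRES = {
    "BioPerl::Base": [],
    "BioPerl::Sequence": ["BioPerl::Base"],
    "BioPerl::Alignment": ["BioPerl::Base", "BioPerl::Sequence"],
    # Assumption: SeqFeature's prerequisites are not stated in the
    # thread, so this list is an empty placeholder.
    "BioPerl::SeqFeature": [],
    "BioPerl::Search": [
        "BioPerl::Base",
        "BioPerl::Sequence",
        "BioPerl::SeqFeature",
    ],
}


def install_order(package, requires=PACKAGE_REQUIRES):
    """Return the package and all of its transitive prerequisites,
    ordered so that every prerequisite comes before its dependents."""
    order = []
    seen = set()

    def visit(name):
        if name in seen:
            return
        seen.add(name)
        for dep in requires[name]:  # KeyError on an unknown package
            visit(dep)
        order.append(name)  # appended only after all prerequisites

    visit(package)
    return order
```

Calling install_order("BioPerl::Sequence") yields the base set followed 
by the sequence set, which is the "sequence pulls in base" behavior 
described above, all from one repository plus a metadata table.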


On 2016-06-01 08:37, Brian Osborne wrote:
> Mark,
>
> I don’t understand. Last year I put Bio::Root* back into
> bioperl-live, to simplify installation. Now we are splitting again?
>
> IMO Bio::Base/Bio::Root and Bio::Seq*/Bio::SeqIO* should never be
> separate. Generally people install BioPerl to get IO and basic
> Sequence object functionality. Why would Bio::Root (always required)
> be separate from things like Bio::Seq and SeqIO (always requested)?
>
> Simplicity, please. BioPerl has very few people actively engaged
> these days, and the numbers there are steadily dropping. Everything we
> do should be geared towards simplicity and efficiency. Another
> example: SeqFeature and Annotation. Why separate them? They are almost
> always used together.
>
> Then there’s the maintenance, and documentation. Please don’t take
> this personally MAJ, this business about splitting everything up is an
> old idea, an unquestioned assumption. Time to re-consider it.
>
> Brian O.
>
>
>
>> On Jun 1, 2016, at 1:06 AM, Mark A. Jensen <maj at fortinbras.us> wrote:
>>
>> All,
>>
>> I've made some significant progress towards a BP split. I know there 
>> have been several tries, but I'm willing to take this one to an 
>> actionable endpoint with YAPC::NA 2016 as a goal date for action.
>>
>> I have built a graph of all the module dependencies (parent-child 
>> and horizontal) in Neo4j, and have been using this to design module 
>> groupings that encompass functional areas and also have hierarchical 
>> group dependencies such that the dependencies between groups are 
>> minimized. I'm calling the groupings "packages".
>>
>> I am using the loose convention that "monophyletic" packages (groups 
>> of modules that fall within a namespace) are named after the 
>> namespace, and "polyphyletic" packages are named "BioPerl::<functional 
>> name>". The following packages are currently pretty solid. The 
>> descriptions indicate mainly what is encompassed by the contained 
>> modules, not rules for membership.
>>
>> BioPerl::Base - Bio::Root::*, general design pattern helpers (i.e., 
>> many Bio::*I, Bio::Factory::*, Build helper classes)
>>
>> BioPerl::Sequence - Bio::Seq, Bio::SeqIO, and SeqIO drivers that can 
>> do without annotations (e.g., fasta)
>>
>> BioPerl::Alignment - alignment objects and parsers
>>
>> BioPerl::Annotation - most annotation modules
>>
>> BioPerl::SeqFeature - most SeqFeature modules
>>
>> BioPerl::Tree - most Tree related modules
>>
>> BioPerl::DB - most Bio::DB::*, Bio::Das interfaces
>>
>> BioPerl::Search - The BLAST parsing and tiling
>>
>> There are quite a few more. Examples of the logic: BioPerl::Base 
>> contains all of its dependencies. BioPerl::Sequence requires only 
>> BioPerl::Base to satisfy all its BP dependencies. BioPerl::Alignment 
>> requires BioPerl::Base and BioPerl::Sequence. BioPerl::Search requires 
>> Base, Sequence, and SeqFeature. And so on.
>>
>> With a structure like this, a user who just needs Bio::PrimarySeq 
>> and Bio::SeqIO to read some fasta files can get away with installing 
>> BioPerl::Base and BioPerl::Sequence, about 141 modules, as opposed to 
>> the full 805 modules, including that broadly useful one 
>> "Bio::DB::HIV::HIVQueryHelper".
>>
>> Once finished, I'll propose setting many of the namespaces free as 
>> separate CPAN packages - Bio::Restriction, Bio::DB::HIV, and others. 
>> These can be packaged with their appropriate BioPerl::* prerequisites 
>> in the metadata. I expect this will allow natural selection to operate 
>> much more efficiently on the obsolete modules.
>>
>> I will set up CPAN::Meta compliant metadata for everything.
>>
>> I have more thoughts but this is already too long.
>>
>> MAJ
>>
>>
>>
>>
>
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at mailman.open-bio.org
> http://mailman.open-bio.org/mailman/listinfo/bioperl-l


