[Bioperl-l] BP split progress and rationale

Fields, Christopher J cjfields at illinois.edu
Wed Jun 1 21:46:16 UTC 2016


I think the Bio-Root split showed that there is definitely a certain threshold of pain involved with splitting out code. I suggested adding it back mainly for maintenance and user reasons: it was a pain, and probably unnecessary as the first step in splitting out code (it did work, but at some cost).

That said, I think there is a good middle ground. A key complaint about bioperl is the installation process and the ton of dependencies for modules that see little use. We work around these to some extent with ‘recommended’ dependencies, but it’s not the best solution in my opinion. Maybe we should just home in on the modules whose additional downstream dependencies are stifling installation and move them out, with a mind to keeping dependencies to an absolute minimum? We already know what these modules are (e.g. the Build.PL file lists the dependencies).

As a note, there has been some work on this already: Bio::SearchIO::blastxml resides in a separate repo now. I would also suggest we keep Bio::Coordinate and a few other modules split out; there have been very few complaints.

chris

> On Jun 1, 2016, at 8:57 AM, Brian Osborne <bosborne11 at verizon.net> wrote:
> 
> MAJ,
> 
> There must be something I’m not understanding, so let’s start this over. When I put Bio::Root back into bioperl-live last year, the only feedback I got was good, that splitting BioPerl into many parts was probably not a good idea. The balance may be in finding the correct number of parts, yes?
> 
> So - again, just so I understand - are you proposing to take Bio::Root out of bioperl-live again?
> 
> BIO
> 
> 
> 
>> On Jun 1, 2016, at 9:49 AM, Mark A. Jensen <maj at fortinbras.us> wrote:
>> 
>> Wow, Brian,
>> "Generally people install BioPerl to get IO and basic functionality"? Generally, people (or, I) wouldn't think of installing BioPerl for basic functionality, because people (or I) get 805 modules, most obsolete, in order to use 3, after waiting 15-20min for the tests to complete. At least, I've sensed significant frustration in many of the posts relating to installation on this list. I agree, everything should be geared toward simplicity and efficiency, but for the user.
>> 
>> The base set would always be installed. The installation of the sequence set would pull in the base set. There is no need to divide the repos; this can all be driven by metadata - in CPAN::Meta format, so that any CPAN distribution tool could actually pick out what is necessary for a particular user's needs and install it. The bloat is managed by managing the groupings, not the repositories. Sure, there would be maintenance and documentation, same as in living projects. Maybe new people would get interested if the work could be divided among many functional units. And maybe the unused hundreds of modules would wither as they should. Or, maybe you're right, time for BioPerl to ride into Valhalla.
>> 
>> 
>> On 2016-06-01 08:37, Brian Osborne wrote:
>>> Mark,
>>> 
>>> I don’t understand. Last year I put Bio::Root* back into
>>> bioperl-live, to simplify installation. Now we are splitting again?
>>> 
>>> IMO Bio::Base/Bio::Root and Bio::Seq*/Bio::SeqIO* should never be
>>> separate. Generally people install BioPerl to get IO and basic
>>> Sequence object functionality. Why would Bio::Root (always required)
>>> be separate from things like Bio::Seq and SeqIO (always requested)?
>>> 
>>> Simplicity, please. BioPerl has very few people actively engaged
>>> these days, and the numbers there are steadily dropping. Everything we
>>> do should be geared towards simplicity and efficiency. Another
>>> example: SeqFeature and Annotation. Why separate them? They are almost
>>> always used together.
>>> 
>>> Then there’s the maintenance, and documentation. Please don’t take
>>> this personally MAJ, this business about splitting everything up is an
>>> old idea, an unquestioned assumption. Time to re-consider it.
>>> 
>>> Brian O.
>>> 
>>> 
>>> 
>>>> On Jun 1, 2016, at 1:06 AM, Mark A. Jensen <maj at fortinbras.us> wrote:
>>>> 
>>>> All,
>>>> 
>>>> I've made some significant progress towards a BP split. I know there have been several tries, but I'm willing to take this one to an actionable endpoint with YAPC::NA 2016 as a goal date for action.
>>>> 
>>>> I have built a graph of all the module dependencies (parent-child and horizontal) in Neo4j, and have been using this to design module groupings that encompass functional areas and also have hierarchical group dependencies such that the dependencies between groups are minimized. I'm calling the groupings "packages".
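The grouping criterion described above (keep dependencies between groups to a minimum) can be illustrated with a small sketch. This is not the Neo4j analysis from the post; the edges and the module-to-package assignment below are toy stand-ins chosen for illustration, not data extracted from the real BioPerl dependency graph. It only shows how one might count the edges that cross package boundaries for a candidate grouping.

```python
from collections import Counter

# Toy module-level dependency edges as (dependent, dependency) pairs.
# Hypothetical stand-ins, not taken from the actual BioPerl graph.
EDGES = [
    ("Bio::Seq", "Bio::Root::Root"),
    ("Bio::SeqIO", "Bio::Root::Root"),
    ("Bio::SeqIO", "Bio::Seq"),
    ("Bio::SimpleAlign", "Bio::Root::Root"),
    ("Bio::SimpleAlign", "Bio::Seq"),
    ("Bio::AlignIO", "Bio::SimpleAlign"),
]

# Candidate assignment of modules to packages (also illustrative).
GROUP_OF = {
    "Bio::Root::Root": "BioPerl::Base",
    "Bio::Seq": "BioPerl::Sequence",
    "Bio::SeqIO": "BioPerl::Sequence",
    "Bio::SimpleAlign": "BioPerl::Alignment",
    "Bio::AlignIO": "BioPerl::Alignment",
}


def cross_group_edges(edges, group_of):
    """Count module edges whose endpoints fall in different packages."""
    counts = Counter()
    for dependent, dependency in edges:
        src = group_of[dependent]
        dst = group_of[dependency]
        if src != dst:
            counts[(src, dst)] += 1
    return counts


counts = cross_group_edges(EDGES, GROUP_OF)
for (src, dst), n in sorted(counts.items()):
    print(f"{src} -> {dst}: {n} module-level edge(s)")
print("total cross-package edges:", sum(counts.values()))
```

A lower total for one candidate grouping than for another would suggest the first keeps the packages more loosely coupled, which is the property the groupings are meant to have.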
>>>> 
>>>> I am using the loose convention that "monophyletic" packages (groups of modules that fall within a namespace) are named after the namespace, and "polyphyletic" packages are named "BioPerl::<functional name>". The following packages are currently pretty solid. The descriptions indicate mainly what is encompassed by the contained modules, not rules for membership.
>>>> 
>>>> BioPerl::Base - Bio::Root::*, general design pattern helpers (i.e., many Bio::*I, Bio::Factory::*, Build helper classes.)
>>>> 
>>>> BioPerl::Sequence - Bio::Seq, Bio::SeqIO, and SeqIO drivers that can do without annotations (e.g., fasta)
>>>> 
>>>> BioPerl::Alignment - alignment objects and parsers
>>>> 
>>>> BioPerl::Annotation - most annotation modules
>>>> 
>>>> BioPerl::SeqFeature - most SeqFeature modules
>>>> 
>>>> BioPerl::Tree - most Tree related modules
>>>> 
>>>> BioPerl::DB - Most Bio::DB::*, Bio::Das interfaces
>>>> 
>>>> BioPerl::Search - The blast parsing and tiling
>>>> 
>>>> There are quite a few more. Examples of the logic: BioPerl::Base contains all of its dependencies. BioPerl::Sequence requires only BioPerl::Base to satisfy all its BP dependencies. BioPerl::Alignment requires BioPerl::Base and BioPerl::Sequence. BioPerl::Search requires Base, Sequence, and SeqFeature. And so on.
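The layering described above can be sketched as a small dependency resolver. This is a minimal illustration in Python, not the proposed CPAN::Meta metadata or any existing BioPerl tooling. The requirements for Base, Sequence, Alignment, and Search are taken from the paragraph above; the requirements listed for BioPerl::SeqFeature are an assumption, since the post does not state them. The function name `install_order` is hypothetical.

```python
# Package-level requirements. All entries except SeqFeature follow the post.
REQUIRES = {
    "BioPerl::Base": [],
    "BioPerl::Sequence": ["BioPerl::Base"],
    "BioPerl::Alignment": ["BioPerl::Base", "BioPerl::Sequence"],
    # Assumption: the post does not say what SeqFeature requires.
    "BioPerl::SeqFeature": ["BioPerl::Base", "BioPerl::Sequence"],
    "BioPerl::Search": [
        "BioPerl::Base",
        "BioPerl::Sequence",
        "BioPerl::SeqFeature",
    ],
}


def install_order(target, requires):
    """Return the target and everything it needs, prerequisites first."""
    order = []
    seen = set()

    def visit(pkg, stack):
        if pkg in stack:
            # A package that (indirectly) requires itself cannot be ordered.
            raise ValueError(f"dependency cycle involving {pkg}")
        if pkg in seen:
            return
        for dep in requires[pkg]:
            visit(dep, stack + (pkg,))
        seen.add(pkg)
        order.append(pkg)

    visit(target, ())
    return order


print(install_order("BioPerl::Sequence", REQUIRES))
print(install_order("BioPerl::Search", REQUIRES))
```

Under these assumptions, asking for BioPerl::Sequence pulls in only BioPerl::Base ahead of it, while BioPerl::Search also brings in BioPerl::SeqFeature; packages outside the requested chain are never touched.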
>>>> 
>>>> With a structure like this, a user who just needs Bio::PrimarySeq and Bio::SeqIO to read some fasta files can get away with installing BioPerl::Base and BioPerl::Sequence, about 141 modules, as opposed to the full 805 modules, including that broadly useful one "Bio::DB::HIV::HIVQueryHelper".
>>>> 
>>>> Once finished, I'll propose setting many of the namespaces free as separate CPAN packages - Bio::Restriction, Bio::DB::HIV, and others. These can be packaged with their appropriate BioPerl::* prerequisites in the metadata. I expect this will allow natural selection to operate much more efficiently on the obsolete modules.
>>>> 
>>>> I will set up CPAN::Meta compliant metadata for everything.
>>>> 
>>>> I have more thoughts but this is already too long.
>>>> 
>>>> MAJ
>>>> 
>>>> 
>>>> 
>>>> 
>>> 
>>> 
>>> _______________________________________________
>>> Bioperl-l mailing list
>>> Bioperl-l at mailman.open-bio.org
>>> http://mailman.open-bio.org/mailman/listinfo/bioperl-l
>> 
> 



