[Bioperl-l] bioperl reorganization

Lincoln Stein lincoln.stein at gmail.com
Fri Jul 24 13:31:11 UTC 2009

My preference would be to split both Bio::DB::SeqFeature and Bio::DB::GFF
into their own module. I don't think they depend on each other, but I'm not
100% sure!


On Sat, Jul 18, 2009 at 8:23 AM, Scott Cain <cain.cshl at gmail.com> wrote:

> Hi All,
> I don't want to wade in too deeply, but I like the idea of splitting things
> up.  I think the Bio::Graphics split has gone well and has made life easier
> in GBrowse world.  I could see Bio::DB::SeqFeature and Bio::DB::GFF being
> split and either being kept together or going their separate ways (though I
> have a nagging suspicion that SeqFeature code depends on GFF code in a few
> places, so it may make sense to just keep them together).
> And Chris, if it makes you feel any better, I don't think anything you've
> done or not done has held up GBrowse2.
> Scott
> On Jul 17, 2009, at 11:14 PM, Chris Fields wrote:
>  My 2c...
>> On Jul 17, 2009, at 12:01 PM, Jason Stajich wrote:
>>  Will try to weigh in more, a little bit of stream of consciousness to let
>>> you know I'm thinking about it.  Tough summer to focus much on this.
>> Yes, for me as well.  That will change soon (approx two weeks) ;>
>>  It's too bad we are apparently the laughing stock of Perl gurus, but it
>>> would be great to see how to modernize aspects of the development.
>>> I'm curious how it will work if we have dozens of separate distros: won't
>>> we have a hard time keeping track of what directory things are in?
>>> Will there have to be a master list of what version and what modules are in
>>> what distro now?
>> I don't think we're a laughingstock as much as we haven't had the time to
>> dedicate towards this (and much of this occurred at a point early on, with
>> that whole 'Cathedral and Bazaar' esr-based thingy).  BTW, those same gurus
>> shouldn't speak: perl core is just as bad and riddled with worse bugs,
>> though rgs and co. wouldn't admit it.
>> In fact, base.pm itself has a nasty one; I'm surprised no one in the
>> bioperl community has noticed it yet (it's listed as a bug on RT I think):
>> pyrimidine1:biomoose cjfields$ perl -MBio::SeqIO -e 'print $Bio::SeqIO::VERSION."\n"'
>> 1.0069
>> pyrimidine1:biomoose cjfields$ perl -MBio::SeqIO -e 'print $Bio::Root::IO::VERSION."\n"'
>> -1, set by base.pm
>> Imported modules do not have VERSION set correctly when it is exported.
>>  This hasn't become an issue in bioperl yet (it's really an edge case), but
>> several devs have run into this. And really, why set VERSION to a string
>> like '-1, set by base.pm'?
>> Anyway, re: versioning, the way I think about it, if we have a small very
>> stable core with version X, and a focused very stable module group with
>> version Y, other distributions would have a separate version and require
>> subgroup version Y (which would in turn require core version X).  CPAN would
>> take care of it.  This isn't much different than what occurs every day on
>> CPAN anyway (Jay's Catalyst, Moose and MooseX, and so on).  In fact, several
>> Moose-requiring distributions don't require the latest Moose.
>>  When I do an SVN (or git) checkout, do I need to check out each of these in
>>> its own directory?  Or will there be a master packaging script that makes
>>> the necessary zip files for CPAN submission?
>> Not sure; that would be up to us I suppose.  I think it would be easier to
>> maintain and release if they were separate or packaged up as Jay suggests.
>>  If they are in separate directories are we organizing by conceptual topic
>>> (phylogenetics, alignment, database search) or by namespace of the modules?
>> By topic, retaining namespaces.  We have a basic Bio::* directory
>> structure already in place for various generic terms (Tools, DB, etc), so I
>> see this crossing simple namespaces very easily.  And as I pointed out to
>> Robert, several of those could possibly go together.
>>  Do all the 'database' modules live together - probably not  - so do we
>>> name bioperl-db-remote bioperl-db-local-index, bioperl-db-local-sql, etc?
>>>  really bioperl-db is somewhat focused on sequences and features, but what
>>> about things that integrate multiple data types - like biosql?
>> I don't see bioperl-db (BioSQL) being split up.  I think it's too
>> intrinsically linked and cohesive (it's almost a separate core unto itself),
>> so it would be counterproductive to do so.
>> Maybe have bioperl-db become bioperl-biosql.  Web-based =
>> bioperl-remotedb.  Local = bioperl-localdb. OBDA = bioperl-obda.
>>  If they are in separate directories, what about all the test data that
>>> might be shared, is this replicated among all the sub-directories - how do
>>> we do a good job keeping that up to date, could we have a test-data distro
>>> instead with symlinks within SVN?
>> We have to see how much is actually shared and proceed from there.  I
>> would like to eventually resurrect the idea of a separate biodata repo that
>> we could just ftp the data from as needed.  That would cut down on the
>> package size quite a bit, but I'm not sure how feasible that is from the
>> testing point of view (would we have to skip all tests if there were no
>> network access)?
>>  For some other obvious modules that can be split off and self-contained,
>>> each of these could be a package.  I would estimate more than 20 packages
>>> depending on how Bio::Tools are carved up.
>>> - I think Bio::DB::SeqFeature needs to be split off for sure; this is a
>>> nice logical peeling off.  Could be another test case since it is a GBrowse
>>> dependency
>>> -  Bio::DB::GFF as well for the same reasons.
>> Completely agree (and I think Lincoln would like this as well).
>>  -  Bio::PopGen - self contained for the most part, but depends on
>>> Bio::Tree and Bio::Align objects
>> Could list those as a required dependency.
>>  -  Bio::Variation
>>> -  Bio::Map and Bio::MapIO
>>> -  Bio::Cluster and Bio::ClusterIO
>>> -  Bio::Assembly
>>> - Bio::Coordinate
>>> My nightmare is that we're going to have to manage a lot of 'use XX 1.01'
>>> statements to enforce version requirements when dealing with the
>>> dependencies on the interface classes, and have to keep these all up to
>>> date.  The version was implicit when they were all part of the same big
>>> distro.
>> Right.  But it also becomes a maintenance problem when serious bugs in one
>> module impede the needed release of others to CPAN.
>>  Also the splits need not only include one namespace if need be I guess
>>> but we have generally grouped things by namespace.
>>> What do you want to do about bioperl-run?  Do we make a set of
>>> parallel splits from all of these?  I think at the outset we need to
>>> coordinate the applications supported here in some sort of loose ontology -
>>> the namespaces were not consistently applied so we have some alignment tools
>>> in different directories, etc.  So the namespace sort of classifies them but
>>> it could be better.  One of the challenges of multiple developers without a
>>> totally shared vision on how it should be done.
>> We could split bp-run and Tools, pairing the wrappers with the relevant
>> parser modules.  Not sure if this can be done with SearchIO as well but it
>> could be tested to see how feasible that would be.
>>  I'm not convinced that the Bio::Graphics splitoff has been painless so we
>>> should take stock of how that is working.
>> Really?  Lincoln has made several fixes lately on CPAN, so I thought
>> everything was going well.  If anything I would think the lack of additional
>> 1.6.x bioperl releases has probably held Gbrowse 2.0 up more due to
>> Bio::DB::SeqFeature (my fault, but as you know life and job take precedence
>> sometimes).
>>  It seems like this split off would be a way to better streamline things
>>> in bioperl so that modern versions of bioperl might be able to better
>>> interface with things like Ensembl again too.
>>> How much of this effort is worth triaging on the current code versus the
>>> efforts we want to make on a cleaner, simpler bioperl system that appears to
>>> scare so many users (and potential developers) off?
>> I say triage away on a branch, but we need to indicate which ones to
>> whittle out first.  The reason I believe we went for a larger split
>> initially (as indicated on the wiki page) was to push something forward and
>> not get too bogged down in the details.  But we may as well go full throttle
>> and do this right away.
>>  Okay I rambled, hope that was helpful.
>>> -jason
>>> --
>>> Jason Stajich
>>> jason at bioperl.org
>> Very, very helpful.  Now I need a beer.
>> chris
>> _______________________________________________
>> Bioperl-l mailing list
>> Bioperl-l at lists.open-bio.org
>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
> -----------------------------------------------------------------------
> Scott Cain, Ph. D. scott at scottcain dot net
> GMOD Coordinator (http://gmod.org/) 216-392-3087
> Ontario Institute for Cancer Research

Lincoln D. Stein
Director, Informatics and Biocomputing Platform
Ontario Institute for Cancer Research
101 College St., Suite 800
Toronto, ON, Canada M5G0A3
416 673-8514
Assistant: Renata Musa <Renata.Musa at oicr.on.ca>
