[Biopython-dev] biopython on github

Leighton Pritchard lpritc at scri.ac.uk
Tue Mar 17 09:59:32 UTC 2009


Hi all,

This has been an occasionally frustrating thread to read...

On 17/03/2009 08:46, "Peter" <biopython at maubp.freeserve.co.uk> wrote:

> On Tue, Mar 17, 2009 at 3:45 AM, Chris Lasher <chris.lasher at gmail.com> wrote:
>> 2009/3/16 Tiago Antão <tiagoantao at gmail.com>
>> 

>>> How is the "official" biopython trunk controlled? Currently what is on
>>> CVS is the gospel and Peter and Michiel essentially have control of
>>> what is there and what is labelled as a "biopython distribution". How
>>> will this work now?
 
>> In a distributed workflow, there is no technical official repository. The
>> "official repository" is socially enforced.

That was true before.  Unless I misread the Biopython licensing, there was
no real barrier to putting a branched copy of the code on your own
server/site, with your own modifications.  What git does is provide tools to
make merging of that sort of code easier (along with a number of other
nice features, such as authentication of contributions).  The presence of
git does not ensure that your changes, or anyone else's, will be merged with
any other repository, and nor does it ensure the quality of contributed
code.  Git, while nice, and ideal for a number of tasks, is no magic bullet.

To an extent, the 'official' repository is, pragmatically, the one that is
most stable and well-tested.  If my hypothetical branched version had become
more stable and widely-used than the 'official' trunk, and become the most
frequently downloaded and implemented, and received new contributions in its
own right, it might then be considered de facto 'the distribution'; nasty
online spats with the original authors notwithstanding.  The 'social
enforcement' of politeness (i.e. *I* don't take credit for *your* work)
prevents this to an extent, as it ought to under any versioning system.

There's a competing tendency to consider that the coders who spent the most
time creating the code understand it the best, and are in the best position
to maintain it directly.  This is true to a large degree, and entirely
applicable to Biopython's contributed modules.  git can potentially
facilitate that sort of contribution to the 'official' trunk in a way that
CVS can't, due to its permissions bottleneck.  However, the mechanics of
incorporating that contributed code are more or less the same: the people
with control of the 'official' trunk review the code and decide whether to
include it.  This is true whether the code is submitted as a patch to
Bugzilla, emailed to a developer, put up on public CVS on your site, or in a
forked git repository.  The same is true of your own git repository - you
don't have to include someone else's forked code if you don't want to.

What possibly needs to change is not the version control system, but the way
in which people think about their contribution.  Contributions can be made
productively under any versioning system, and the key questions remain the
same in all cases: Does the new code work (are there tests)? Does the new
code break any old code?  Is there documentation?  Is the API consistent?

"What version control system are we using?" is a minor detail, unless it is
inherently broken, hinders any of the above, or causes some other
deal-breaking issue (for Linus Torvalds, this included speed issues for
merges).

>> I think Michiel and Peter still head the Biopython project--at
>> least they have the most clout, I would say. Therefore, we will probably
>> look to one of their branches as the "official" branch of Biopython. When
>> one of them wants to step down in duty, we will socially pass the torch on
>> to the next taker.

It has always been thus.  Now, instead of passing on the user authentication
to the CVS server at OBF, we will pass on the user authentication to the
biopython github account:

> I think it is essential we have a clearly labeled official trunk
> (perhaps with branches for releases), which will be used for all the
> official releases (tar balls, zip files and windows installers).  Our
> main webpage should make this very clear.
> 
> We could potentially continue to have a shared official branch (e.g.
> belonging to the generic github biopython user), and give all the
> existing CVS contributors write access - and continue to manage this
> as before.  So for example, if Frank wanted to check in some minor
> changes to Bio.Nexus he could just do it.  Future contributors'
> patches/branches might get taken up by a developer on a personal
> branch for testing, before being merged into the official branch.
> 
> i.e. We can initially continue as before - right now I don't have a
> feel for how much work the role of an official branch maintainer would
> be, and it is difficult to guess without more hands on experience
> using the new tools.
 
Plus ça change (avec git)...

>>> The second question, related to the first is how will different
>>> branches (of different persons) be managed? I am seeing people
>>> starting working on the same code in different directions and then
>>> having problems merging everything together.
>> 
>> People are supposed to work in different directions; this is the point of
>> distributed workflows.

I may have a different understanding of 'different directions' than you
mean, but I don't think that it's good for a community project if people
work in different directions.  I also don't think that that is the point of
distributed workflows; on the contrary, I think that they are intended to
make it easier to work independently towards a common goal, even if that is
by working on loosely- or non-interacting parts of the whole.

>> Merging tends not to be so difficult, and compared to
>> centralized models like CVS and SVN, it's a cinch. We will help provide
>> documentation for proper merging habits (e.g., merge early, merge often, and
>> no rebasing after pushing, etc.). There are also screencasts popping up (in
>> particular Scott Chacon's re-make of his Gitcasts, now at learn.github) that
>> we will link to for educational purposes.
>> And of course, other developers will be around to help out in tricky merges.

This characterises one of the frustrating aspects of this thread (not
getting at you personally, Chris) - the occasional implicit assumption that
'things will be inherently *better* if we use git'.  Developers are around
to help now, even using CVS (which also has clear, long-standing stable
documentation - and even an O'Reilly book).  Several people don't seem to
think that that - and the way that code is reviewed and incorporated into
the main distribution - is good enough, and I don't think that this will
change just because the version control system has changed.  Nor will
changing revision control system generate significant free time to write,
test and document code.  But we may have the recession to do that last one
for us.

> Well, yes, in theory we have the same problem now with CVS - and while
> the tools may make merging easier, some communication is essential
> when working on the key modules which impact large parts of the code
> base.

I would put it more strongly than that: communication is essential in all
aspects of the project.  A number of related blog posts make statements
along the lines of "I don't use Biopython, or post to the mailing lists, but
I think that they're doing *this* wrong", or "I submitted code, but it
didn't get taken up immediately".  Now, venting and ranting on a blog is
fine, but it's not really *communicating*, any more than it was when I
thought that the BioSQL GenBank upload code was broken, fixed it (for my
purposes) and told no-one.  Git won't change the communication issue (in
either direction) any more than it changes the code review process.

FWIW, I think that git looks like a good way to go, and that it could help
encourage people to make local modifications of Biopython for their own
benefit and in their own interests and expert area, in a way that is visible
to the core distribution (unlike the patch submission process that is now
implemented).  In that way it could facilitate more rapid expansion of the
core distribution.  However, the bottlenecks of ensuring code quality,
testing and documentation will only ease if that is taken up by the
individuals/groups making those contributions, in addition to the core
developers.

And yes, I know I'm late with the new GenomeDiagram docs... ;)

L.

-- 
Dr Leighton Pritchard MRSC
D131, Plant Pathology Programme, SCRI
Errol Road, Invergowrie, Perth and Kinross, Scotland, DD2 5DA
e:lpritc at scri.ac.uk       w:http://www.scri.ac.uk/staff/leightonpritchard
gpg/pgp: 0xFEFC205C       tel:+44(0)1382 562731 x2405





