Bioperl: ISMB-98 Minutes & Summary

Steve A. Chervitz sac@genome.stanford.edu
Fri, 10 Jul 1998 11:28:47 -0700 (PDT)


Greetings Bioperlers.

This message contains a summary of the first informal Bioperl gathering at 
the recent ISMB-98 meeting in Montreal (28 June-2 July). The purpose of 
this Bioperl meeting was three-fold:

  1) to allow people interested in using Perl for bioinformatics to meet,
  2) to introduce new people to the Bioperl project, and
  3) to allow current Bioperl developers to discuss details about ongoing 
     Bioperl projects.

This post is intended to generate discussion. Direct your comments to this
mailing list. Thanks.

Steve Chervitz
sac@genome.stanford.edu

=======================================================================

Turnout
-------

By good planning and a bit of luck, six of the main 
developers/contributors were present (Steve Brenner, Steve Chervitz, 
Georg Fuellen, Chris Dagdigian, Ian Korf, and Ewan Birney).
The Bioperl lunch on 1 July attracted 40 ISMB attendees who added their 
names, e-mail, and comments to sign-up sheets, thus indicating 
significant interest in Bioperl. We've recently added these folks to the 
Bioperl mailing list(s).

Interest in Bioperl
-------------------

The majority of attendees at the lunch could be described as end-users 
but a sizable number (8-10) expressed interest in contributing code or 
design ideas. A "show of hands" survey indicated that many would be 
interested in a Bioperl workshop at the next ISMB meeting at 
http://ismb99.gmd.de/. More planning is necessary to see how feasible 
this would be. 

Other Revelations
-----------------

Talking to people at the lunch revealed:

 1) Strong interest in the sequence object, specifically, a need for a 
    feature description mechanism. The possibility of having a light-weight
    and a heavy-weight sequence object was introduced.
 2) Substantial interest in a 3D structure object.
 3) Need for a more formal mechanism for how to contribute to Bioperl. 
    Bioperl can be considered a test site before submitting to CPAN. 

Steve Chervitz was named as Bioperl chief coordinator.


Summary of Technical Discussion
-------------------------------

The following is a summary of an e-mail discussion among several of the
main Bioperl developers after the meeting. 

It is important to establish a set of core modules. Once in place, they 
can be used & extended for a variety of purposes by a variety of people.
The main players working on the core modules are (this list is NOT 
exclusive):

   * Steve Brenner - interface to Perl people, overview of object design,
                     structure object
   * Steve Chervitz - BLAST + (general db searching?) objects, 
                      structure object
   * Georg Fuellen - Alignment object, Phylogeny object
   * Chris Dagdigian - Sequence object (light weight)
   * Ian Korf - Entry object + sub objects (a.k.a. Heavy weight sequence
     object).
   * Ewan Birney - Alignment routines (XS C-compiled).

Other responsibilities look like:

   * Steve Chervitz - the bioperl 'leader' + organiser
   * Steve Brenner - talking to Perl people (e.g., getting bio.perl.org)
   * Chris Dagdigian - bio.perl.org sys admin
   * Ewan Birney - QA on core modules

Things to do now are:

   * Discuss what should or shouldn't be in the core
   * Discuss whether there should be a single Bio::Object we inherit from
     or not
   * What to do with exceptions
   * Documentation style
   * Gather the core set of objects up and put them under cvs for the main
     developers to edit
   * Establish bio.perl.org website


* Discuss what should or shouldn't be in the core
-------------------------------------------------
Core objects: 
   Sequence 
   Entry (heavyweight sequence object) which has Feature objects
   Alignment 
   Alignment engines
   DB searching abstraction
   Sequence database indexing
   Structure
 
Sequence = lightweight sequence + name + start + end + accession + description
           and *nothing* else 

Entry    = Sequence Object + FeatureSet Object (= exons, blast
           hits etc). 

Alignment = Storage of a multiple alignment + basic manipulations

Alignment Engine = Construction of an Alignment from two+ sequences

DB searching abstraction = BLAST, FastA, etc 

Structure  = Provide the same information as a PDB file.
             Higher-level functionality provided by associated modules.
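
To make the Sequence/Entry split concrete, here is a minimal sketch. 
The package names (Sketch::Sequence, Sketch::Entry), the method names, 
and the sample data are all made up for illustration; nothing here is 
an agreed Bioperl interface. The Sequence holds only the fields listed 
above, and the Entry wraps a Sequence plus a list of features.

```perl
use strict;
use warnings;

package Sketch::Sequence;   # hypothetical name, not a real Bioperl module

# Lightweight: sequence string plus the listed fields, and nothing else.
sub new {
    my ($class, %args) = @_;
    my %self = map { $_ => $args{$_} }
               qw(seq name start end accession description);
    return bless \%self, $class;
}
sub seq  { $_[0]{seq} }
sub name { $_[0]{name} }

package Sketch::Entry;      # hypothetical name, not a real Bioperl module

# Heavyweight: a Sequence object plus a set of features (exons, hits, ...).
sub new {
    my ($class, %args) = @_;
    return bless { sequence => $args{sequence}, features => [] }, $class;
}
sub sequence { $_[0]{sequence} }
sub add_feature {
    my ($self, %feat) = @_;
    push @{ $self->{features} }, +{ %feat };
    return scalar @{ $self->{features} };
}
sub features { @{ $_[0]{features} } }

package main;

# All values below are invented sample data.
my $seq = Sketch::Sequence->new(
    seq       => 'ATGGCCTAA', name        => 'demo1',
    start     => 1,           end         => 9,
    accession => 'X00000',    description => 'made-up example',
);
my $entry = Sketch::Entry->new(sequence => $seq);
$entry->add_feature(type => 'exon', start => 1, end => 9);

my @found = $entry->features;
printf "%s has %d feature(s)\n", $entry->sequence->name, scalar @found;
```

Keeping features out of the Sequence class means code that only needs 
the residues never pays for the feature machinery.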

Possible additional core modules:
   * Parsing multiple sequence files.
   * Sequence database indexing (sequence ID -> seq or entry object)


* Discuss whether there should be a single Bio::Object we inherit from 
----------------------------------------------------------------------
Voted not to do this for the core modules due to the increased dependency 
this would create. Having a common Bio::Object may be useful for 
non-core modules such as Gene, Protein, Chromosome, etc. 

There is a recognized need to share common data such as sequence alphabets,
characters for gaps and unknown residues, paths to external programs, etc.
A possible solution is to have a shared Bio::Resources.pm module.
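
As a rough sketch of what such a shared module might hold (the package 
name Sketch::Resources, the variable names, and the is_valid function 
are assumptions for illustration, not a proposal that was agreed on):

```perl
use strict;
use warnings;

package Sketch::Resources;   # hypothetical stand-in for Bio::Resources

# Shared constants that several modules would otherwise each redefine.
our %ALPHABET = (
    dna     => 'ACGT',
    rna     => 'ACGU',
    protein => 'ACDEFGHIKLMNPQRSTVWY',
);
our $GAP_CHAR     = '-';
our %UNKNOWN_CHAR = (dna => 'N', rna => 'N', protein => 'X');

# Returns 1 if $seq contains only alphabet, gap, and unknown-residue
# characters for the given type (case-insensitive), otherwise 0.
sub is_valid {
    my ($type, $seq) = @_;
    my $allowed = $ALPHABET{$type} or return 0;
    my $chars = quotemeta($allowed . $GAP_CHAR . $UNKNOWN_CHAR{$type});
    return $seq =~ /\A[$chars]+\z/i ? 1 : 0;
}

package main;

print Sketch::Resources::is_valid('dna', 'ACG-TN') ? "ok\n" : "bad\n";
```

Core modules could read these values without inheriting from a common 
base class, which keeps the dependency to a single small module.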


* What to do with exceptions
-----------------------------
There is agreement to stick with the built-in exception handling 
capabilities in the standard Perl distribution but opinion is split on 
using die/croak versus warn/carp.

A die/croak can be trapped and handled by calling code; a warn/carp
alerts the user that something is wrong, but the calling code gets no
signal that anything failed. See also this mailing list posting:
http://www.uni-bielefeld.de/mailinglists/BCD/vsns-bcd-perl/9709/0001.html
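
The difference can be sketched as follows. The two parser functions 
are hypothetical examples written for this summary; only eval, $@, and 
the Carp module's carp/croak are standard Perl.

```perl
use strict;
use warnings;
use Carp qw(carp croak);

# Hypothetical parser that treats bad input as fatal.
sub parse_fatal {
    my ($line) = @_;
    croak "not a FASTA header: '$line'" unless $line =~ /^>(\S+)/;
    return $1;
}

# Same check, but it only warns; the caller gets undef back and has to
# test for that itself.
sub parse_lenient {
    my ($line) = @_;
    unless ($line =~ /^>(\S+)/) {
        carp "not a FASTA header: '$line'";
        return;
    }
    return $1;
}

# Fatal version: the caller traps the exception with eval and reads $@.
my $id;
my $trapped_ok = eval { $id = parse_fatal('garbage'); 1 };
my $err = $trapped_ok ? '' : $@;
print "trapped: $err" if $err;

# Lenient version: the message goes to the warning handler (normally
# STDERR); here a handler collects it so the effect is visible.
my @warnings;
local $SIG{__WARN__} = sub { push @warnings, $_[0] };
my $lenient_id = parse_lenient('garbage');
print "caller saw undef\n" unless defined $lenient_id;
```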

However, frequent die()ing is not very Perl-like and forces you to put
eval statements in your code. Thus, a common opinion is that one should  
warn/carp except for serious errors (but deciding what is "serious" is 
a bit of a judgement call). 

One solution to this issue that Steve C. has been experimenting with is to
have an object-specific "strict" mode (akin to the "use strict" compiler 
pragma). This would allow different users to decide how rigorous they 
want to be. An object created with a "-strict=>1" parameter would croak 
instead of carp and perhaps generate additional warnings. Conversely, an 
object created with a "-strict=>-1" parameter would carp instead of croak. 
This behavior could be implemented at the function call level as well.
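
A minimal sketch of the object-level idea follows. This is not Steve 
C.'s actual code; the class name Sketch::Strictable, the _report 
helper, and the set_seq checks are invented for illustration. Each 
problem has a default severity, and the object's -strict setting can 
override it in either direction.

```perl
use strict;
use warnings;

package Sketch::Strictable;   # hypothetical class for illustration

use Carp qw(carp croak);

sub new {
    my ($class, %args) = @_;
    return bless { strict => $args{-strict} || 0 }, $class;
}

# $default is 'warn' or 'fatal'; the object's strictness may override it.
sub _report {
    my ($self, $default, $msg) = @_;
    my $fatal = $self->{strict} > 0 ? 1
              : $self->{strict} < 0 ? 0
              :                       $default eq 'fatal';
    $fatal ? croak($msg) : carp($msg);
    return;
}

sub set_seq {
    my ($self, $seq) = @_;
    return $self->_report('fatal', 'empty sequence')
        unless defined $seq && length $seq;
    $self->_report('warn', 'lowercase residues') if $seq =~ /[a-z]/;
    $self->{seq} = uc $seq;
    return 1;
}

package main;

my @warned;
local $SIG{__WARN__} = sub { push @warned, $_[0] };

# -strict => -1: a normally fatal problem is reduced to a warning.
my $lenient = Sketch::Strictable->new(-strict => -1);
my $rv = $lenient->set_seq('');

# -strict => 1: a normally harmless warning becomes fatal.
my $picky = Sketch::Strictable->new(-strict => 1);
my $ok = eval { $picky->set_seq('acgt'); 1 };
my $strict_err = $ok ? '' : $@;
print "strict object died: $strict_err" unless $ok;
```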

Regardless of what strategy is used, when a function can produce a fatal 
exception (die/croak) this should be documented carefully, with example 
eval statements showing how to trap such exceptions.

Exception handling in Perl is presently evolving to permit more 
object-orientation and we expect Bioperl modules to evolve in step with Perl 
in this respect.


* Documentation style
---------------------
Established a consistent Bioperl style, a la Steve C's blast module:
http://genome-www.stanford.edu/perlOOP/bioperl/blast/Blast.pod.html


* Gather the core set of objects up and put them under cvs for the main
  developers to edit
-----------------------------------------------------------------------
There are no fences on the code. People focus on one part or 
another but everyone can edit everything... (they won't...). You should 
not feel bad about people editing your code. This is a good sign.

Code review (giving code out to other people) is a good 
thing. Code review is one of the best programmer-based QA methods and 
fosters a cooperative view of the code.

One should be cautious about using such collaborative code in a 
production environment until it has stabilized.


Other comments:
---------------

Monolithic modules: The consensus was that large, monolithic modules, while 
not the best design, seem to be more popular among users than extended 
module hierarchies (e.g., CGI.pm vs. CGI::). However, some of this may be 
due to historical accident: if users are accustomed to a single module
which is subsequently split up, they will be reluctant to transition to 
the new, extended design. If a functionality is offered in a hierarchy 
of modules from the start the hierarchy may be acceptable.

Continue loose ties with VSNS-BCD; the consensus seems to be that there's 
some mutual benefit (e.g., BCD-students may again be assigned bioperl 
coding projects in return for admission.) 



=========== Bioperl Project Mailing List Message Footer =======
Project URL: http://www.techfak.uni-bielefeld.de/bcd/Perl/Bio/
For info about how to (un)subscribe, where messages are archived, etc:
http://www.techfak.uni-bielefeld.de/bcd/Perl/Bio/vsns-bcd-perl.html
====================================================================