[Dynamite] BOSC 2000

Ian Holmes ihh@fruitfly.org
Fri, 7 Jul 2000 21:40:42 -0700 (PDT)


I am submitting an abstract to BOSC to promote Telegraph. I'll handle the
talk and everything, but if you'd like to scan it over to make sure I'm
not promising the moon on a stick, here it is (it's due in tomorrow so
REPORT CHANGES WITHOUT DELAY!!):

"Telegraph: Free Probabilistic Sequence Analysis"

Hidden Markov Models (HMMs) have been used successfully for a wide
range of bioinformatics applications, including protein domain
classification, signal peptide recognition and gene prediction.  New
developments such as Fisher kernels suggest that HMMs have plenty more
to offer and that many ideas remain unexplored.  However, although many
computational biologists are keen to try out new kinds of HMM
architecture on novel problems, such experiments have been hindered by
the lack of any free software package capable of full probabilistic
manipulation of arbitrarily structured HMMs at the speeds demanded by
biological database searches.

It is instructive to examine previous Open Source projects in this
general area.  Notable contributions include HMMER, the most popular
of the HMM programs, which specialises in linear profile HMMs;
Dynamite, a versatile package but one without probabilistic training
algorithms; and the BioJava HMM suite.  Each of these has reached its
present feature-rich state through many rounds of changes made in
response to public demand.  This is the strength of Open Source, and
it is why Open Source is particularly appropriate for a library
intended from the outset to serve a wide range of applications.

Telegraph is an Open Source project that picks up where Dynamite left
off.  Reflecting the project's origins as a collaboration between the
EnsEMBL group in Cambridge and the Berkeley Drosophila Genome Project,
Telegraph's primary goal has been to duplicate the functionality of
Dynamite, including high-speed search for a broad range of finite
state machine architectures (from the Smith-Waterman algorithm right
up to GeneWise), while providing the additional features that an
inquisitive computational biologist experimenting with novel kinds of
HMM might expect.  On the mathematical side, these extra features
include library routines for training (Baum-Welch with Dirichlet
mixture priors), calculation of Fisher scores, alignment sampling,
posterior probability evaluation and other capabilities designed to
facilitate likelihood calculus.  On the computational side, the object
model draws on extensive programming experience with a wide variety of
HMMs and formalises several aspects of Dynamite: code generation is
made optional, and a well-defined XML format is introduced for
specifying model architectures and algebraic relationships between
parameters.

Eric Raymond, in his classic Open Source manifesto "The Cathedral And
The Bazaar", points out that successful Open Source projects begin not
with a ready-made developer community but with a "plausible
promise... that [the project] can be evolved into something really
neat".  Telegraph is fortunate in that its potential developer
community is already well identified: we (the current developers)
hope that the project will attract exactly the kinds of people who
have already helped make Dynamite such a success.  So far,
Telegraph's design goals have closely mirrored Dynamite's existing
functionality.  Ewan Birney (the original author of Dynamite) is a
core designer and developer, and the design simplifies Dynamite's
wherever possible, restricting improvements to "essential" algorithms
such as training.

The BOSC and ISMB 2000 conferences coincide with Telegraph's "going
public", and we very much hope that potential developers, users and
testers will contact us during or after the meetings.  As an
incentive, there will be a repeat of Dynamite's now-legendary offer of
a bottle of champagne for the best bugfinder at each release, with the
first bottle of bubbly going out at Christmas 2000.

As the number of biologists interested in experimenting with novel
HMMs grows ever more rapidly, and the computational demands of the
latest algorithms outstrip the capabilities of any one centre, the
potential of an interoperable "algorithm exchange format" such as
Telegraph is great.  We hope that you will be interested in becoming
involved in any capacity, whether as a user or a developer, and that
you will contact one of the following Telegraph co-ordinators:

  Ian Holmes  -- ihh@fruitfly.org
  Guy Slater  -- guy@ebi.ac.uk
  Ewan Birney -- birney@ebi.ac.uk