[Bioperl-l] BOSC 2001 bioperl report
Jason Stajich
jason@chg.mc.duke.edu
Mon, 23 Jul 2001 08:15:21 -0400 (EDT)
Here are the minutes and a report of the stated goals for the
bioperl 1.0 release.
Report from Bioperl Developer meeting at BOSC 2001
* The 0.9.x developer release series will begin in about 3
weeks. These will be non-stable releases in which all tests pass,
but known bugs may be included. The APIs in the 0.9.x series are
not considered stable, and users must accept this risk when
developing applications with this series. The advantage of these
releases over pure CVS checkouts is an FTP tarball and a guarantee
that all tests pass.
* Preparations for the 1.0 release are underway, with an expected
release date in the latter part of Q4.
Below is the checklist for 1.0. Bioperl members currently
responsible for an item are listed in parentheses; if no one is
listed, we need someone to volunteer.
o Alignment objects - Abstraction of interface for Alignments
based on SimpleAlign, and removal of UnivAln object.
(Heikki Lehvaslaiho)
o AnnotationI - An interface for describing general purpose annotations
(Ewan Birney)
Perhaps building Bibliography Reference objects based on work by
Martin Senger and interfacing with his BQS ideas.
o ApplicationFactoryI - Interface for running applications from
within bioperl. We need an abstract definition for running
applications (based on Novella, openBSA). The intention is to
define this in such a way that applications can be described by
metadata and do not require the creation of one class per
application a la StandAloneBlast, TCoffee, Clustal.
The first effort will be to try to interface with the EMBOSS package.
Additional efforts will interface with Phrap, phred, and consed, and
rewrap the blast, netblast, clustal, and tcoffee interfaces.
Heikki has already gotten started on the EMBOSS interface and it
looks very promising.
(David Block to propose interface)
o Assembly tools - handling Consed and phred parsing and phrap
interfaces, working with ideas and code proposed by Chad Masala,
and connecting to (new) BioCORBA Comparison objects.
(Chad Masala to help start)
o Sequence Parsers (SeqIO) - To ensure a stable release, we do not
plan to make many changes to the parsing. However, one idea
proposed by Hilmar Lapp is to eliminate the hardcoded nature of
Sequence and Feature creation in the SeqIO system. Instead of
the hardcoded object name in the FTHelper code that creates
Bio::SeqFeature::Generic objects, a SeqFeatureFactory object can
be passed in to the SeqIO parser on initialization to determine
how seqfeatures are created. Similarly, a SeqFactory should
also be defined to create the empty sequence objects
which are initialized by the parser code, instead of the
hardcoded creation of Bio::Seq objects in all the SeqIO parser
modules.
(?)
o Semantic Feature interpreter factory which will take a tree of
features and output a tree of features, interpreting them and
creating groupings and new SeqFeatures where appropriate based
on the feature tags. The best example of this is to interpret
the primary tags in a tree of SeqFeatures and build gene objects
if one finds CDS, exon, mRNA tags.
(Team PBI Saskatoon - David Block and Mark Wilkinson)
o Expression Data - some bioperl members have mentioned they have
objects for expression data which will make their way into
bioperl core.
o Evaluate whether the SeqFeature::GeneStructure object is complete.
(Hilmar Lapp and Mark Wilkinson)
o FASTA analysis parser - building similarity pairs and complying
with the SeqAnalysisParserI interface.
(Dyfed)
o Maps & Markers - handling marker maps, connecting markers with
Variation package, and representing these in a database.
(Heikki Lehvaslaiho, Jason Stajich, Lincoln Stein)
o Pedigree data - managing and manipulating pedigree data and
interfacing with genotype and haplotype data. Connecting with a
database for storing these objects.
(Heikki Lehvaslaiho and Jason Stajich)
o Implement BioCORBA 0.03 proposed spec as described by the
"Copenhagen Core" at BOSC2001.
o Teaching tools - building teaching tutorials and basic
problem sets for introducing bioperl to new users.
(Peter Schattner and Jason Stajich)
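To make the SeqIO checklist item above more concrete, here is a
minimal sketch of the factory idea: the parser is handed the objects
that build sequences and features instead of naming concrete classes
itself. It is written in Python only for brevity and is not bioperl
code. Every name in it (ToyParser, SimpleSeq, GenericFeature,
StrandedFeature) and the three-keyword record format are hypothetical
and exist only for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class GenericFeature:
    """Hypothetical stand-in for a generic sequence feature."""
    primary_tag: str
    start: int
    end: int


@dataclass
class SimpleSeq:
    """Hypothetical stand-in for an empty sequence the parser fills in."""
    display_id: str = ""
    residues: str = ""
    features: list = field(default_factory=list)


class ToyParser:
    """Parses a made-up format: 'ID name', 'FT tag start end', 'SQ residues'."""

    def __init__(self, seq_factory=SimpleSeq, feature_factory=GenericFeature):
        # The parser never names a concrete class; it only calls the
        # factories it was given at initialization.
        self.seq_factory = seq_factory
        self.feature_factory = feature_factory

    def parse(self, text):
        seq = self.seq_factory()  # empty object, initialized below
        for line in text.splitlines():
            parts = line.split()
            if not parts:
                continue
            key, rest = parts[0], parts[1:]
            if key == "ID":
                seq.display_id = rest[0]
            elif key == "FT":
                tag, start, end = rest
                seq.features.append(
                    self.feature_factory(tag, int(start), int(end))
                )
            elif key == "SQ":
                seq.residues += "".join(rest)
        return seq


@dataclass
class StrandedFeature(GenericFeature):
    """A caller-supplied feature class; the parser needs no changes."""
    strand: int = 1


record = "ID demo1\nFT exon 1 6\nFT CDS 2 5\nSQ ATGCCC\n"
default_seq = ToyParser().parse(record)
custom_seq = ToyParser(feature_factory=StrandedFeature).parse(record)
```

With the defaults the parser produces GenericFeature objects; passing
feature_factory=StrandedFeature changes what gets created without
touching the parsing code, which is the point of the proposal.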
* Additionally, we discussed the creation of new CVS modules. I have
been liberal, creating modules based on project domains; however,
the bioperl core discussed and proposed that we instead create
modules specific to external dependencies. So bioperl-live would
hold pure perl code, bioperl-db would hold database-specific code
(SQL dependency), bioperl-ext would hold C extensions, and
bioperl-gui would hold code with graphical dependencies (perlTk).
If anyone has input on this, please respond to the list as this is
just a proposal.
* Open call for scripts. We will open a public directory for
submitting perl scripts and a short description of them. Everyone
is encouraged to donate perl scripts which may or may not use
bioperl to solve biological problems in your work. These scripts
will be re-written by a bioperl developer and distributed as
either part of the scripts or examples directory of bioperl
(depending on their utility as general purpose or just as an
example).
This script writing has two intended purposes. First, we wish to
involve more people in developing for bioperl, and these will be
small, self-contained projects giving new developers a chance to
work on a problem without feeling swallowed by the whole bioperl
object model. Second, this will help provide a large
number of utilities for bioinformatics and address the needs of
users who download bioperl and only see a library of code with no
applications.
* Post-1.0 ideas
o Sequence Parsing - Event based parsing and utilizing grammars.
o Other ideas should be submitted to the list and we'll help keep
track.
Jason Stajich
jason@chg.mc.duke.edu
Center for Human Genetics
Duke University Medical Center
http://www.chg.duke.edu/