[Bioperl-l] new directions

Jason Stajich jason@chg.mc.duke.edu
Wed, 7 Mar 2001 11:45:20 -0500 (EST)

So very happy to have 0.7 out.  I know there are some minor issues that
have begun to be resolved; once these reach a suitable number or enough
time has passed, we can think about a point release.  Not for at least 3
weeks though.

The branching gives us a chance to take stock and look at where we want to
go next.  Interest has been expressed in expanding outside of the sequence
analysis realm bioperl has pretty much occupied.  I'm all for it.  The new
projects I hint at below should go on the main trunk; only bug fixes and
minor feature changes should go on the branch.  We're probably
flexible here, so when in doubt we can discuss on the list.

I'd like to throw some ideas out there and encourage people on the list
who maybe haven't felt comfortable jumping in while we were churning on
the release to think about picking up a project. Especially if any of
these (or your own project ideas) scratch a particular itch you have.  
Some of these don't have to be part of bioperl-live but can be satellite
projects which utilize the bioperl core objects.

These are just some ideas I have bouncing around, perhaps you have your
own ideas and would like to contribute:

This is also on the wiki at
http://www.bioperl.org/wiki/html/BioPerl/BioperlProjects.html - so any
critiques or additions could be added there as well, just CC the list so
we know to check.

 o perl is not an ideal language for doing something like huge microarray
   clustering, but it is ideal for dealing with formatting issues.
   Perhaps code that can deal with converting different microarray formats
   would be helpful.
 o Expansion into other expression data, code to help link expression data
   for genes (sometimes unknown genes) to available information in IGI,
   NCBI Unigene, etc.  All in software so that it can be automated.

 o The Blast issues.  I think the pluggable features to BPlite would be
   ideal, but I don't know how well it will work (wanting to parse more or
   less of the report -- runtime plugging of 'adaptors'?).  I like the
   html features of Bio::Tools::Blast.  What about parsing NCBI Blast XML?
 o Fasta parsing.  We should find a way to support this, either with a
   formal grammar or just some perl code.

 o Speaking of grammars, what about a grammar for parsing EMBL/Genbank?
   Would this be more/less efficient?  We seem kind of kludgy in parts of
   the feature table parsing and it has gotten pretty heavy down there,
   are there ways to simplify this code?

 o Bio::Index::Blast which can read, fetch (and store?) seqs from a blast
 o Map data - genetic, RH maps and their markers.  Adopting code for
   manipulating this information.  A simple ePCR parser would fit in here.

 o visualization - perhaps visualization is best done in java, but the
   bioperl-gui modules provide a nice way to look at a sequence with
   annotation.  Is there interest in a png/gif/ps renderer as well,
   adopting existing code -- perhaps something similar to gff2ps?

 o Tree drawing - plugging into PHYLIP or something similar to provide
   some nice drawings of phylogenetic trees.

Jason Stajich
Center for Human Genetics
Duke University Medical Center