[Biojava-l] New Wiki page

Paul Edlefsen pedlefsen@systemsbiology.org
Wed, 07 Feb 2001 11:01:04 -0800


Speaking of who's doing what, I was considering writing an implementation of
SymbolList that takes a nibble (or maybe a byte) per DNA base instead of a
word.  I've got this code in C++ and thought I'd port it over, though I
haven't yet begun.
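For anyone curious what a nibble-per-base store could look like, here is a minimal sketch. It assumes a five-letter alphabet (A, C, G, T, N) and packs two bases into each byte. The class name PackedDnaSequence and its methods are hypothetical: this is not the BioJava SymbolList API and not my C++ code, just the underlying idea. Positions are 1-based to mirror SymbolList.symbolAt.

```java
/**
 * Hypothetical sketch (not part of BioJava): stores DNA at 4 bits
 * (one nibble) per base, two bases per byte.
 */
public class PackedDnaSequence {
    // Nibble codes 0..4; a nibble has room for up to 16 symbols
    // (e.g. IUPAC ambiguity codes), but only five are used here.
    private static final String ALPHABET = "ACGTN";

    private final byte[] data;
    private final int length;

    public PackedDnaSequence(CharSequence bases) {
        this.length = bases.length();
        this.data = new byte[(length + 1) / 2]; // round up for odd lengths
        for (int i = 0; i < length; i++) {
            int code = ALPHABET.indexOf(Character.toUpperCase(bases.charAt(i)));
            if (code < 0) {
                throw new IllegalArgumentException(
                    "Unsupported base at index " + i + ": " + bases.charAt(i));
            }
            int shift = (i % 2 == 0) ? 4 : 0; // even index -> high nibble
            data[i / 2] |= (byte) (code << shift);
        }
    }

    public int length() {
        return length;
    }

    /** Returns the base at 1-based position pos, like SymbolList.symbolAt. */
    public char baseAt(int pos) {
        if (pos < 1 || pos > length) {
            throw new IndexOutOfBoundsException("pos " + pos);
        }
        int i = pos - 1;
        int shift = (i % 2 == 0) ? 4 : 0;
        int code = (data[i / 2] >> shift) & 0x0F; // mask off sign extension
        return ALPHABET.charAt(code);
    }

    /** Bytes used by the packed array, excluding object overhead. */
    public int storageBytes() {
        return data.length;
    }

    public static void main(String[] args) {
        PackedDnaSequence seq = new PackedDnaSequence("ACGTNacgt");
        StringBuilder sb = new StringBuilder();
        for (int p = 1; p <= seq.length(); p++) {
            sb.append(seq.baseAt(p));
        }
        System.out.println(sb + " stored in " + seq.storageBytes() + " bytes");
    }
}
```

A real SymbolList implementation would map these codes to and from Symbol objects in the DNA alphabet rather than to chars; the char interface just keeps the sketch self-contained.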

Is anybody else working along similar lines?  I need to read in multi-megabase
sequences, and the 35-megabase human chromosome 22 alone is too much for the
current implementation, even with the heap increased to 128 MB.  (This makes
sense: 35 M bases * 4 bytes/base = 140 MB > 128 MB.)
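To make the arithmetic concrete, this small sketch compares the raw storage needed for roughly 35 million bases at one 4-byte reference per base, one byte per base, and one nibble per base. It assumes a 128 MB heap means 128 * 2^20 bytes, and it ignores array headers and all other object overhead, so real usage would be somewhat higher. The class and method names are illustrative only.

```java
/** Illustrative estimate of raw per-base storage; ignores JVM object overhead. */
public class GenomeFootprint {
    static long bytesFor(long bases, double bytesPerBase) {
        return (long) Math.ceil(bases * bytesPerBase);
    }

    public static void main(String[] args) {
        long bases = 35000000L;           // approx. size of chr22, per the post
        long heap = 128L * 1024 * 1024;   // 128 MB heap = 134,217,728 bytes
        double[] perBase = {4.0, 1.0, 0.5};
        String[] label = {"4-byte reference", "byte", "nibble"};
        for (int k = 0; k < perBase.length; k++) {
            long need = bytesFor(bases, perBase[k]);
            System.out.println(label[k] + " per base: " + need
                + " bytes, fits in heap: " + (need <= heap));
        }
    }
}
```

Going from a word to a nibble per base is an 8x reduction (140,000,000 bytes down to 17,500,000), which leaves plenty of headroom for chr22 and makes much larger sequences feasible.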

Our goal is to make some open tools for whole-genome analysis and
cross-species comparison.  35 megabases is just the tip of the iceberg; to
defend biojava to my peers, I need to demonstrate that it can handle big
sequences.

:Paul

PS This is my first correspondence with the list, so I'll introduce myself.
The Institute for Systems Biology (http://www.systemsbiology.org) is a
Seattle-based nonprofit academic institution for interdisciplinary molecular
biology and biotechnology.  I am a computer programmer (not a biologist
(yet?)) in the Computational Biology group.  If anyone on this list is in the
US Northwest, let's have lunch.

I've been working on a C++ bio-toolkit à la biojava (and others) that can use
Paracel's Genematcher or just a local search, though it is nowhere near ready
for public scrutiny.  I've been asked to make some quick-and-dirty
visualization tools in Java, which has brought me to biojava.  Thanks to
biojava (and Jazz -- check out http://www.cs.umd.edu/hcil/jazz/), I made the
prototype in 3 days!  Y'all have done a fine job, and I look forward to
contributing to the effort.

--
Paul T. Edlefsen  Software Engineer
<<<<<<<<<<<<<<<<  >>>>>>>>>>>>>>>>>
Computational Biology Group
The Institute for Systems Biology
4225 Roosevelt Way NE, Suite 200
Seattle, WA 98105
pedlefsen@systemsbiology.org
<<<<<<<<<<<<<<<<  >>>>>>>>>>>>>>>>>
Phone: (206)732-1336
Fax:   (206)732-1299