[Biojava-l] extra seqDB things to add [off-topic]

Thomas Down td2@sanger.ac.uk
Fri, 14 Jul 2000 12:15:34 +0100


On Thu, Jul 13, 2000 at 06:05:36PM -0400, Triplett, Terry wrote:
> 
> Incidentally, the Cloudscape database that is included in the Sun j2sdkee
> (Java 2 SDK Enterprise Edition) seems to fit the same niche, and appears to
> be free to use.  I haven't used it yet so don't know if it is a full version
> or cripple/demoware.  If its presence can be relied upon, it might be worth
> using in biojava.

In principle, I like Cloudscape, simply on the basis that SQL
tends to be nicer to work with than learning Yet Another Database
API (tm).  But I can't find out much about Cloudscape licensing.
There's a free download of a `developer kit' from their home page,
but they imply that there's some kind of licensing fee if you
distribute it.  Looks like we'd best steer clear of using this
in the core.

Myself, I rather like the idea of a plain text index format.
Just something like

  # Automatically generated index: do not modify
  $name = "My Amazing Sequences"
  $fileFormat = "fasta"

  # seqName     seqFile            startByte      endByte
  MYSEQ1        /seqs/ms0001.fa   0              1765
  MYSEQ2        /seqs/ms0001.fa   6573           9876
  # ...etc...

A format like this could be easily supported by a wide variety
of platforms (at least in principle), and I don't believe it
will carry much of a speed penalty -- there will be some cost
for parsing the index file the first time it's accessed, but after
that lookups should be fast.
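
As a rough sketch of how little code such an index needs, here is one
way a reader might look in Java.  The class name FlatIndex, its Entry
type and the parse() method are all made up for illustration (none of
this is existing biojava API), and it assumes whitespace-separated
columns and $key = "value" header lines exactly as in the example above.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical reader for the plain text index format sketched above.
public class FlatIndex {
    /** One index row: the file holding a sequence and its byte range. */
    public static final class Entry {
        public final String file;
        public final long startByte;
        public final long endByte;

        Entry(String file, long startByte, long endByte) {
            this.file = file;
            this.startByte = startByte;
            this.endByte = endByte;
        }
    }

    /** Header values from "$key = value" lines, e.g. name, fileFormat. */
    public final Map<String, String> properties = new LinkedHashMap<>();
    /** Sequence name -> location, in file order. */
    public final Map<String, Entry> entries = new LinkedHashMap<>();

    public static FlatIndex parse(Reader in) throws IOException {
        FlatIndex index = new FlatIndex();
        BufferedReader reader = new BufferedReader(in);
        String line;
        while ((line = reader.readLine()) != null) {
            line = line.trim();
            if (line.isEmpty() || line.startsWith("#")) {
                continue; // blank line or comment
            }
            if (line.startsWith("$")) {
                int eq = line.indexOf('=');
                if (eq < 0) {
                    throw new IOException("Malformed header line: " + line);
                }
                String key = line.substring(1, eq).trim();
                String value = line.substring(eq + 1).trim();
                // Strip one pair of surrounding double quotes, if present.
                if (value.length() >= 2 && value.startsWith("\"") && value.endsWith("\"")) {
                    value = value.substring(1, value.length() - 1);
                }
                index.properties.put(key, value);
            } else {
                // Assumed columns: seqName seqFile startByte endByte
                String[] cols = line.split("\\s+");
                if (cols.length != 4) {
                    throw new IOException("Expected 4 columns: " + line);
                }
                index.entries.put(cols[0],
                        new Entry(cols[1], Long.parseLong(cols[2]), Long.parseLong(cols[3])));
            }
        }
        return index;
    }

    public static void main(String[] args) throws IOException {
        String text = "# Automatically generated index: do not modify\n"
                + "$name = \"My Amazing Sequences\"\n"
                + "$fileFormat = \"fasta\"\n"
                + "\n"
                + "MYSEQ1        /seqs/ms0001.fa   0              1765\n"
                + "MYSEQ2        /seqs/ms0001.fa   6573           9876\n";
        FlatIndex index = parse(new StringReader(text));
        Entry e = index.entries.get("MYSEQ2");
        System.out.println(index.properties.get("name") + ": " + e.file
                + " [" + e.startByte + ", " + e.endByte + "]");
    }
}
```

With an Entry in hand, fetching a record would just be a seek to
startByte in the named file (java.io.RandomAccessFile would do).
Whether endByte is inclusive is something the format would still
need to pin down.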

Alternatively, an XML-ized version of the index might be nice,
and would make adding extra fields to the records a more robust
process, at the cost of some extra overhead when parsing the
indexes.
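
To make the "extra fields" point concrete, here is a small sketch
using the DOM parser from the standard Java XML APIs (javax.xml.parsers).
The element and attribute names (index, seq, name, file, startByte,
endByte) are purely my own guess at what an XML-ized index might look
like, as is the XmlIndexSketch class itself.  The point is that a
reader which asks for attributes by name simply ignores ones it does
not know about, such as the made-up "checksum" below.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

// Hypothetical lookup against an assumed XML layout for the index.
public class XmlIndexSketch {
    /** Returns {startByte, endByte} for seqName, or null if absent. */
    public static long[] lookup(String xml, String seqName) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        NodeList seqs = doc.getElementsByTagName("seq");
        for (int i = 0; i < seqs.getLength(); i++) {
            Element seq = (Element) seqs.item(i);
            if (seqName.equals(seq.getAttribute("name"))) {
                // Only the attributes we know about are read; extras are ignored.
                return new long[] {
                    Long.parseLong(seq.getAttribute("startByte")),
                    Long.parseLong(seq.getAttribute("endByte"))
                };
            }
        }
        return null;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<index name='My Amazing Sequences' fileFormat='fasta'>"
                + "<seq name='MYSEQ1' file='/seqs/ms0001.fa' startByte='0' endByte='1765'/>"
                + "<seq name='MYSEQ2' file='/seqs/ms0001.fa' startByte='6573' endByte='9876'"
                + " checksum='abc123'/>"
                + "</index>";
        long[] range = lookup(xml, "MYSEQ2");
        System.out.println(range[0] + " " + range[1]);
    }
}
```

A whitespace-column reader would need changing (or at least a careful
column count check) to cope with the same extra field, which is the
robustness trade-off against the heavier parse.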

Thomas.
-- 
He looked up with big brown eyes.  ``They're really only
tiny little A-bombs, honest.''
                                     -- David Brin.