[Biojava-l] extra seqDB things to add [off-topic]
Thomas Down
td2@sanger.ac.uk
Fri, 14 Jul 2000 12:15:34 +0100
On Thu, Jul 13, 2000 at 06:05:36PM -0400, Triplett, Terry wrote:
>
> Incidentally, the Cloudscape database that is included in the Sun j2sdkee
> (Java 2 SDK Enterprise Edition) seems to fit the same niche, and appears to
> be free to use. I haven't used it yet so don't know if it is a full version
> or cripple/demoware. If its presence can be relied upon, it might be worth
> using in biojava.
In principle, I like Cloudscape, simply on the basis that SQL
tends to be nicer to work with than learning Yet Another Database
API (tm). But I can't find out much about Cloudscape licensing.
There's a free download of a `developer kit' from their home page,
but they imply that there's some kind of licensing fee if you
distribute it. Looks like we'd best steer clear of using this
in the core.
Myself, I rather like the idea of a plain text index format.
Just something like:

    # Automatically generated index: do not modify
    $name = "My Amazing Sequences"
    $fileFormat = "fasta"
    # seqName  seqFile          startByte  endByte
    MYSEQ1     /seqs/ms0001.fa  0          1765
    MYSEQ2     /seqs/ms0001.fa  6573       9876
    # ...etc...

A format like this could easily be supported on a wide variety
of platforms (at least in principle), and I don't believe that
it will carry much of a speed penalty -- there will be some cost
for parsing the index file the first time it's accessed, but after
that it should be fast.
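
As a rough illustration of how little code such an index needs, here
is a minimal Java sketch of a reader for the format above. The class
and method names (PlainTextIndex, Entry, parse, readRecord) are made
up for this example and are not part of BioJava. It also assumes that
fields are whitespace-separated (so file names contain no spaces) and
that endByte is exclusive, neither of which the format above pins
down.

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical reader for the plain text index; not a BioJava class.
class PlainTextIndex {
    // One record: the file holding the sequence and its byte range.
    static final class Entry {
        final String file;
        final long startByte;
        final long endByte;

        Entry(String file, long startByte, long endByte) {
            this.file = file;
            this.startByte = startByte;
            this.endByte = endByte;
        }
    }

    // "$key = value" header lines, with surrounding quotes removed.
    final Map<String, String> properties = new LinkedHashMap<>();
    // Sequence name -> location, in file order.
    final Map<String, Entry> entries = new LinkedHashMap<>();

    static PlainTextIndex parse(Iterable<String> lines) {
        PlainTextIndex index = new PlainTextIndex();
        for (String raw : lines) {
            String line = raw.trim();
            if (line.isEmpty() || line.startsWith("#")) {
                continue; // blank line or comment
            }
            if (line.startsWith("$")) {
                int eq = line.indexOf('=');
                if (eq < 0) {
                    throw new IllegalArgumentException("Malformed header: " + line);
                }
                String key = line.substring(1, eq).trim();
                String value = line.substring(eq + 1).trim();
                if (value.length() >= 2 && value.startsWith("\"") && value.endsWith("\"")) {
                    value = value.substring(1, value.length() - 1);
                }
                index.properties.put(key, value);
            } else {
                // Assumption: exactly four whitespace-separated fields.
                String[] f = line.split("\\s+");
                if (f.length != 4) {
                    throw new IllegalArgumentException("Malformed record: " + line);
                }
                index.entries.put(f[0],
                        new Entry(f[1], Long.parseLong(f[2]), Long.parseLong(f[3])));
            }
        }
        return index;
    }

    // Fetch the raw bytes of one record by seeking straight to its offset.
    // Assumption: endByte is exclusive.
    static byte[] readRecord(Entry e) throws IOException {
        try (RandomAccessFile raf = new RandomAccessFile(e.file, "r")) {
            byte[] buf = new byte[(int) (e.endByte - e.startByte)];
            raf.seek(e.startByte);
            raf.readFully(buf);
            return buf;
        }
    }
}
```

The index would be parsed once, e.g. from the lines returned by
java.nio.file.Files.readAllLines, and each later lookup is then a
map access followed by a single seek into the sequence file.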
Alternatively, an XML-ized version of the index might be nice,
and would make adding extra fields to the records a more robust
process, but there would be an extra overhead on parsing the
indexes.
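
To make the trade-off concrete, here is a hedged Java sketch that
reads one possible XML layout using the standard DOM parser. The
element and attribute names (index, seq, name, file, start, end,
note) and the class XmlIndexSketch are invented for this example; no
such schema has been agreed. The second record carries an extra
"note" attribute, which the reader simply ignores, and this is the
kind of robustness to added fields mentioned above.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

// Hypothetical XML form of the index; the schema is an assumption.
class XmlIndexSketch {
    static final String SAMPLE =
          "<index name=\"My Amazing Sequences\" fileFormat=\"fasta\">"
        + "<seq name=\"MYSEQ1\" file=\"/seqs/ms0001.fa\" start=\"0\" end=\"1765\"/>"
        + "<seq name=\"MYSEQ2\" file=\"/seqs/ms0001.fa\" start=\"6573\" end=\"9876\""
        + " note=\"an extra field\"/>"
        + "</index>";

    // Returns sequence name -> start offset, in document order.
    static Map<String, Long> startOffsets(String xml) {
        try {
            DocumentBuilder builder =
                    DocumentBuilderFactory.newInstance().newDocumentBuilder();
            Document doc = builder.parse(
                    new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            NodeList seqs = doc.getDocumentElement().getElementsByTagName("seq");
            Map<String, Long> offsets = new LinkedHashMap<>();
            for (int i = 0; i < seqs.getLength(); i++) {
                Element seq = (Element) seqs.item(i);
                // Attributes this reader does not know about are ignored.
                offsets.put(seq.getAttribute("name"),
                        Long.parseLong(seq.getAttribute("start")));
            }
            return offsets;
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not read index XML", e);
        }
    }
}
```

The cost is that the whole document is parsed into a DOM tree before
any record can be looked up, which is the extra overhead referred to
above.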
Thomas.
--
He looked up with big brown eyes. ``They're really only
tiny little A-bombs, honest.''
-- David Brin.