[Biojava-l] Re: IndexedSequenceDB

Mon, 15 Oct 2001 11:09:41 +0100

Hi...

On Mon, Oct 15, 2001 at 01:33:35AM -0700, SAMEER MOHTA wrote:
> 
> The topic I am asking about is IndexedSequenceDB. I
> don't understand exactly what the significance of
> IndexedSequenceDB is.
> Does it mean that a database which is available in
> flatfile form will be indexed by this process? If so,
> then I didn't understand the process properly.
> To be more precise, I want to know how to create the
> database, what the input for database creation is,
> where exactly we use this database, and how a client
> program accesses it. Is this DB meant for searching
> for sequences?

Yes.  IndexedSequenceDB is an implementation of BioJava's
standard SequenceDB interface.  The idea is to take a
collection of pre-existing flatfiles (in any format which
can be handled by the BioJava seq.io APIs), and generate
an index (normally stored on disk as an additional
flatfile) which allows fast retrieval of any single sequence
by its ID.

The basic use pattern is:

  - Construct an IndexedSequenceDB.

  - Make one or more calls to the addFile method, to index files
    of sequences.

  - Use standard SequenceDB methods (e.g. getSequence) to retrieve
    sequences from the database.
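
To make the pattern above concrete, here is a minimal, self-contained
sketch using only the JDK.  It is NOT BioJava code: ToyIndexedDB and its
methods are hypothetical names that merely mimic the construct / addFile /
getSequence steps, and it assumes plain ASCII FASTA input.  The real
IndexedSequenceDB constructor also takes the index-storage object, so
check the demos or the Javadoc for the actual signatures.

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.HashMap;
import java.util.Map;

// Hypothetical toy class, NOT BioJava's IndexedSequenceDB: it only mimics
// the construct / addFile / getSequence pattern for ASCII FASTA files.
class ToyIndexedDB {
    // One index entry: which file holds the record and where it starts.
    private static final class Entry {
        final Path file;
        final long offset;
        Entry(Path file, long offset) { this.file = file; this.offset = offset; }
    }

    private final Map<String, Entry> index = new HashMap<>();

    // Scan a FASTA file once, remembering the byte offset of every '>' header.
    void addFile(Path file) throws IOException {
        try (RandomAccessFile in = new RandomAccessFile(file.toFile(), "r")) {
            long offset = in.getFilePointer();
            String line;
            while ((line = in.readLine()) != null) {
                if (line.startsWith(">")) {
                    // The ID is the first whitespace-delimited token of the header.
                    String id = line.substring(1).trim().split("\\s+")[0];
                    index.put(id, new Entry(file, offset));
                }
                offset = in.getFilePointer();
            }
        }
    }

    // Seek straight to the record and read residues until the next header.
    String getSequence(String id) throws IOException {
        Entry e = index.get(id);
        if (e == null) throw new IllegalArgumentException("No such id: " + id);
        StringBuilder seq = new StringBuilder();
        try (RandomAccessFile in = new RandomAccessFile(e.file.toFile(), "r")) {
            in.seek(e.offset);
            in.readLine(); // skip the header line
            String line;
            while ((line = in.readLine()) != null && !line.startsWith(">")) {
                seq.append(line.trim());
            }
        }
        return seq.toString();
    }

    public static void main(String[] args) throws IOException {
        Path fasta = Files.createTempFile("toy", ".fa");
        Files.write(fasta, ">seq1 first\nACGT\nACGT\n>seq2 second\nGGCC\n"
                .getBytes(StandardCharsets.US_ASCII));
        ToyIndexedDB db = new ToyIndexedDB();   // 1. construct
        db.addFile(fasta);                      // 2. index a file
        System.out.println(db.getSequence("seq2")); // 3. retrieve by ID
    }
}
```

The point of the design is that the flatfile is scanned only once, when
it is added; afterwards each lookup is a single seek rather than a scan.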

The only slight complication is that the IndexedSequenceDB uses
another object to actually store the index data.  The `standard'
implementation is TabIndexStore, which implements a simple index,
stored as a flatfile on disk.
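
As a rough illustration of what "a simple index, stored as a flatfile"
can look like, the sketch below writes and re-reads a tab-delimited index
using only the JDK.  It is NOT TabIndexStore: the class name ToyTabStore
and the column layout (ID, file name, byte offset) are assumptions made
up for this example, and the real on-disk format may well differ.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical toy store, NOT BioJava's TabIndexStore.  The layout
// (id <TAB> file <TAB> offset) is an assumption made for illustration.
class ToyTabStore {
    // Where one sequence record lives: source file name and byte offset.
    static final class Location {
        final String file;
        final long offset;
        Location(String file, long offset) { this.file = file; this.offset = offset; }
    }

    // Write one tab-delimited line per indexed sequence.
    static void save(Map<String, Location> index, Path indexFile) throws IOException {
        List<String> lines = new ArrayList<>();
        for (Map.Entry<String, Location> e : index.entrySet()) {
            lines.add(e.getKey() + "\t" + e.getValue().file + "\t" + e.getValue().offset);
        }
        Files.write(indexFile, lines, StandardCharsets.UTF_8);
    }

    // Read the index back, so the sequence files need not be rescanned.
    static Map<String, Location> load(Path indexFile) throws IOException {
        Map<String, Location> index = new LinkedHashMap<>();
        for (String line : Files.readAllLines(indexFile, StandardCharsets.UTF_8)) {
            if (line.isEmpty()) continue;
            String[] cols = line.split("\t", 3);
            index.put(cols[0], new Location(cols[1], Long.parseLong(cols[2])));
        }
        return index;
    }

    public static void main(String[] args) throws IOException {
        Map<String, Location> index = new LinkedHashMap<>();
        index.put("seq1", new Location("toy.fa", 0L));
        index.put("seq2", new Location("toy.fa", 22L));
        Path indexFile = Files.createTempFile("toy", ".index");
        save(index, indexFile);
        Map<String, Location> reloaded = load(indexFile);
        System.out.println(reloaded.get("seq2").file + " @ " + reloaded.get("seq2").offset);
    }
}
```

Keeping the storage behind its own object means the database logic does
not care whether the index lives in a flatfile or somewhere else.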

For some simple code which makes full use of an IndexedSequenceDB,
look in the demos/seq/db directory.  There are a number of little
utilities:

  CreateIndex
  AddFilesToIndex
  FetchSequence
  ListSeqsInIndex

[If you downloaded a binary release of BioJava, you might not
have the demos.  They are included in the source releases, or you
can check out the latest source code from the BioJava CVS repository.]

Hope this helps,

    Thomas.