[Biojava-l] indexdb.Record implementation

Tue Dec 9 05:34:05 EST 2003

>>>>> "Eckhard" == Eckhard Lehmann <ecky.l at gmx.de> writes:

    Eckhard> Hi, I need to create a quick & dirty database from a
    Eckhard> Genbank File, which contains many entries. It seems that
    Eckhard> I can do this with
    Eckhard> org.biojava.bio.program.indexdb.IndexTools.indexGenbank(...).

Yes, that's correct.

    Eckhard> But how can I get the Entries by ID, once the index is
    Eckhard> created? It seems that the following works:

    Eckhard> BioStore bst = new BioStore(new
    Eckhard> java.io.File("/path/to/indexdir"), false); Record rec =
    Eckhard> bst.get("id_of_genbank_enty");

BioStore is part of the OBDA indexing framework. It should not be
necessary to create one yourself. From the BioStore docs:

"BioStores represent directory and file structures which index flat
files according to the OBDA specification. The preferred method of
constructing new instances is to use BioStoreFactory."

For more information on this, see http://obda.open-bio.org/ and, in
the biojava release, docs/howto/BIODATABASE-ACCESS-HOWTO.txt
and docs/howto/FLAT-DATABASES-HOWTO.txt.

    Eckhard> But Record is an interface and therefore without the
    Eckhard> implementation I would like to have (the implementation
    Eckhard> to read out the desired Genbank entry and e.g. have it as
    Eckhard> a Sequence object) .

I can see what you are getting at... the Record interface only
describes byte offsets and length - it does not have any
responsibility for understanding the file format.
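
To make that split concrete, here is a minimal sketch in plain
java.io of what an index entry of this kind gives you. It is not the
BioJava implementation: the Entry class and readRaw method are
hypothetical names made up for illustration, and the "flat file" is a
temp file holding two toy entries. The point is only that an offset
and a length get you raw bytes, and turning those bytes into a
Sequence still needs a format parser on top.

```java
import java.io.File;
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;

public class OffsetRecordSketch {

    /** Hypothetical index entry: says where a record lives, not what it means. */
    static final class Entry {
        final File file;
        final long offset;
        final int length;

        Entry(File file, long offset, int length) {
            this.file = file;
            this.offset = offset;
            this.length = length;
        }
    }

    /** Returns the raw text the entry points at; no format parsing happens here. */
    static String readRaw(Entry e) throws IOException {
        try (RandomAccessFile raf = new RandomAccessFile(e.file, "r")) {
            byte[] buf = new byte[e.length];
            raf.seek(e.offset);   // jump straight to the start of the record
            raf.readFully(buf);   // read exactly 'length' bytes
            return new String(buf, StandardCharsets.US_ASCII);
        }
    }

    public static void main(String[] args) throws IOException {
        // Toy flat file with two made-up entries, each terminated by "//".
        File flat = File.createTempFile("sketch", ".gb");
        flat.deleteOnExit();
        String first = "LOCUS       ID1\n//\n";
        String second = "LOCUS       ID2\n//\n";
        Files.write(flat.toPath(),
                    (first + second).getBytes(StandardCharsets.US_ASCII));

        // The second entry starts where the first one ends.
        Entry e = new Entry(flat, first.length(), second.length());
        System.out.print(readRaw(e));
    }
}
```

Running main prints only the second toy entry. What you get back is
still unparsed text, which is why a format-aware layer is needed on
top of the index.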

    Eckhard> Are there somewhere implementations in biojava-1.30 for
    Eckhard> processing these Standard Records - resp. is there
    Eckhard> another way to do it without the need to extract the
    Eckhard> record from the file by parsing the byte-oriented RAF
    Eckhard> that one can get by rec.getFile()?

One way is to set up a .bioinformatics config file (see the OBDA docs
referenced above) and use the applications org.biojava.app.BioFlatIndex
and org.biojava.app.BioGetSeq.

For a quick/dirty solution, you can go straight for a flat database
without using the OBDA database organisation services. Examples are in
the unit tests (see package org.biojava.bio.program.indexdb in the
tests tree).

For example, where "location" is a String naming the directory which
will contain the index files:

  public void testIndexGenbankDNA() throws Exception
  {
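      // getDBFiles is not shown here; presumably a helper in the test class
      File[] files = getDBFiles(new String[] { "part1.gb",
                                               "part2.gb" });
      // Index both Genbank files; the index is written under "location"
      IndexTools.indexGenbank("test", new File(location),
                              files, SeqIOConstants.DNA);

      // Open the index as a sequence database and fetch entries by ID
      SequenceDBLite db = new FlatSequenceDB(location, "genbank");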

      Sequence seq1 = db.getSequence("A16SRRNA");
      assertEquals(1497, seq1.length());
      Sequence seq2 = db.getSequence("A16STM112");
      assertEquals(1346, seq2.length());
      Sequence seq3 = db.getSequence("A16STM146");
      assertEquals(1352, seq3.length());

      Sequence seq4 = db.getSequence("AY080928");
      assertEquals(557, seq4.length());
      Sequence seq5 = db.getSequence("AY080929");
      assertEquals(556, seq5.length());
      Sequence seq6 = db.getSequence("AY080930");
      assertEquals(557, seq6.length());
  }

hth

Keith

-- 

- Keith James <kdj at sanger.ac.uk> Microarray Facility, Team 65 -
- The Wellcome Trust Sanger Institute, Hinxton, Cambridge, UK -