[Biojava-l] indexdb.Record implementation
Keith James
kdj at sanger.ac.uk
Tue Dec 9 05:34:05 EST 2003
>>>>> "Eckhard" == Eckhard Lehmann <ecky.l at gmx.de> writes:
Eckhard> Hi, I need to create a quick & dirty database from a
Eckhard> Genbank File, which contains many entries. It seems that
Eckhard> I can do this with
Eckhard> org.biojava.bio.program.indexdb.IndexTools.indexGenbank(...).
Yes, that's correct.
Eckhard> But how can I get the Entries by ID, once the index is
Eckhard> created? It seems that the following works:
Eckhard> BioStore bst = new BioStore(new
Eckhard> java.io.File("/path/to/indexdir"), false); Record rec =
Eckhard> bst.get("id_of_genbank_entry");
BioStore is part of the OBDA indexing framework. It should not be
necessary to create one yourself. From BioStore docs:
"BioStores represent directory and file structures which index flat
files according to the OBDA specification. The preferred method of
constructing new instances is to use BioStoreFactory."
For more information on this see http://obda.open-bio.org/ and, in
the biojava release, docs/howto/BIODATABASE-ACCESS-HOWTO.txt
and docs/howto/FLAT-DATABASES-HOWTO.txt.
Eckhard> But Record is an interface and therefore without the
Eckhard> implementation I would like to have (the implementation
Eckhard> to read out the desired Genbank entry and e.g. have it as
Eckhard> a Sequence object) .
I can see what you are getting at... the Record interface only
describes byte offsets and length - it does not have any
responsibility for understanding the file format.
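To illustrate what "byte offsets and length" amounts to, here is a
minimal sketch that uses only java.io and no BioJava classes. The class
name RecordSliceSketch, the helper readSlice and the two toy entries are
hypothetical, made up for illustration; with a real index the offset and
length would come from the Record rather than being computed by hand.
The slice that comes back is raw text, so a format parser is still
needed to turn it into a Sequence.

```java
import java.io.File;
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;

// Hypothetical illustration: not part of BioJava.
class RecordSliceSketch {

    // Reads exactly `length` bytes starting at byte `offset` of the flat file
    // and returns them as ASCII text. No knowledge of the file format is used.
    static String readSlice(RandomAccessFile raf, long offset, int length)
            throws IOException {
        byte[] buf = new byte[length];
        raf.seek(offset);
        raf.readFully(buf);
        return new String(buf, StandardCharsets.US_ASCII);
    }

    public static void main(String[] args) throws IOException {
        // Two toy entries standing in for a multi-entry flat file (assumption).
        String first = "LOCUS       A\n//\n";
        String second = "LOCUS       B\n//\n";

        File flat = File.createTempFile("toy", ".gb");
        flat.deleteOnExit();
        Files.write(flat.toPath(),
                (first + second).getBytes(StandardCharsets.US_ASCII));

        try (RandomAccessFile raf = new RandomAccessFile(flat, "r")) {
            // The second entry starts where the first one ends.
            long offset = first.length();
            int length = second.length();
            System.out.print(readSlice(raf, offset, length));
        }
    }
}
```

This is why the index alone cannot hand back a Sequence: all it knows
is where an entry's bytes start and how many there are.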
Eckhard> Are there somewhere implementations in biojava-1.30 for
Eckhard> processing these Standard Records - resp. is there
Eckhard> another way to do it without the need to extract the
Eckhard> record from the file by parsing the byte-oriented RAF
Eckhard> that one can get by rec.getFile()?
One way is to set up a .bioinformatics config file (see the OBDA docs
referenced above) and use the applications org.biojava.app.BioFlatIndex
and org.biojava.app.BioGetSeq.
For a quick/dirty solution, you can go straight for a flat database
without using the OBDA database organisation services. Examples are in
the unit tests (see package org.biojava.bio.program.indexdb in the
tests tree).
For example, in the test below "location" is a String naming the
directory which will contain the index files:
    public void testIndexGenbankDNA() throws Exception
    {
        File [] files = getDBFiles(new String [] { "part1.gb",
                                                   "part2.gb" });

        IndexTools.indexGenbank("test", new File(location),
                                files, SeqIOConstants.DNA);

        SequenceDBLite db = new FlatSequenceDB(location, "genbank");

        Sequence seq1 = db.getSequence("A16SRRNA");
        assertEquals(1497, seq1.length());

        Sequence seq2 = db.getSequence("A16STM112");
        assertEquals(1346, seq2.length());

        Sequence seq3 = db.getSequence("A16STM146");
        assertEquals(1352, seq3.length());

        Sequence seq4 = db.getSequence("AY080928");
        assertEquals(557, seq4.length());

        Sequence seq5 = db.getSequence("AY080929");
        assertEquals(556, seq5.length());

        Sequence seq6 = db.getSequence("AY080930");
        assertEquals(557, seq6.length());
    }
hth
Keith
--
- Keith James <kdj at sanger.ac.uk> Microarray Facility, Team 65 -
- The Wellcome Trust Sanger Institute, Hinxton, Cambridge, UK -