[EMBOSS] dbifasta index file format

Peter Rice pmr at ebi.ac.uk
Mon Apr 10 09:05:36 UTC 2006


Graziano P. wrote:
> hello EMBOSS users,
> I have some databases in fasta format (ncbi | format)
> and I want to index them using dbifasta, then I want
> to access the index files using a program that will be
> developed by a computer scientist of my group.
> I need to index the databases by accession number,
> GI number and description. I have read the dbifasta
> help about the structure of the index files when
> the databases are indexed by accession number, but I
> have not found any information about the structure of
> the index files when the databases are indexed by
> description. Does anyone know where I can find detailed
> information about the structure of the index files?

Ciao Graziano,

The dbifasta index files use the same format as the Staden package, the old 
EMBL CD-ROM distribution, and Erik Sonnhammer's "efetch" utility.

They were documented in some old Staden documentation and papers.

They are also documented in the EMBOSS distribution under doc/manuals/ in the 
file internals-indexing.txt (attached). I see that this document was written 
before we added description indexing!

Description (title) indexing uses the same file format as accession number 
indexing; the files are called des.hit and des.trg. dbifasta has a -maxindex 
option to limit the length of the longest word indexed (the index files store 
a value for the maximum record length).
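For the programmer in your group, here is a minimal Python sketch of a description lookup. It is not taken from EMBOSS, and it rests on assumptions about the EMBL CD-ROM layout that should be checked against internals-indexing.txt before relying on it: every file starts with a 300-byte header whose first fields are the file size and record count (4-byte little-endian integers) and the record size (2 bytes); each des.trg record is a 4-byte hit count, a 4-byte 1-based pointer to the first hit in des.hit, then the word padded to the fixed width; des.trg records are sorted in ascending byte order, so a binary search works; and each des.hit record is a 4-byte entry number (as we understand it, pointing into entrynam.idx). The function name lookup is hypothetical, and the sketch compares the word exactly as stored, so the caller must match whatever case dbifasta wrote.

```python
import struct
from typing import List

HEADER_SIZE = 300  # ASSUMPTION: fixed-length header, as in the EMBL CD-ROM layout


def lookup(trg: bytes, hit: bytes, word: str) -> List[int]:
    """Hypothetical helper: return the entry numbers des.hit lists for `word`.

    ASSUMPTIONS (check against internals-indexing.txt): little-endian integers;
    header = file size (int32), record count (int32), record size (int16), ...;
    .trg record = hit count (int32), first hit (int32, 1-based), padded word;
    .trg records sorted in ascending byte order; .hit records are one int32 each.
    """
    n_records, record_size = struct.unpack_from("<ih", trg, 4)
    word_len = record_size - 8  # record size minus the two int32 fields

    def word_at(i: int) -> bytes:
        off = HEADER_SIZE + i * record_size + 8
        return trg[off:off + word_len].rstrip(b"\x00 ")  # drop null/space padding

    key = word.encode("ascii")
    lo, hi = 0, n_records  # binary search for the first record >= key
    while lo < hi:
        mid = (lo + hi) // 2
        if word_at(mid) < key:
            lo = mid + 1
        else:
            hi = mid
    if lo == n_records or word_at(lo) != key:
        return []  # word not in the index

    n_hits, first_hit = struct.unpack_from("<ii", trg, HEADER_SIZE + lo * record_size)
    start = HEADER_SIZE + (first_hit - 1) * 4  # 1-based pointer into 4-byte hit records
    return list(struct.unpack_from(f"<{n_hits}i", hit, start))
```

A caller would read des.trg and des.hit in binary mode and pass both byte strings, for example lookup(trg_bytes, hit_bytes, "KINASE"). Reading whole files keeps the sketch short; for large indexes an mmap.mmap object would also work here, since struct.unpack_from and slicing both accept it.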

The distribution also includes a script, scripts/dbilist.pl, which lists the 
contents of the description index. Run it in the database index directory as 
"dbilist.pl des".
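If you want to dump the target file from your own program rather than with dbilist.pl, a Python sketch can walk every record. It is not a port of dbilist.pl and does not reproduce its output. It assumes the EMBL CD-ROM layout, unverified here and worth checking against internals-indexing.txt: a 300-byte header starting with the file size and record count (4-byte little-endian integers), the record size (2 bytes), a 20-byte database name and a 10-byte release; then fixed-width records of hit count (4 bytes), 1-based first-hit pointer into des.hit (4 bytes) and the padded word. The names read_header, iter_targets, IndexHeader and TargetRecord are hypothetical.

```python
import struct
from typing import Iterator, NamedTuple

HEADER_SIZE = 300  # ASSUMPTION: fixed-length header, as in the EMBL CD-ROM layout


class IndexHeader(NamedTuple):
    file_size: int
    n_records: int
    record_size: int
    db_name: str
    release: str


class TargetRecord(NamedTuple):
    n_hits: int     # number of des.hit records for this word
    first_hit: int  # ASSUMPTION: 1-based position of the first of them in des.hit
    word: str


def _text(raw: bytes) -> str:
    return raw.rstrip(b"\x00 ").decode("ascii", "replace")  # drop null/space padding


def read_header(data: bytes) -> IndexHeader:
    # ASSUMPTION: little-endian int32, int32, int16, then 20-byte name, 10-byte release
    size, count, rec_size, name, release = struct.unpack_from("<iih20s10s", data, 0)
    return IndexHeader(size, count, rec_size, _text(name), _text(release))


def iter_targets(data: bytes) -> Iterator[TargetRecord]:
    hdr = read_header(data)
    word_len = hdr.record_size - 8  # ASSUMPTION: two int32 fields precede the word
    for i in range(hdr.n_records):
        off = HEADER_SIZE + i * hdr.record_size
        n_hits, first_hit, raw = struct.unpack_from(f"<ii{word_len}s", data, off)
        yield TargetRecord(n_hits, first_hit, _text(raw))
```

A caller would open des.trg in binary mode and loop over iter_targets(fh.read()), printing rec.word, rec.n_hits and rec.first_hit for each record; the same functions should apply to acnum.trg if the layout assumptions hold for it too.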

The new dbxfasta index files are very different. For very large databases we 
recommend dbxfasta. For smaller databases dbifasta is fine and we will 
continue to support it.

Hope that helps. If you need more details, just ask.

regards,

Peter


Attachment (internals-indexing.txt):
<http://lists.open-bio.org/pipermail/emboss/attachments/20060410/be632ef4/attachment-0001.txt>

