[Open-bio-l] Status of OBDA and indexed flatfiles?

Peter biopython at maubp.freeserve.co.uk
Mon Aug 31 15:07:28 UTC 2009


On Mon, Aug 31, 2009 at 3:01 PM, Naohisa GOTO
<ngoto at gen-info.osaka-u.ac.jp> wrote:
> Hi Peter,
>
>> Presumably BioPerl still uses these index files? What about the
>> other projects? I know EMBOSS has some indexing system for
>> example but I have no idea how it works internally.
>
> BioRuby still uses them. To gain performance, names and offsets are
> written to temporary files and then sorted using an external sort
> program (default /usr/bin/sort).

That makes sense. Have you tried this on very large files? For example,
a FASTA file with 10 million short reads?
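
For illustration, the approach described above (write names and offsets
to a temporary file, then hand the sorting to an external program) could
be sketched roughly as follows in Python. This is not BioRuby's code and
not the OBDA on-disk format: the function names, the tab-separated
"name, offset" layout, and the use of whatever `sort` is on the PATH are
all assumptions made for the example.

```python
import os
import subprocess
import tempfile


def build_sorted_index(fasta_path, index_path, sort_cmd="sort"):
    """Write 'name<TAB>offset' lines for each FASTA record, sorted by name.

    Hypothetical helper: the index layout here is an assumption, not OBDA.
    """
    # Pass 1: scan in binary mode so offsets are true byte positions.
    with tempfile.NamedTemporaryFile("wb", suffix=".unsorted", delete=False) as tmp:
        unsorted_path = tmp.name
        with open(fasta_path, "rb") as handle:
            offset = 0
            for line in handle:
                if line.startswith(b">"):
                    fields = line[1:].split(None, 1)
                    name = fields[0] if fields else b""
                    tmp.write(name + b"\t" + str(offset).encode("ascii") + b"\n")
                offset += len(line)
    try:
        # Pass 2: the external sort orders the keys on disk, so the full
        # list of names never has to fit in this process's memory.
        env = dict(os.environ, LC_ALL="C")  # plain byte-wise ordering
        subprocess.run(
            [sort_cmd, "-t", "\t", "-k1,1", "-o", index_path, unsorted_path],
            check=True,
            env=env,
        )
    finally:
        os.remove(unsorted_path)


def fetch_record(fasta_path, index_path, name):
    """Return one FASTA record as text, or None if the name is not indexed."""
    key = name.encode("ascii")
    with open(index_path, "rb") as index:
        for entry in index:
            entry_name, _, offset = entry.rstrip(b"\n").partition(b"\t")
            if entry_name == key:
                break
            if entry_name > key:  # sorted index: we have passed the key
                return None
        else:
            return None
    with open(fasta_path, "rb") as handle:
        handle.seek(int(offset))
        lines = [handle.readline()]  # the '>' header line
        for line in handle:
            if line.startswith(b">"):
                break
            lines.append(line)
    return b"".join(lines).decode("ascii")
```

The lookup above is a linear scan kept short for readability; because
the index is sorted, a real implementation would more likely do a binary
search over it. How this behaves with 10 million records would depend
mostly on the external sort and is not something the sketch measures.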

> In BioRuby, the flatfile-only solution works fine, but BerkeleyDB
> indexes would be incompatible with other projects because of confusion
> in the spec, as discussed in BioPerl Bugzilla Bug #2337.
> http://bugzilla.open-bio.org/show_bug.cgi?id=2337

Thank you for the link to that bug - I'll need to read that carefully.

Peter


