[Bioperl-l] Bio::Index::Fastq - Interface for indexing (multiple) fastq files failure

Wed Apr 7 17:56:04 UTC 2010

On Wed, 2010-04-07 at 18:08 +0100, Peter wrote:
> On Wed, Apr 7, 2010 at 5:56 PM, Chris Fields <cjfields at illinois.edu> wrote:
> >
> > I think we're going with the AnyDBM option, which allows SQLite if
> > requested (via Mark's SQLite_DBM).
> >
> > chris
> 
> Hi Chris,
> 
> Does Mark's SQLite_DBM already have an SQLite schema defined? I'd
> ideally like us to agree something shared with other Bio* libraries (a new
> OBDA standard using SQLite instead of BDB). I was thinking something
> along these lines if we want to support an index for multiple files:
> 
> * meta - table with string key/values (in particular to hold a schema version
> number, plus perhaps the tool which built the index)
> 
> * offsets - table with entry accessions, file number, file offset
> 
> * files - table with filenames, file type (e.g. FASTA), datestamp
> (so we can spot if the index is older than the file and needs to be
> updated), perhaps other things like if the file is compressed (gzip,
> bz2, ...).
> 
> If some kind of shared SQLite index schema (whatever it looks like)
> does seem like a good idea to you guys (BioPerl), should we move
> this discussion over to open-bio-l at lists.open-bio.org?
> 
> Regards,
> 
> Peter
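
To make the proposal above concrete, here is a rough sketch of the three
tables (meta, offsets, files) using Python's standard sqlite3 module. The
table and column names, the column types, and the helper functions
create_index() and lookup() are all assumptions made for illustration; none
of this is an agreed OBDA schema or the layout used by SQLite_DBM.

```python
import sqlite3

# Hypothetical schema: names and types are illustrative only,
# not an agreed OBDA standard.
SCHEMA = """
CREATE TABLE meta (
    key   TEXT PRIMARY KEY,   -- e.g. 'schema_version', 'built_by'
    value TEXT NOT NULL
);
CREATE TABLE files (
    file_number INTEGER PRIMARY KEY,
    filename    TEXT NOT NULL,
    file_type   TEXT NOT NULL,  -- e.g. 'FASTA', 'FASTQ'
    mtime       REAL NOT NULL,  -- datestamp, to spot a stale index
    compression TEXT            -- NULL, 'gzip', 'bz2', ...
);
CREATE TABLE offsets (
    accession   TEXT PRIMARY KEY,
    file_number INTEGER NOT NULL REFERENCES files(file_number),
    file_offset INTEGER NOT NULL
);
"""


def create_index(path=":memory:"):
    """Create an empty index database and record a schema version."""
    conn = sqlite3.connect(path)
    conn.executescript(SCHEMA)
    conn.execute("INSERT INTO meta VALUES ('schema_version', '1')")
    conn.commit()
    return conn


def lookup(conn, accession):
    """Return (filename, file_offset) for an accession, or None if absent."""
    return conn.execute(
        "SELECT f.filename, o.file_offset "
        "FROM offsets AS o JOIN files AS f ON f.file_number = o.file_number "
        "WHERE o.accession = ?",
        (accession,),
    ).fetchone()
```

Keeping filenames in their own table means each offsets row only carries a
small integer file number, which matters when indexing millions of reads
spread over several files.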

I think this is a good idea for OBDA-based modules, but the Bio::Index::*
modules aren't OBDA-compliant (at least as far as I know); I believe they
use a simple key-value hash pairing.  The Bio::DB::Flat* modules are
OBDA-compliant, though, so it's worth exploring this.
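
For contrast, the key-value approach amounts to something like the sketch
below: a hash from record ID to byte offset, followed by a seek. This is
Python rather than Perl, it uses a plain dict as a stand-in for a DBM file,
and it assumes strict four-line FASTQ records; build_index() and fetch() are
hypothetical names, not the Bio::Index::Fastq API.

```python
def build_index(path):
    """Map each FASTQ record ID to the byte offset of its '@' header line."""
    index = {}
    with open(path, "rb") as handle:
        while True:
            offset = handle.tell()
            header = handle.readline()
            if not header.strip():
                break  # end of file (or trailing blank line)
            # Assumes strict four-line records: header, sequence, '+', quality.
            for _ in range(3):
                handle.readline()
            record_id = header[1:].split()[0].decode("ascii")
            index[record_id] = offset
    return index


def fetch(path, index, record_id):
    """Seek straight to a record and return its four lines."""
    with open(path, "rb") as handle:
        handle.seek(index[record_id])
        return [handle.readline().decode("ascii").rstrip("\n") for _ in range(4)]
```

A single hash like this has no natural place for per-file metadata such as
file type or datestamp, which is what the separate tables in the SQLite
proposal would provide.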

chris