[Bioperl-l] BerkeleyDB

Fri Jan 25 18:00:22 UTC 2019

On Fri, 25 Jan 2019 16:13:52 +0000
Peter Cock <p.j.a.cock at googlemail.com> wrote:

> That's a good question - I don't know the BioPerl answer,
> but am interested from the Biopython side of things.
> 
> When I created Biopython's SeqIO (first included in
> Biopython 1.43 from 2007) it was heavily influenced by
> BioPerl's SeqIO:
> 
> https://bioperl.org/howtos/SeqIO_HOWTO
> https://biopython.org/wiki/SeqIO
> 
> The older Biopython framework it replaced (using a regular
> expression based system called Martel/Mindy) had indexing,
> e.g. see the Biopython 1.30 release notes from 2004.
> 
> It took a bit longer to add indexing to Biopython's SeqIO.
> I added in-memory indexing (using a dict, or hash in Perl
> terminology) in Biopython 1.52 (2009), and then SQLite
> support was added in Biopython 1.57 (2011). And yes, a
> key point of this was to build an index once, and reuse it.
> 
> I did look at BerkeleyDB for this, but concluded that
> SQLite was a more portable and practical choice - it
> was usually included with a standard Python install.

Way back when, I seem to remember some information about DBM::Deep
possibly being put on top of BerkeleyDB.  The man page for DBM::Deep
mentions BDB, but not in a way that suggests the work is finished.  The
code lives on GitHub, and very little seems to have been done in the
last 2 years.
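
On the original question of reusing the tree file: a minimal sketch,
assuming the rebuild on every run comes from the `unlink $filename` line
in the posted code, and that the input is a two-column, tab-separated
file as in that code. The sub names build_index_once and lookup_id are
made up for illustration; DB_CREATE and DB_RDONLY are the flags exported
by the BerkeleyDB module. I have not tried this on a database of the
size described.

```perl
use strict;
use warnings;
use BerkeleyDB;

# Build the B-tree file from a two-column, tab-separated file, but only
# when the database file does not exist yet.
# Returns 1 if the file was built, 0 if an existing file was left alone.
sub build_index_once {
    my ($db_file, $tsv_file) = @_;
    return 0 if -e $db_file;    # already built: reuse it, no unlink

    tie my %h, 'BerkeleyDB::Btree',
        -Filename => $db_file,
        -Flags    => DB_CREATE
        or die "Cannot create $db_file: $! $BerkeleyDB::Error\n";

    open(my $in, '<', $tsv_file) or die "Cannot open $tsv_file: $!\n";
    while (my $line = <$in>) {
        chomp $line;
        my ($id, $val) = split /\t/, $line;
        next unless defined $id && defined $val;   # skip short lines
        s/^\s+|\s+$//g for $id, $val;              # trim both fields
        $h{$id} = $val;
    }
    close $in;
    untie %h;    # flush and close the database
    return 1;
}

# Open an existing B-tree file read-only and look up one id.
# Returns undef when the id is not in the database.
sub lookup_id {
    my ($db_file, $id) = @_;
    tie my %h, 'BerkeleyDB::Btree',
        -Filename => $db_file,
        -Flags    => DB_RDONLY
        or die "Cannot open $db_file: $! $BerkeleyDB::Error\n";
    my $val = $h{$id};
    untie %h;
    return $val;
}
```

With this split, the build step runs once per database and each query
script only calls lookup_id (or ties the file read-only itself and
loops over its own ids). Deleting the tree file by hand forces a
rebuild when the underlying data does change.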

Gord

> Regards,
> 
> Peter
> 
> On Fri, Jan 25, 2019 at 3:18 PM shalu sharma
> <sharmashalu.bio at gmail.com> wrote:
> >
> > Hey everyone,
> >      So I am using BerkeleyDB to make a huge database (tree
> > method). I use it to pull out matching ids (it's working fine) from
> > multiple datasets. Here are a few lines of the code:
> >
> > use strict ;
> > use BerkeleyDB ;
> > use Bio::SeqIO;
> >
> > my $filename = "tree" ;
> > unlink $filename ;    # removes any existing tree file before each run
> >
> > my %h ;
> > tie %h, 'BerkeleyDB::Btree',
> >                 -Filename   => $filename,
> >                 -Flags      => DB_CREATE
> >     or die "Cannot open $filename: $!\n" ;
> >
> > # Add key/value pairs to the file
> > open(IN, '<', $ARGV[0]) or die "Cannot open $ARGV[0]: $!\n";   # adding values
> > while(<IN>){
> >      my $line = $_;
> >      chomp($line);
> >      my @f = split(/\t/,$line);
> >      my $id = $f[0];
> >      my $val = $f[1];
> >      $id =~ s/^\s+//; $id =~ s/\s+$//;
> >      $val =~ s/^\s+//; $val =~ s/\s+$//;
> >      $h{$id} = $val;
> > ----
> > ----
> > My question is: it makes a huge tree file. Is it possible to
> > re-use that tree file instead of making it again and again?
> > My query datasets change, but that database does not.
> >
> >
> >
> > _______________________________________________
> > Bioperl-l mailing list
> > Bioperl-l at mailman.open-bio.org
> > http://mailman.open-bio.org/mailman/listinfo/bioperl-l  