[Bioperl-l] BerkeleyDB
Gordon Haverland
ghaverla at materialisations.com
Fri Jan 25 18:00:22 UTC 2019
On Fri, 25 Jan 2019 16:13:52 +0000
Peter Cock <p.j.a.cock at googlemail.com> wrote:
> That's a good question - I don't know the BioPerl answer,
> but am interested from the Biopython side of things.
>
> When I created Biopython's SeqIO (first included in
> Biopython 1.43 from 2007) it was heavily influenced by
> BioPerl's SeqIO:
>
> https://bioperl.org/howtos/SeqIO_HOWTO
> https://biopython.org/wiki/SeqIO
>
> The older Biopython framework it replaced (using a regular
> expression based system called Martel/Mindy) had indexing,
> e.g. see the Biopython 1.30 release notes from 2004.
>
> It took a bit longer to add indexing to Biopython's SeqIO.
> I added in-memory indexing (using a dict, or a hash in Perl
> terminology) in Biopython 1.52 (2009), and then SQLite
> support was added in Biopython 1.57 (2011). And yes, a
> key point of this was to build an index once, and reuse it.
>
> I did look at BerkeleyDB for this, but concluded that
> SQLite was a more portable and practical choice - it
> was usually included with a standard Python install.
Way back when, I seem to remember some information about DBM::Deep
possibly being put on top of BerkeleyDB. The man page for DBM::Deep
mentions BDB, but not in a way that suggests the work was ever
finished. The code lives on GitHub, and very little seems to have been
done in the last 2 years.
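
On the original question of reusing the tree file, here is a minimal
sketch based on the tie call in the quoted code. The idea is to drop the
unconditional unlink, build the file only when it is missing, and
otherwise reopen it read-only with DB_RDONLY. The helper name open_tree,
the file name ids.tsv and the command-line layout are made up for
illustration; treat it as a sketch under those assumptions, not a tested
recipe for a very large file.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use BerkeleyDB;

# open_tree() is a hypothetical helper, not part of the BerkeleyDB module.
# It ties a hash to the Btree file, loading the tab-separated source only
# when the file does not exist yet. Later runs simply reopen the file.
sub open_tree {
    my ($db_file, $tsv_file) = @_;

    my $is_new = !-e $db_file;
    my %h;
    tie %h, 'BerkeleyDB::Btree',
        -Filename => $db_file,
        -Flags    => ($is_new ? DB_CREATE : DB_RDONLY)
        or die "Cannot open $db_file: $! $BerkeleyDB::Error\n";

    if ($is_new) {
        open(my $in, '<', $tsv_file) or die "Cannot read $tsv_file: $!\n";
        while (my $line = <$in>) {
            chomp $line;
            my ($id, $val) = split /\t/, $line;
            next unless defined $id && defined $val;
            s/^\s+|\s+$//g for $id, $val;    # trim surrounding whitespace
            $h{$id} = $val;
        }
        close $in;
    }
    return \%h;
}

# Example use (assumed layout): perl reuse_tree.pl ids.tsv some_id
if (@ARGV == 2) {
    my $tree = open_tree('tree', $ARGV[0]);
    my $hit  = $tree->{ $ARGV[1] };
    print defined $hit ? "$ARGV[1]\t$hit\n" : "$ARGV[1] not found\n";
}
```

With this layout the first run pays the cost of building the file and
every later run only opens it. If the source data ever changes, deleting
the tree file by hand forces a rebuild on the next run.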
Gord
> Regards,
>
> Peter
>
> On Fri, Jan 25, 2019 at 3:18 PM shalu sharma
> <sharmashalu.bio at gmail.com> wrote:
> >
> > Hey everyone,
> > So I am using BerkeleyDB to make a huge database (tree
> > method). I use it to pull out matching ids (it's working fine) from
> > multiple datasets. Here are a few lines of the code:
> >
> > use strict ;
> > use BerkeleyDB ;
> > use Bio::SeqIO;
> >
> > my $filename = "tree" ;
> > unlink $filename ;   # removes any existing tree file, so it is rebuilt on every run
> > my %h ;
> > tie %h, 'BerkeleyDB::Btree',
> >     -Filename => $filename,
> >     -Flags    => DB_CREATE
> >     or die "Cannot open $filename: $! $BerkeleyDB::Error\n" ;
> >
> > # Add key/value pairs to the file
> > open(IN, '<', $ARGV[0]) or die "Cannot read $ARGV[0]: $!\n" ; # adding values
> > while(<IN>){
> >     my $line = $_;
> >     chomp($line);
> >     my @f = split(/\t/,$line);
> >     my $id = $f[0];
> >     my $val = $f[1];
> >     $id =~ s/^\s+//; $id =~ s/\s+$//;
> >     $val =~ s/^\s+//; $val =~ s/\s+$//;
> >     $h{$id} = $val;
> >
> > ----
> > ----
> > My question is this: it makes a huge tree file. Is it possible to
> > re-use that tree file instead of making it again and again?
> > My query datasets change, but the database does not.
> >
> >
> >
> > _______________________________________________
> > Bioperl-l mailing list
> > Bioperl-l at mailman.open-bio.org
> > http://mailman.open-bio.org/mailman/listinfo/bioperl-l