[Bioperl-l] BerkeleyDB

Peter Cock p.j.a.cock at googlemail.com
Fri Jan 25 16:13:52 UTC 2019


That's a good question. I don't know the BioPerl answer,
but I am interested from the Biopython side of things.

When I created Biopython's SeqIO (first included in
Biopython 1.43 from 2007), it was heavily influenced by
BioPerl's SeqIO:

https://bioperl.org/howtos/SeqIO_HOWTO
https://biopython.org/wiki/SeqIO

The older Biopython framework it replaced (using a
regular-expression-based system called Martel/Mindy) had
indexing, e.g. see the Biopython 1.30 release notes from 2004.

It took a bit longer to add indexing to Biopython's SeqIO.
I added in-memory indexing (using a dict, or a hash in Perl
terminology) in Biopython 1.52 (2009), and then SQLite
support was added in Biopython 1.57 (2011). And yes, a
key point of this was to build an index once and reuse it.
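To illustrate the in-memory idea in Perl terms, here is a minimal sketch. It is not Biopython's code and not an existing BioPerl API. Instead of holding every value in the hash, it records the byte offset at which each ID's line starts in a two-column tab-separated file, then seeks back to parse a single line on demand. The helper names `build_offset_index` and `fetch_value` and the demo file name are hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper: map each ID (first tab-separated column) to the byte
# offset where its line starts, rather than keeping the value in memory.
sub build_offset_index {
    my ($fh) = @_;
    my %offset;
    seek($fh, 0, 0) or die "Cannot seek: $!\n";
    while (1) {
        my $pos  = tell($fh);           # offset of the line about to be read
        my $line = <$fh>;
        last unless defined $line;
        chomp $line;
        my ($id) = split /\t/, $line;
        next unless defined $id;        # skip empty lines
        $id =~ s/^\s+|\s+$//g;          # trim whitespace, as in the quoted script
        $offset{$id} = $pos;
    }
    return \%offset;
}

# Hypothetical helper: seek back to the stored offset and parse just that line.
sub fetch_value {
    my ($fh, $index, $id) = @_;
    return unless exists $index->{$id};
    seek($fh, $index->{$id}, 0) or die "Cannot seek: $!\n";
    my $line = <$fh>;
    chomp $line;
    my (undef, $val) = split /\t/, $line;
    return unless defined $val;
    $val =~ s/^\s+|\s+$//g;
    return $val;
}

# Usage: write a tiny demo input file, index it, and look up one ID.
my $file = "offset_demo.tsv";
open(my $out, '>', $file) or die "Cannot write $file: $!\n";
print $out "id1\tAAA\n", "id2\tCCC\n", "id3\tGGG\n";
close $out;

open(my $in, '<', $file) or die "Cannot open $file: $!\n";
my $index = build_offset_index($in);
my $found = fetch_value($in, $index, 'id2');
print "$found\n";                       # prints CCC
close $in;
unlink $file;
```

An index like this lives only for the current run. Persisting it to disk (SQLite in Biopython, or a BerkeleyDB file in Perl) is what allows building it once and reusing it.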

I did look at BerkeleyDB for this, but concluded that
SQLite was a more portable and practical choice, since it
was usually included with a standard Python install.
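On the original question: in the script quoted below, the `unlink $filename` line deletes the tree file, and the loading loop then rebuilds it on every run. A minimal sketch, assuming the BerkeleyDB module and a tab-separated id/value source file as in that script, is to build the file only when it is missing and otherwise tie the existing file again with `DB_RDONLY`. The helper `open_index` and the demo file names are hypothetical.

```perl
use strict;
use warnings;
use BerkeleyDB;

# Hypothetical helper: build the Btree from a tab-separated id/value file
# only if it does not exist yet, then (re)open it read-only.
sub open_index {
    my ($db_file, $source) = @_;
    unless (-e $db_file) {
        my %build;
        tie %build, 'BerkeleyDB::Btree',
            -Filename => $db_file,
            -Flags    => DB_CREATE
            or die "Cannot create $db_file: $! $BerkeleyDB::Error\n";
        open(my $in, '<', $source) or die "Cannot open $source: $!\n";
        while (my $line = <$in>) {
            chomp $line;
            my ($id, $val) = split /\t/, $line;
            next unless defined $val;
            s/^\s+|\s+$//g for $id, $val;   # trim both fields
            $build{$id} = $val;
        }
        close $in;
        untie %build;                        # flush and close the new file
    }
    my %h;
    tie %h, 'BerkeleyDB::Btree',
        -Filename => $db_file,
        -Flags    => DB_RDONLY
        or die "Cannot open $db_file: $! $BerkeleyDB::Error\n";
    return \%h;
}

# Usage: the first call builds "tree_demo"; later calls (or later runs) reuse it.
my $src = "db_demo.tsv";
open(my $out, '>', $src) or die "Cannot write $src: $!\n";
print $out "id1\tAAA\n", "id2\t CCC \n";
close $out;

open_index("tree_demo", $src);               # builds the file; handle is discarded
my $db = open_index("tree_demo", "no_such_file.tsv");  # source is never opened
print "$db->{id2}\n";                        # prints CCC
```

Only the query datasets then need to be read on each run. Opening read-only also guards against accidentally modifying the shared database from a query script.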

Regards,

Peter

On Fri, Jan 25, 2019 at 3:18 PM shalu sharma <sharmashalu.bio at gmail.com> wrote:
>
> Hey everyone,
>      I am using BerkeleyDB to build a huge database (Btree method).
> I use it to pull out matching IDs from multiple datasets (that part is working fine).
> Here are a few lines of the code:
>
> use strict;
> use warnings;
> use BerkeleyDB;
> use Bio::SeqIO;
>
> my $filename = "tree";
> unlink $filename;
>
> my %h;
> tie %h, 'BerkeleyDB::Btree',
>                 -Filename   => $filename,
>                 -Flags      => DB_CREATE
>     or die "Cannot open $filename: $! $BerkeleyDB::Error\n";
>
> # Add key/value pairs to the file
> open(IN, '<', $ARGV[0]) or die "Cannot open $ARGV[0]: $!\n";   # adding values
> while (<IN>) {
>      my $line = $_;
>      chomp($line);
>      my @f = split /\t/, $line;
>      my $id  = $f[0];
>      my $val = $f[1];
>      $id  =~ s/^\s+//; $id  =~ s/\s+$//;
>      $val =~ s/^\s+//; $val =~ s/\s+$//;
>      $h{$id} = $val;
>
> ----
> ----
> My question is: this makes a huge tree file. Is it possible to reuse that tree file instead of rebuilding it every time? My query datasets change, but the database does not.
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at mailman.open-bio.org
> http://mailman.open-bio.org/mailman/listinfo/bioperl-l

