[Bioperl-l] struggling with Bio::Index::GenBank

Thu Feb 6 07:57:49 EST 2003

Neil,

>> I hope this is not too basic a question, but I am struggling.

This is our fault, I apologize. What the module documentation doesn't
mention are the environment variables you need to set. Take a look at the
excerpt below from biodatabases.pod. The example scripts bpfetch.pl and
bpindex.pl are currently in scripts/index. I will add this to the
Bio::Index::* documentation. I'm assuming:

setenv BIOPERL_INDEX_TYPE DB_File

as Jason just mentioned.

=head1 SETTING UP BIOPERL INDICES (Bio::Index::*)

If you want to use Bioperl indices of Fasta, EMBL/SwissProt .dat files,
SwissPfam, GenBank, or Blast files, then the bpfetch.pl and
bpindex.pl scripts are great ways to start off (and reading the
scripts also shows you how to use the Bioperl indexing modules). bpfetch.pl
and bpindex.pl coordinate using two environment variables:

  BIOPERL_INDEX - directory where the indices are kept

  BIOPERL_INDEX_TYPE - type of DBM file to use for the index
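
Before running the scripts it can help to confirm that both variables are visible to the shell. The following is only a sketch for bash-style shells; check_bioperl_index_env is a hypothetical helper name, not something shipped with Bioperl, and it makes no assumption about what the scripts do when BIOPERL_INDEX_TYPE is unset.

```shell
# Hypothetical preflight check (not part of Bioperl): verifies that
# BIOPERL_INDEX points at an existing directory and reports
# BIOPERL_INDEX_TYPE without assuming any default for it.
check_bioperl_index_env() {
    if [ -z "${BIOPERL_INDEX:-}" ]; then
        echo "BIOPERL_INDEX is not set" >&2
        return 1
    fi
    if [ ! -d "$BIOPERL_INDEX" ]; then
        echo "BIOPERL_INDEX is not a directory: $BIOPERL_INDEX" >&2
        return 1
    fi
    echo "index dir:  $BIOPERL_INDEX"
    echo "index type: ${BIOPERL_INDEX_TYPE:-<unset>}"
}
```

Calling check_bioperl_index_env before bpindex.pl or bpfetch.pl returns non-zero if the index directory is missing.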

The basic way of indexing a database, once BIOPERL_INDEX has been
set up, is to go

  bpindex.pl <index-name> <filenames as full path>

eg, for Fasta files

  bpindex.pl est /nfs/somewhere/fastafiles/est*.fa

Or, for EMBL/Swissprot files

  bpindex.pl -fmt=EMBL swiss /nfs/somewhere/swiss/swissprot.dat

To retrieve sequences from the index go

  bpfetch.pl <index-name>:<id>

eg,

  bpfetch.pl est:AA01234

or

  bpfetch.pl swiss:VAV_HUMAN

bpfetch.pl also has other options to connect to Genbank across the network.

=head1 CHECKLIST

   mkdir /nfs/datadisk/bioperlindex/

or any other directory

   setenv BIOPERL_INDEX /nfs/datadisk/bioperlindex/

in .cshrc or .tcshrc (or, in bash, set the variable and B<export> it in
.bashrc)

go

   bpindex.pl -fmt=EMBL swissprot /nfs/datadisk/swiss/swissprot.dat

etc. You are now ready to use bpfetch.pl. See L<Bio::Index::Fasta>,
L<Bio::Index::GenBank>, L<Bio::Index::Blast>, L<Bio::Index::EMBL>,
L<Bio::Index::SwissPfam>, and L<Bio::Index::Swissprot> for more.
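
For bash users, the checklist might be sketched as below. setup_bioperl_index is a hypothetical helper name, and the DB_File value for BIOPERL_INDEX_TYPE is the assumption made earlier in this message, not a documented default. The commands in the usage comment are the ones quoted in the excerpt.

```shell
# Hypothetical helper (not part of Bioperl): bash form of the checklist.
# $1 is the directory where the indices are kept.
setup_bioperl_index() {
    # Create the index directory if it does not exist yet.
    mkdir -p "$1" || return 1
    # bash equivalent of "setenv BIOPERL_INDEX <dir>".
    export BIOPERL_INDEX="$1"
    # Assumption: DB_File, unless the caller has already chosen a type.
    export BIOPERL_INDEX_TYPE="${BIOPERL_INDEX_TYPE:-DB_File}"
}

# Usage (commented out so that sourcing this has no side effects):
#   setup_bioperl_index /nfs/datadisk/bioperlindex/
#   bpindex.pl -fmt=EMBL swiss /nfs/somewhere/swiss/swissprot.dat
#   bpfetch.pl swiss:VAV_HUMAN
```

To make the settings permanent, the two export lines would go in .bashrc.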

Flat file indexing of Fasta files is also provided by Bio::DB::Fasta.
Please see L<Bio::DB::Fasta> for more information; this module provides
some functionality absent from Bio::Index::Fasta.

Brian O.

-----Original Message-----
From: bioperl-l-bounces at bioperl.org [mailto:bioperl-l-bounces at bioperl.org] On Behalf Of Neil Saunders
Sent: Thursday, February 06, 2003 12:13 AM
To: bioperl-l at bioperl.org
Subject: [Bioperl-l] struggling with Bio::Index::GenBank

I hope this is not too basic a question, but I am struggling.  I have
downloaded the gbbct subset of GenBank and want to index it.  I run
BioPerl 1.2 on Debian/Linux and have all the dependencies installed
(Berkeley DB, the DB_File module and most Debian packages related to
Berkeley dbm, e.g. libdb3-dev and libberkeleydb-perl).

I'm just using the code example that comes with Bio::Index::GenBank and
my file is /usr/local/databases/gbbct.seq, so I run:

make_index.pl /usr/local/databases/gbbct.seq

but persistently get the error:

------------- EXCEPTION  -------------
MSG: Can't open 'DB_File' dbm file '/usr/local/databases/gbbct.seq' : File exists
STACK Bio::Index::Abstract::open_dbm /usr/local/share/perl/5.6.1/Bio/Index/Abstract.pm:389
STACK Bio::Index::Abstract::new /usr/local/share/perl/5.6.1/Bio/Index/Abstract.pm:150
STACK Bio::Index::AbstractSeq::new /usr/local/share/perl/5.6.1/Bio/Index/AbstractSeq.pm:91
STACK toplevel ./index_gb.pl:9

As I'm not that clear on the workings of the index module, I find this
confusing.  Am I obviously doing something wrong or missing something?

Neil
--
 School of Biotechnology and Biomolecular Sciences,
 The University of New South Wales,
 Sydney 2052,
 Australia

http://psychro.bioinformatics.unsw.edu.au/neil/index.php