[Bioperl-l] struggling with Bio::Index::GenBank
Brian Osborne
brian_osborne at cognia.com
Thu Feb 6 07:57:49 EST 2003
Neil,
>> I hope this is not too basic a question, but I am struggling.
This is our fault; I apologize. What the module documentation doesn't
mention are the environment variables you need to set. Take a look at the
excerpt below from biodatabases.pod. The example scripts bpfetch.pl and
bpindex.pl are currently in scripts/index. I will add to the Bio::Index*
documentation. I'm assuming:
setenv BIOPERL_INDEX_TYPE DB_File
as Jason just mentioned.
=head1 SETTING UP BIOPERL INDICES (Bio::Index::*)
If you want to use Bioperl indices of Fasta, EMBL/SwissProt .dat files,
SwissPfam, GenBank, or Blast files then the bpfetch.pl and
bpindex.pl scripts are great ways to start off (and reading the
scripts also shows you how to use the Bioperl indexing code). bpfetch.pl and
bpindex.pl coordinate using two environment variables:
  BIOPERL_INDEX - directory where the indices are kept
  BIOPERL_INDEX_TYPE - type of DBM file to use for the index
The basic way of indexing a database, once BIOPERL_INDEX has been
set up, is to run
  bpindex.pl <index-name> <filenames as full path>
e.g., for Fasta files
  bpindex.pl est /nfs/somewhere/fastafiles/est*.fa
Or, for EMBL/SwissProt files
  bpindex.pl -fmt=EMBL swiss /nfs/somewhere/swiss/swissprot.dat
To retrieve sequences from the index, run
  bpfetch.pl <index-name>:<id>
e.g.,
  bpfetch.pl est:AA01234
or
  bpfetch.pl swiss:VAV_HUMAN
bpfetch.pl also has other options to connect to GenBank across the network.
=head1 CHECKLIST
  mkdir /nfs/datadisk/bioperlindex/
or any other directory
  setenv BIOPERL_INDEX /nfs/datadisk/bioperlindex/
in .cshrc or .tcshrc (or B<set> and B<export> in bash and its .bashrc)
then run
  bpindex.pl swissprot /nfs/datadisk/swiss/swissprot.dat
etc. You are now ready to use bpfetch.pl. See L<Bio::Index::Fasta>,
L<Bio::Index::GenBank>, L<Bio::Index::Blast>, L<Bio::Index::EMBL>,
L<Bio::Index::SwissPfam>, and L<Bio::Index::Swissprot> for more.
Flat file indexing of Fasta files is also provided by Bio::DB::Fasta;
see L<Bio::DB::Fasta> for more information. This module provides
some functionality absent from Bio::Index::Fasta.
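For bash users, the checklist above might be sketched roughly as follows. This is an illustration only: setup_bioperl_index is a hypothetical helper name, not something shipped with Bioperl, and the DB_File default is just the value assumed earlier in this message. The function creates the index directory and exports the two variables that bpindex.pl and bpfetch.pl read.

```shell
# Hypothetical helper (not part of Bioperl): prepare the environment
# that bpindex.pl and bpfetch.pl expect.
#   $1 - directory where the indices are kept
#   $2 - type of DBM file to use (assumed default: DB_File)
setup_bioperl_index() {
    # Create the index directory if it does not exist yet.
    mkdir -p "$1" || return 1
    BIOPERL_INDEX="$1"
    BIOPERL_INDEX_TYPE="${2:-DB_File}"
    # Export both so that child processes (the scripts) can see them.
    export BIOPERL_INDEX BIOPERL_INDEX_TYPE
}

# Example use, with the paths from the checklist (left commented out):
#   setup_bioperl_index /nfs/datadisk/bioperlindex
#   bpindex.pl swissprot /nfs/datadisk/swiss/swissprot.dat
#   bpfetch.pl swissprot:VAV_HUMAN
```

Putting the function and its call in .bashrc would play the same role as the setenv lines in .cshrc.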
Brian O.
-----Original Message-----
From: bioperl-l-bounces at bioperl.org [mailto:bioperl-l-bounces at bioperl.org] On Behalf Of Neil Saunders
Sent: Thursday, February 06, 2003 12:13 AM
To: bioperl-l at bioperl.org
Subject: [Bioperl-l] struggling with Bio::Index::GenBank
I hope this is not too basic a question, but I am struggling. I have
downloaded the gbbct subset of GenBank and want to index it. I run
BioPerl 1.2 on Debian/Linux and have all the dependencies installed:
Berkeley DB, the DB_File module and most Debian packages related to
Berkeley dbm (e.g. libdb3-dev and libberkeleydb-perl).
I'm just using the code example that comes with Bio::Index::GenBank and
my file is /usr/local/databases/gbbct.seq, so I run:
make_index.pl /usr/local/databases/gbbct.seq
but persistently get the error:
------------- EXCEPTION -------------
MSG: Can't open 'DB_File' dbm file '/usr/local/databases/gbbct.seq' : File exists
STACK Bio::Index::Abstract::open_dbm /usr/local/share/perl/5.6.1/Bio/Index/Abstract.pm:389
STACK Bio::Index::Abstract::new /usr/local/share/perl/5.6.1/Bio/Index/Abstract.pm:150
STACK Bio::Index::AbstractSeq::new /usr/local/share/perl/5.6.1/Bio/Index/AbstractSeq.pm:91
STACK toplevel ./index_gb.pl:9
As I'm not that clear on the workings of the index module, I find this
confusing. Am I obviously doing something wrong or missing something?
Neil
--
School of Biotechnology and Biomolecular Sciences,
The University of New South Wales,
Sydney 2052,
Australia
http://psychro.bioinformatics.unsw.edu.au/neil/index.php