[Bioperl-l] acquiring a local refseq + index
Erik
er at xs4all.nl
Sun Dec 31 00:05:16 UTC 2006
Hi all,
I downloaded the refseq files (.gbff) and want to index the lot with
Bio::DB::Flat.
It turns out that in many records the SOURCE and ORGANISM lines are
malformed, sometimes to the point where indexing fails with a
Bio::SeqIO::genbank error.
I'd like to change Bio::SeqIO::genbank so that parsing gets at least far
enough to make indexing the refseq files possible, and hopefully also
improves the taxonomic output ($seq->species->binomial is often mutilated
at the moment).
Is it still worthwhile to change parsing modules like Bio::SeqIO::genbank?
Is anyone already working on a rewrite? If so, I may be better off
writing my own indexing scheme.
Below is an outline of my indexing program, which uses Bio::DB::Flat::BDB.
If anyone knows of a better way to get a locally searchable refseq flat
file index, I would be very interested.
Thanks for your help,
Erikjan
-------------
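use strict;
use warnings;
use Bio::DB::Flat;

my $refseq_dir = '/data/ftp.ncbi.nih.gov/refseq/release/complete';

# Open (or create) a BerkeleyDB-backed flat-file index named 'refseq'.
my $db = Bio::DB::Flat->new(
    -directory  => $refseq_dir,
    -dbname     => 'refseq',
    -format     => 'genbank',
    -index      => 'bdb',
    -write_flag => 1,
);

# getfiles() is not shown in this outline; presumably it returns the
# .gbff files to index.
my @files = getfiles($refseq_dir);
for my $f (@files) {
    $db->build_index($f);
}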
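
The outline calls getfiles() without showing it, and a single unparsable
record currently stops the whole run. A minimal sketch of both pieces
follows, using only core Perl. The bodies of getfiles() and index_each()
are assumptions made for illustration (index_each is a hypothetical name,
not part of BioPerl), as is the choice to search subdirectories and to
match on the .gbff extension. It relies on parser errors being thrown as
exceptions, which eval can trap.

```perl
use strict;
use warnings;
use File::Find;

# Hypothetical helper: collect every *.gbff file below $dir, sorted by path.
# Recursing into subdirectories is an assumption, not something the post states.
sub getfiles {
    my ($dir) = @_;
    my @found;
    find(
        {
            no_chdir => 1,    # keep $_ as the full path
            wanted   => sub {
                push @found, $File::Find::name if -f $_ && /\.gbff\z/;
            },
        },
        $dir
    );
    my @sorted = sort @found;
    return @sorted;
}

# Hypothetical wrapper: call $indexer on each file and trap exceptions, so
# that one bad file does not abort the run. Returns the files that failed.
sub index_each {
    my ($indexer, @files) = @_;
    my @failed;
    for my $f (@files) {
        my $ok = eval { $indexer->($f); 1 };
        unless ($ok) {
            warn "indexing failed for $f: $@";
            push @failed, $f;
        }
    }
    return @failed;
}

# Intended use with the $db object from the outline (not run here):
#   my @failed = index_each(sub { $db->build_index($_[0]) },
#                           getfiles($refseq_dir));
```

The list of failed files gives a starting point for finding which records
have the problematic SOURCE and ORGANISM lines. Note that this only skips
whole files; it does not repair the parsing itself.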