[Bioperl-l] Indexing nr database
Ross KK Leung
ross at cuhk.edu.hk
Tue Sep 7 09:18:16 UTC 2010
The reason is that I need to retrieve specific information from the
matched sequences, e.g. extract the 64th amino acid of the top-matching
sequence. Is there any way to achieve that?
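A minimal sketch of the extraction step itself (not code from this thread): once the top hit's sequence is available as a FASTA record, the 64th residue is a single substr call, keeping in mind that Perl strings are 0-based while residue positions are conventionally 1-based. The helper name residue_at and the sample record are hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper: return the residue at a 1-based position
# in a FASTA record, or undef if the position is out of range.
sub residue_at {
    my ($fasta, $pos) = @_;
    # Drop the header line and join the wrapped sequence lines.
    my $seq = join '', grep { !/^>/ } split /\n/, $fasta;
    $seq =~ s/\s+//g;
    return if $pos < 1 || $pos > length $seq;
    return substr($seq, $pos - 1, 1);   # 1-based -> 0-based
}

# Made-up record for illustration only: 63 x A, then W, then 10 x G.
my $record = ">example_protein hypothetical\n"
           . ('A' x 63) . "W\n"
           . ('G' x 10) . "\n";

print residue_at($record, 64), "\n";    # prints W
```

The same arithmetic applies however the sequence was obtained; only the step that fetches the record changes.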
-----Original Message-----
From: Hans-Rudolf Hotz [mailto:hrh at fmi.ch]
Sent: Tuesday, September 07, 2010 5:09 PM
To: bioperl-l at lists.open-bio.org; ross at cuhk.edu.hk
Subject: Re: [Bioperl-l] Indexing nr database
Hi
Why don't you use the pre-indexed BLAST files from NCBI:
ftp://ftp.ncbi.nih.gov/blast/db/
You can use them to fetch individual sequences by GI number or accession
with the tool "blastdbcmd" from the BLAST+ binaries:
ftp://ftp.ncbi.nih.gov/blast/executables/blast+/
regards, Hans
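A sketch of how that could look for the "64th amino acid" case, assuming the BLAST+ blastdbcmd binary is on the PATH and a pre-formatted database is reachable under the name "nr". The options used (-db, -entry, -range, -outfmt) are blastdbcmd options; the helper name blastdbcmd_args and the way the script is invoked are hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper: build the blastdbcmd argument list that asks for
# a single residue (1-based position) of one database entry.
sub blastdbcmd_args {
    my ($db, $entry, $pos) = @_;
    return ('blastdbcmd',
            '-db',     $db,
            '-entry',  $entry,
            '-range',  "$pos-$pos",   # start-stop, 1-based, e.g. 64-64
            '-outfmt', '%s');         # sequence data only, no header
}

# Runs only when called with two arguments: <entry> <position>
if (@ARGV == 2) {
    my ($entry, $pos) = @ARGV;
    my @cmd = blastdbcmd_args('nr', $entry, $pos);
    # List-form pipe open bypasses the shell, so no quoting is needed.
    open(my $fh, '-|', @cmd) or die "Cannot run blastdbcmd: $!";
    my $residue = <$fh>;
    close $fh;
    die "No sequence returned for $entry\n" unless defined $residue;
    chomp $residue;
    print "$residue\n";
}
```

Letting blastdbcmd do the range extraction means no separate index has to be built or kept on disk.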
On 09/07/2010 10:28 AM, Ross KK Leung wrote:
> With the following code, I want to index the 4G nr database. However, the
> index file is > 1T and the job has been running for weeks and still hasn't
> finished. Could anybody tell me how to accomplish this? Thanks in
> advance.
>
> use strict;
> use Bio::DB::Flat::BinarySearch;
>
> my ($baseDir, $dbName, $seqFile, $testId, $testGi) = @ARGV;
>
> # use single quotes so you don't have to write
> # regular expressions like "gi\\|(\\d+)"
> #my $primary_pattern = '^>(\S+)';
> #if ($fullHeader == 1) {
> my $primary_pattern = '^>(.+)';
> #}
> my $string = "gi|41353971|emb|AL123456.2| Mycobacterium tuberculosis H37Rv complete genome";
> #$string =~ s/$primary_pattern/RRR/g;
> #print "$string\n";
>
> # one or more patterns stored in a hash:
> my $secondary_patterns = {GI => 'gi\|(\d+)'};
>
> my $db = Bio::DB::Flat::BinarySearch->new(
>     -directory          => $baseDir,
>     -dbname             => $dbName,
>     -write_flag         => 1,
>     -primary_pattern    => $primary_pattern,
>     -primary_namespace  => 'ACC',
>     -secondary_patterns => $secondary_patterns,
>     -verbose            => 1,
>     -format             => 'fasta' );
>
> $db->build_index($seqFile);
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l