[Bioperl-l] Indexing CDS file

Tue Feb 10 07:28:34 UTC 2009

Hello,I am trying to index cds files. The following script is run on
cds_ann_hum_01_r99.dat obtained from EMBL. It is able to read and print the
ids and sequences, but strangely it is unable to find any of the ids! Could
you please explain the mistake I am committing? I have tried variously but
it doesn't work.

--- <prog> ---

#!/usr/bin/perl -w

        use strict;

        use Bio::SeqIO;
        use Bio::Index::Fasta;

        my $file = "cds100.dat";
        my $idx_file = "cds100.idx";

        unlink($idx_file); # delete the file, if exists

        my $in_seq = Bio::SeqIO->new(   -file => "< $file",
                                        -format => "embl");

        while ( my $seq = $in_seq->next_seq) {
                print $seq->id, "\n";
        }

        my $inx = Bio::Index::Fasta->new(       -filename => $idx_file,
                                                -write_flag => 1);

        $inx->make_index($file); # creates the index file

        print "Done indexing!\n";

        my $out = Bio::SeqIO->new( '-format' => 'fasta');

        $inx = Bio::Index::Fasta->new('-filename' => $idx_file);

        my $query_seq=$inx->fetch('EAL24405SV');

        if ( ! defined($query_seq) ) {
                print "Sequence not found!\n";
                exit 1;
        } else {
                print $query_seq;
        }

exit 0;

--- </prog> ---

Output is:
--- <stdout> ---
EAL24405;
EAL24406;
.
.
.
EAL24407;
EAL24408;
Done indexing!
Sequence not found!
--- </stdout> ---

TIA,
Sviya