[Bioperl-l] Indexing CDS file
Dave Messina
David.Messina at sbc.su.se
Tue Feb 10 12:37:32 UTC 2009
Hi Sviya,
You've almost got it. You need to use Bio::Index::EMBL if you're going to be
indexing EMBL-format files.
I modified your code to do so, and it worked for me (attached below). Please
try it and report back if you have any problems.
One thing: note that there aren't any 'AC' lines in that file you're
indexing, so I saw errors like this:
--------------------- WARNING ---------------------
MSG: For id [EAL24318;] in embl flat file, got no accession number. Storing
id index anyway
---------------------------------------------------
I did see accession-like identifiers on the 'PA' lines, however. Does
anybody know if this is a change in EMBL format that we need to adapt
B::I::EMBL to?
Dave
----------------------------CODE BEGINS-----------------------------
#!/usr/bin/perl -w
use strict;
use Bio::SeqIO;
use Bio::Index::EMBL;
my $file = "cds100.dat";
my $idx_file = "cds100.idx";
unlink($idx_file); # delete the file, if exists
my $in_seq = Bio::SeqIO->new( -file => "< $file",
-format => "embl");
while ( my $seq = $in_seq->next_seq) {
print $seq->id, "\n";
}
my $inx = Bio::Index::EMBL->new(-filename => $idx_file,
-write_flag => 'WRITE');
my $retval = $inx->make_index($file,); # creates the index file
print "Done indexing!\n";
my $out = Bio::SeqIO->new( '-format' => 'fasta',
'-fh' => \*STDOUT);
my $query_seq = $inx->fetch('EAL24309;');
if ( ! defined($query_seq) ) {
print "Sequence not found!\n";
exit 1;
} else {
$out->write_seq($query_seq);
}
exit 0;
-------------------------CODE ENDS--------------------------------
More information about the Bioperl-l
mailing list