[Bioperl-l] Output a subset of FASTA data from a single large file
Sendu Bala
sb at mrc-dunn.cam.ac.uk
Mon Jun 12 08:21:31 UTC 2006
Chris Fields wrote:
> There's information in the HOWTOs:
>
> http://www.bioperl.org/wiki/HOWTO:Flat_databases
>
> http://www.bioperl.org/wiki/HOWTO:OBDA
>
> Just so you know, I tried, out of curiosity, passing this through Bio::SeqIO
> ('fasta' format I/O) and this is what I got as output:
>
>> probe:HG_U95Av2:1138_at:395:301; Interrogation_Position=2631;
>
>
> i.e. an empty sequence, which is what I guessed might happen
[snip]
As you later discovered, that was an Outlook problem. Just to make this
thread relevant to bioperl, the bioperl solution is:
use Bio::SeqIO;
use Bio::Index::Fasta;
my $inx = Bio::Index::Fasta->new(-write_flag => 1);
$inx->id_parser(\&get_id);
$inx->make_index(shift);
my $out = Bio::SeqIO->new(-format => 'Fasta', -fh => \*STDOUT);
my $wanted_ids_file = shift;
open(IDS, $wanted_ids_file);
while (<IDS>) {
chomp;
my $seq = $inx->fetch($_);
$out->write_seq($seq);
}
sub get_id {
my $line = shift;
$line =~ /^>probe:\S+?:(\S+?):/;
$1;
}
It works for me on the sample sequence given by the OP.
More information about the Bioperl-l
mailing list