[Bioperl-l] Output a subset of FASTA data from a single large file
Sendu Bala
sb at mrc-dunn.cam.ac.uk
Mon Jun 12 08:49:49 UTC 2006
Michael Oldham wrote:
> Thanks to everyone for their helpful advice. I think I am getting closer,
> but no cigar quite yet. The script below runs quickly with no errors--but
> the output file is empty. It seems that the problem must lie somewhere in
> the 'while' loop, and I'm sure it's quite obvious to a more experienced
> eye--but not to mine! Any suggestions? Thanks again for your help.
>
> --Mike O.
>
>
> #!/usr/bin/perl -w
>
> use strict;
>
> my $IDs = 'ID.dat.txt';
>
> unless (open(IDFILE, $IDs)) {
> print "Could not open file $IDs!\n";
> }
>
> my $probes = 'HG_U95Av2_probe_fasta.txt';
>
> unless (open(PROBES, $probes)) {
> print "Could not open file $probes!\n";
> }
>
> open (OUT,'>','probe_subset.txt') or die "Can't write output: $!";
>
> my @ID = <IDFILE>;
> chomp @ID;
> my %ID = map {($_, 1)} @ID; #Note: This creates a hash with keys=PSIDs and
> all values=1.
>
> while (<PROBES>) {
> my $idmatch = exists($ID{$1}) if /^>probe:\w+:(\w+):/;
> if ($idmatch){
> print OUT;
> }
> }
> exit;
Not sure why it would print nothing (are the ids in IDFILE the same case
as the ids in the fasta file, do they only contain word characters?),
but even if it did you would only be printing out the fasta headers and
not the sequences. Doing it the bioperl way gives you more flexibility
in the future; you may want to do something with the sequences after
printing them out, in which case do it in bioperl using Seq objects and
skip the intermediate step of printing them.
More information about the Bioperl-l
mailing list