[Bioperl-l] fasta file parser
Sendu Bala
bix at sendu.me.uk
Tue Jul 22 12:42:31 UTC 2008
ste.ghi at libero.it wrote:
> Dear all,
> I'm trying to write a script which, given a file containing a list of
> IDs, parses a big fasta file returning only sequences NOT listed in the
> list-file.
>
> To do so, I first create an array with the IDs to be excluded:
>
> [...]
>
> #Load LIST content in an array; avoids duplicates
> while (my $line = <LIST>) {
>     push(@array1, $line);
>     foreach my $uniq (@array1) {
>         next if $seen{$uniq}++;
>         push @unique, $uniq;
>     }
> }
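Not sure what you're doing here (probably the cause of your problem?).
Note also that the lines are never chomped in this loop, so each entry
keeps its trailing newline and will not compare equal to $query->id.
But hashes are your friend: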
my @list = <LIST>;
chomp(@list);
my %unique = map { $_ => 1 } @list;
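A slightly fuller version of the hash approach is sketched below as a hypothetical helper, `load_id_set` (the name and the trimming rule are assumptions, not part of the thread). It also skips blank lines and trims stray whitespace, such as a trailing `\r` from a list saved on Windows, either of which would stop an ID from matching `$query->id`. Keep in mind that `chomp` edits its argument in place and returns the number of characters removed, not the string, so it is called as its own statement. Because hash keys are unique, duplicates in the list disappear without a separate de-duplication pass.

```perl
use strict;
use warnings;

# Read one ID per line from $fh and return a hash ref whose keys are the IDs.
# Hash keys are unique, so duplicate IDs in the list collapse automatically.
sub load_id_set {
    my ($fh) = @_;
    my %exclude;
    while (my $line = <$fh>) {
        chomp $line;                # modifies $line in place; its return value is a count
        $line =~ s/^\s+|\s+$//g;    # assumption: surrounding whitespace (e.g. "\r") is not part of an ID
        next if $line eq '';        # skip blank lines
        $exclude{$line} = 1;
    }
    return \%exclude;
}

# Usage with an in-memory file; a real script would open the list file instead.
my $list_text = "seq1\nseq2\nseq1\n\nseq3\n";
open my $list_fh, '<', \$list_text or die "Cannot open list: $!";
my $exclude = load_id_set($list_fh);
close $list_fh;

printf "%d unique IDs\n", scalar keys %$exclude;    # prints "3 unique IDs"
```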
> Then I process the fasta file this way (NOT WORKING):
>
> #Fasta file processing
> my $newSeqFileName = Bio::SeqIO->new(-file => ">>INFILE", -format => 'fasta');
> while (my $query = $SeqFileName->next_seq()) {
if (defined $unique{$query->id}) {
>         print $query->id." matched with $elem listed in $ARGV[1]: skipped!\n";
>         next;
>     }
else {
>         next if $seen2{ $query->id }++;
>         $newSeqFileName->write_seq($query);
>     }
> }
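Putting the pieces together, a minimal complete script might look like the sketch below. It is not the thread's own code: the sub name `filter_fasta`, the three command-line arguments (list file, input FASTA, output FASTA) and the choice to overwrite rather than append to the output are assumptions. It creates the reader object that the quoted loop calls `$SeqFileName` but never defines, opens the output by a real path (as written, `">>INFILE"` appends to a file literally named INFILE), and reports skips without `$elem`, which is not defined anywhere in the quoted loop.

```perl
use strict;
use warnings;
use Bio::SeqIO;

# Copy each FASTA record from $in_file to $out_file unless its ID is a key
# of %$exclude; an ID repeated in the FASTA file is written only once.
# Returns (number written, number skipped because listed).
sub filter_fasta {
    my ($exclude, $in_file, $out_file) = @_;
    my $in  = Bio::SeqIO->new(-file => $in_file,     -format => 'fasta');
    my $out = Bio::SeqIO->new(-file => ">$out_file", -format => 'fasta');
    my %seen;
    my ($written, $skipped) = (0, 0);
    while (my $seq = $in->next_seq) {
        my $id = $seq->id;
        if (exists $exclude->{$id}) {
            $skipped++;
            next;
        }
        next if $seen{$id}++;    # drop repeated IDs within the FASTA file itself
        $out->write_seq($seq);
        $written++;
    }
    $out->close;
    return ($written, $skipped);
}

# Hypothetical command line: perl filter.pl ids.txt in.fasta out.fasta
if (@ARGV == 3) {
    my ($list_file, $in_file, $out_file) = @ARGV;
    open my $fh, '<', $list_file or die "Cannot open $list_file: $!";
    my %exclude;
    while (my $line = <$fh>) {
        chomp $line;
        $exclude{$line} = 1 if length $line;
    }
    close $fh;
    my ($n_out, $n_skip) = filter_fasta(\%exclude, $in_file, $out_file);
    print STDERR "wrote $n_out sequences, skipped $n_skip listed in $list_file\n";
}
```

Using `exists` on the hash makes each lookup constant time, so the run time grows with the size of the FASTA file rather than with (FASTA size x list size) as a nested loop over the IDs would.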