[Bioperl-l] Shuffling sequences

Jason Stajich jason at cgt.duhs.duke.edu
Tue May 25 15:16:13 EDT 2004


(untested code, but should work)

#!/usr/bin/perl -w
use strict;
use Bio::SeqIO;
use Bio::PrimarySeq;

my $in = new Bio::SeqIO(-format => 'fasta', -file => 'fastafile.fa');
my $seq = $in->next_seq;
my @seq_as_array = split(//,$seq->seq);
my @randomseqs;
for ( 1..1000 ) {
  my @temp = @seq_as_array;
  &fy_shuffle(\@temp);
  push @randomseqs, join('',@temp);
}

my $out = new Bio::SeqIO(-format => 'fasta', -file =>">shuffled.fa");
my $i = 1;
for my $s ( @randomseqs ) {
  my $newseq = new Bio::PrimarySeq(-display_id => "rand.$i",
                                   -seq        => $s);
  # write the sequence object (not the raw string) and advance the id counter
  $out->write_seq($newseq);
  $i++;
}

# randomizer (Fisher-Yates shuffle)
sub fy_shuffle {
    my $array = shift;
    my $i;
    for( $i = @$array; $i--; ) {
        my $j = int rand($i+1);
        next if $i==$j;
        @$array[$i,$j] = @$array[$j,$i];
    }
}

The randomizer code is from the Perl Cookbook.  You could probably make it
faster by avoiding the string -> array -> string round trip and using
substr to operate directly on the string.
If someone wants to do that and add it to SeqUtils or somewhere else in
Bioperl, that would be good.
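
A rough sketch of the substr idea (also untested against Bioperl; the name
fy_shuffle_string is made up here and is not an existing SeqUtils method).
It runs the same Fisher-Yates loop but swaps characters inside a copy of
the string, so no array is built:

```perl
#!/usr/bin/perl -w
use strict;

# Fisher-Yates shuffle applied directly to a string.
# fy_shuffle_string is a hypothetical helper name, not part of Bioperl.
sub fy_shuffle_string {
    my $str = shift;    # shuffle a copy; the caller's string is untouched
    for ( my $i = length($str) - 1; $i > 0; $i-- ) {
        my $j = int rand($i + 1);    # random index with 0 <= $j <= $i
        next if $i == $j;
        # swap the characters at positions $i and $j using 4-arg substr
        my $ci = substr($str, $i, 1);
        my $cj = substr($str, $j, 1);
        substr($str, $i, 1, $cj);
        substr($str, $j, 1, $ci);
    }
    return $str;
}

# example: shuffle a short made-up peptide string
print fy_shuffle_string("MKVLAAGIVGLLLAQ"), "\n";
```

In the script above you would call fy_shuffle_string($seq->seq) inside the
1..1000 loop instead of the split/join pair.  Whether it is actually faster
would need benchmarking.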

-jason
On Tue, 25 May 2004, KHOUEIRY pierre wrote:

> Hi all,
> I'm searching for the bioperl Method that shuffle/randomize  a given
> protein sequence. I need to shuffle my fasta sequence 1000 times to make
> a statistics test on.
> thanks in advance
>

--
Jason Stajich
Duke University
jason at cgt.mc.duke.edu

