[Bioperl-l] Shuffling sequences
Jason Stajich
jason at cgt.duhs.duke.edu
Tue May 25 15:16:13 EDT 2004
(untested code, but should work)
#!/usr/bin/perl -w
use strict;
use Bio::SeqIO;
use Bio::PrimarySeq;
# read the first sequence from the FASTA file
my $in = Bio::SeqIO->new(-format => 'fasta', -file => 'fastafile.fa');
my $seq = $in->next_seq;
my @seq_as_array = split(//, $seq->seq);
my @randomseqs;
for ( 1..1000 ) {
    my @temp = @seq_as_array;    # shuffle a fresh copy each time
    fy_shuffle(\@temp);
    push @randomseqs, join('', @temp);
}
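my $out = Bio::SeqIO->new(-format => 'fasta', -file => '>shuffled.fa');
my $i = 1;
for my $s ( @randomseqs ) {
    my $newseq = Bio::PrimarySeq->new(-display_id => "rand.$i",
                                      -seq        => $s);
    $out->write_seq($newseq);    # write the sequence object, not the raw string
    $i++;                        # give each shuffled sequence a unique id
}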
# randomizer (Fisher-Yates shuffle); shuffles the referenced array in place
sub fy_shuffle {
    my $array = shift;
    my $i;
    for ( $i = @$array; $i--; ) {
        my $j = int rand($i+1);
        next if $i == $j;
        @$array[$i,$j] = @$array[$j,$i];
    }
}
The randomizer code is from the Perl Cookbook. You could probably make it
faster by avoiding the string -> array -> string conversion and using
substr to operate directly on the string.
If someone wants to do that and add it to SeqUtils or somewhere else in
Bioperl, that would be good.
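A rough sketch of the substr idea, for illustration only. The name
fy_shuffle_string is hypothetical (it is not an existing Bioperl method),
it assumes one character per residue, and whether it is actually faster
than the array version has not been measured.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Fisher-Yates shuffle applied directly to a string via substr.
# Hypothetical helper, not part of Bioperl; returns a shuffled copy.
sub fy_shuffle_string {
    my $str = shift;                      # local copy, caller's string is untouched
    for ( my $i = length($str); $i--; ) {
        my $j = int rand( $i + 1 );       # random position in 0..$i
        next if $i == $j;
        # swap the characters at positions $i and $j
        my $tmp = substr( $str, $i, 1 );
        substr( $str, $i, 1 ) = substr( $str, $j, 1 );
        substr( $str, $j, 1 ) = $tmp;
    }
    return $str;
}

# example with a made-up peptide string
my $shuffled = fy_shuffle_string('MKTAYIAKQR');
print "$shuffled\n";
```

In the script above this would replace the split/join steps, e.g.
push @randomseqs, fy_shuffle_string($seq->seq);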
-jason
On Tue, 25 May 2004, KHOUEIRY pierre wrote:
> Hi all,
> I'm looking for the Bioperl method that shuffles/randomizes a given
> protein sequence. I need to shuffle my FASTA sequence 1000 times to run
> a statistical test on the results.
> thanks in advance
>
--
Jason Stajich
Duke University
jason at cgt.mc.duke.edu