[Bioperl-l] K-mer generating script

Dave Messina David.Messina at sbc.su.se
Sat Dec 20 01:11:00 UTC 2008


Hi Marco,

Here's some code to generate and print all possible k-mers. I'm really just
using the module Math::Combinatorics to do all the dirty work here, so it
probably won't be as fast as a custom recursive function like the one you
suggest, but it gets the job done.
See also Bio::Tools::SeqWords and Bio::Tools::SeqStats for related goodies.
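A minimal sketch of the custom recursive approach, assuming only core Perl. The name `each_kmer` and its callback argument are hypothetical, not part of BioPerl or Math::Combinatorics. It extends one prefix at a time, depth first, so it never holds all 4**n words in memory, and it emits them in the alphabet's lexicographic order.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Hypothetical helper (not from BioPerl or the original post).
# Calls $callback once for every word of length $n over @$alphabet,
# building words depth first; only the current prefix is kept in memory.
sub each_kmer {
    my ($n, $alphabet, $callback, $prefix) = @_;
    $prefix = '' unless defined $prefix;
    if ($n == 0) {                 # no letters left to add: prefix is a full k-mer
        $callback->($prefix);
        return;
    }
    for my $letter (@$alphabet) {  # extend the prefix by each letter in turn
        each_kmer($n - 1, $alphabet, $callback, $prefix . $letter);
    }
    return;
}

# Usage: print all codons, 10 per row.
my $words_per_row = 10;
my $i = 0;
each_kmer(3, [qw(A C G T)], sub {
    print $_[0], ' ';
    print "\n" if ++$i % $words_per_row == 0;
});
print "\n" if $i % $words_per_row;   # finish a partial last row
```

Passing a callback rather than returning a list keeps memory flat for larger n; collect into an array inside the callback if you need the whole list.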

Dave


-------------- example code --------------
#!/usr/local/bin/perl

use strict;
use warnings;
use Math::Combinatorics;

# do all codons (3-mers) as an example
generate_possible_kmers(3);

=head2 generate_possible_kmers

 Title   : generate_possible_kmers
 Usage   : generate_possible_kmers($n)
 Function: create and print all 4**$n possible DNA k-mers, 10 per line
 Returns : none
 Args    : n - the length of the desired 'mer'

=cut

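sub generate_possible_kmers {
    my ($n) = @_;
    my $alphabet      = [ qw( A C G T ) ];
    my $words_per_row = 10;
    my $i             = 0;

    # Step 1: every multiset of size $n drawn from the alphabet
    # (e.g. A,A,C); each letter may occur up to $n times.
    my $o = Math::Combinatorics->new(
        count     => $n,
        data      => $alphabet,
        frequency => [ ($n) x @$alphabet ],
    );
    while ( my @x = $o->next_multiset ) {
        # Step 2: describe the multiset as distinct letters plus counts,
        # so next_string produces each distinct arrangement
        # (AAC, ACA, CAA) exactly once.
        my %count;
        $count{$_}++ for @x;
        my @letters = sort keys %count;
        my $p = Math::Combinatorics->new(
            data      => \@letters,
            frequency => [ map { $count{$_} } @letters ],
        );
        while ( my @y = $p->next_string ) {
            print join( '', @y ), ' ';
            $i++;
            print "\n" if $i % $words_per_row == 0;
        }
    }
    print "\n" if $i % $words_per_row;    # finish a partial last row
    return;
}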

----------------- end code -----------------
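Why multisets plus distinct arrangements cover every k-mer exactly once: next_multiset enumerates the letter counts (c_A, c_C, c_G, c_T) with c_A + c_C + c_G + c_T = n, and the number of distinct strings with those counts is a multinomial coefficient. Provided each multiset's arrangements are generated without repeats, summing over all multisets gives, by the multinomial theorem:

```latex
\sum_{c_A + c_C + c_G + c_T = n} \frac{n!}{c_A!\, c_C!\, c_G!\, c_T!}
  = (1 + 1 + 1 + 1)^n = 4^n
```

For n = 3 that is 20 multisets yielding 64 codons in total.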


