[Bioperl-l] K-mer generating script
James Estill
jestill at plantbio.uga.edu
Sat Dec 20 00:07:11 UTC 2008
SeqIO works great for this. I've used something like the following. This is part of a larger program, so some of it is not relevant to what you need ...
========================
use strict;
use warnings;
use Bio::SeqIO;

# These are set elsewhere in the larger program;
# the values here are placeholders only
my $verbose    = 1;
my $temp_fasta = "oligos.fasta";
my @counts;

my $fasta_in = "your_file.fasta";
my $k = 3;
my $in_seq_num = 0;
my $inseq = Bio::SeqIO->new( -file   => "<$fasta_in",
                             -format => 'fasta');
while (my $seq = $inseq->next_seq) {
    $in_seq_num++;
    if ($in_seq_num == 2) {
        print "\a";
        die "Input file should be a single sequence record\n";
    }
    # Calculate base coordinate data
    my $seq_len = $seq->length();
    # Largest zero-based offset at which a full k-mer still fits
    my $max_start = $seq_len - $k;
    # Print some summary data
    print STDERR "\n==============================\n" if $verbose;
    print STDERR "SEQ LEN: $seq_len\n" if $verbose;
    print STDERR "MAX START: $max_start\n" if $verbose;
    print STDERR "==============================\n" if $verbose;
    # CREATE FASTA FILE OF ALL K LENGTH OLIGOS
    # IN THE INPUT SEQUENCE
    print STDERR "Creating oligo fasta file\n" if $verbose;
    open (my $fasta_out, '>', $temp_fasta) or
        die "Can not open temp fasta file:\n $temp_fasta\n $!\n";
    for (my $i = 0; $i <= $max_start; $i++) {
        # subseq() takes 1-based, inclusive coordinates
        my $start_pos = $i + 1;
        my $end_pos = $start_pos + $k - 1;
        my $oligo = $seq->subseq($start_pos, $end_pos);
        # Set counts array to zero
        $counts[$i] = 0;
        print $fasta_out ">$start_pos\n";
        print $fasta_out "$oligo\n";
    }
    close ($fasta_out);
}
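
The loop above writes out the k-mers that occur in one input sequence. For the question as asked (every possible DNA k-mer for a given k), here is a minimal sketch in plain Perl, no BioPerl needed. The sub name all_kmers and the A, C, G, T ordering are my own choices for illustration, not something from a library. It also shows that recursion is not strictly required: extending every string built so far by one base, k times over, gives the same result.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (name chosen for this example): return every DNA
# string of length $k over the alphabet A, C, G, T.
sub all_kmers {
    my ($k) = @_;
    die "k must be a non-negative integer\n"
        unless defined $k && $k =~ /^\d+$/;

    my @bases = qw(A C G T);
    my @kmers = ('');    # the only 0-mer is the empty string

    # Each pass appends one base to every string built so far,
    # so the list grows by a factor of four per pass.
    for my $n (1 .. $k) {
        my @next;
        for my $prefix (@kmers) {
            push @next, $prefix . $_ for @bases;
        }
        @kmers = @next;
    }
    return @kmers;
}

# Usage: list all 3-mers
my @three_mers = all_kmers(3);
print scalar(@three_mers), " 3-mers\n";    # prints "64 3-mers"
print join(' ', @three_mers[0 .. 4]), " ...\n";
```

The list holds 4**k strings, so it is fine for small k but gets large quickly (k = 12 is already about 16.8 million entries). For bigger k it would probably be better to print each k-mer as it is built rather than keep them all in memory.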
-- Jamie Estill
-- jestill at uga.edu
-- http://jestill.myweb.uga.edu
-- http://www.epernicus.com/people/jestill
_____
From: Blanchette, Marco [mailto:MAB at stowers-institute.org]
To: bioperl-l at lists.open-bio.org [mailto:bioperl-l at lists.open-bio.org]
Sent: Fri, 19 Dec 2008 18:25:27 -0500
Subject: [Bioperl-l] K-mer generating script
Dear all,
Does anyone have a little function that I could use to generate all possible k-mer DNA sequences? For instance, all possible 3-mers (AAA, AAT, AAC, AAG, etc.). I need something where I could input the value of k and get all possible sequences...
I know that it's a problem that needs recursive programming, but I can't get my brain around the problem.
Many thanks
Marco
--
Marco Blanchette, Ph.D.
Assistant Investigator
Stowers Institute for Medical Research
1000 East 50th St.
Kansas City, MO 64110
Tel: 816-926-4071
Cell: 816-726-8419
Fax: 816-926-2018