[Bioperl-l] Pattern Density
Brian Osborne
osborne1 at optonline.net
Tue Feb 21 18:25:35 UTC 2006
Nick,
Right, BioPerl really can¹t help you with the histogram itself but there are
probably multiple solutions to the problem of iterating over the sequence.
Here¹s one idea, untested, it assumes your sequence is in fasta format:
use strict;
use Bio::DB::Fasta;
use Bio::Tools::SeqWords;
my $db = Bio::DB::Fasta->new('/path/to/fasta/files');
my $obj = $db->get_Seq_by_id('CHROMOSOME_I');
my $start = 0;
my $windowsize = 1000;
my $str = ³ccgg²;
my $len = $obj->length;
my $overlap = 250;
while (1) {
my $end = $start + $windowsize;
last if ( $end > $len);
my $subseq = $obj->subseq($start,$end);
my $count = get_count($str,$subseq);
$start += $overlap;
}
sub get_count {
my ($str,$subseq) = @_;
my $seqobj = Bio::Seq->new(-seq => $subseq);
my $seq_word = Bio::Tools::SeqWords->new(-seq => $seqobj);
my $ref = $seq_word->count_overlap_words(length($str));
$ref->{$str};
}
Note this skips the very last window, debugging needed.
Brian O.
On 2/21/06 12:24 PM, "staffa" <staffa at niehs.nih.gov> wrote:
> I am more concerned about the reading and analysis of the sequence than actual
> plotting of the histogram, but anything you can offer will be appreciated.
More information about the Bioperl-l
mailing list