[Bioperl-l] Pattern Density

Tue Feb 21 18:25:35 UTC 2006

Nick,

Right, BioPerl really can¹t help you with the histogram itself but there are
probably multiple solutions to the problem of iterating over the sequence.
Here¹s one idea, untested, it assumes your sequence is in fasta format:

use strict;
use Bio::DB::Fasta;
use Bio::Tools::SeqWords;

my $db  = Bio::DB::Fasta->new('/path/to/fasta/files');
my $obj = $db->get_Seq_by_id('CHROMOSOME_I');
my $start = 0;
my $windowsize = 1000;
my $str = ³ccgg²;
my $len = $obj->length;
my $overlap = 250;

while (1) {
    my $end = $start + $windowsize;
    last if ( $end > $len);
    my $subseq  = $obj->subseq($start,$end);
    my $count = get_count($str,$subseq);
    $start += $overlap;
}

sub get_count {
    my ($str,$subseq) = @_;
    my $seqobj = Bio::Seq->new(-seq => $subseq);
    my $seq_word = Bio::Tools::SeqWords->new(-seq => $seqobj);
    my $ref = $seq_word->count_overlap_words(length($str));
    $ref->{$str};
}

Note this skips the very last window, debugging needed.

Brian O.

On 2/21/06 12:24 PM, "staffa" <staffa at niehs.nih.gov> wrote:

> I am more concerned about the reading and analysis of the sequence than actual
> plotting of the histogram, but anything you can offer will be appreciated.