[Bioperl-l] (no subject)
Marcel van Batenburg
marcelvb@nikhef.nl
Fri, 5 Oct 2001 22:52:15 +0200 (MET DST)
Hi,
I've subscribed to this group because I am interested in using BioPerl modules.
At the moment I am reading an article about motifs, i.e. DNA subsequences
of regulatory genes that are binding sites for proteins which regulate polymerase
II. These motifs can be obtained from the gene. I would like to generate
all motifs for a given gene and count how many times each motif occurs in that gene.
Put in plain Perl terms: find all substrings of every length greater than 0
in a given string, and put their counts in a hash (conceptually a hash of
hashes of numbers) keyed by gene and motif: $motiffs_genes{"gene_string-motiff_string"}.
OK, one can arrive at a solution like the snippet of code below (ugh).
However, it is not optimal in computation time.
Of course one can work with references and drop the
print statements, but the number of substrings of a gene is
1+2+3+...+lgene = lgene*(lgene+1)/2, so the computation time for each gene
still grows at least quadratically with the length of the gene.
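As a rough check of that count, here is a minimal sketch that compares a brute-force tally of substring windows with the closed form. The sub names count_substrings_brute and count_substrings_closed are made up for this illustration and are not part of the script below.

```perl
use strict;
use warnings;

# Count the (start, length) windows of a string of $n characters
# by summing the number of windows for each length.
sub count_substrings_brute {
    my ($n) = @_;
    my $count = 0;
    for my $len (1 .. $n) {
        $count += $n - $len + 1;    # windows of exactly $len characters
    }
    return $count;
}

# Closed form of 1 + 2 + ... + n.
sub count_substrings_closed {
    my ($n) = @_;
    return $n * ($n + 1) / 2;
}

for my $n (6, 100, 1000) {
    printf "%5d nucleotides -> %d substrings\n", $n, count_substrings_closed($n);
}
```

A 6-nucleotide test gene gives 21 substrings, while a gene of 1000 nucleotides already gives 500500, which is why the approach gets slow for real genes.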
So which BioPerl modules would help speed up the generation
of substrings and subsequently putting them in the aforementioned hash?
With kind regards,
Marcel
#!/global/bin/perl -w
use strict;
my @genes = qw(
    TTTTTT
    ACGACG
    TTACGT
);

# multiplicity of occurrence of each (gene,motiff) pair, keyed "gene-motiff"
my %motiffs_genes;

foreach my $gene (@genes) {
    my $lgene = length($gene);    # number of nucleotides = length of string
    for (my $length = 1; $length <= $lgene; $length++) {
        # all motiffs (substrings) of this length in the gene
        my @motiffs = extract_motiffs_per_gene($gene, $length);
        foreach my $motiff (@motiffs) {
            my $key = $gene . "-" . $motiff;
            ++$motiffs_genes{$key};
        }
        # foreach my $motiff (@motiffs) {
        #     my $key = $gene . "-" . $motiff;
        #     print $gene . " " . $motiff . " multiplicity=" . $motiffs_genes{$key};
        #     print "\n";
        # }
    }
}
# Return all overlapping substrings of $sequence that are exactly
# $motiff_length characters long, in order of their start position.
sub extract_motiffs_per_gene
{
    my ($sequence, $motiff_length) = @_;
    my @rmotiffs;
    # slide a window of $motiff_length over the sequence, one position at a time
    my $last_start = length($sequence) - $motiff_length;
    for my $start (0 .. $last_start) {
        push @rmotiffs, substr($sequence, $start, $motiff_length);
    }
    return @rmotiffs;
}
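
The hash of hashes mentioned in the message could be built roughly as follows. This is only a sketch under my own assumptions: the sub name count_motifs and the nested layout gene => { motif => count } are illustrative choices, and it uses plain substr rather than any BioPerl module. It does not reduce the quadratic number of substrings; it only avoids the per-length regex pass and the joined "gene-motiff" key.

```perl
use strict;
use warnings;

# Count every substring of every length (> 0) in one gene.
# Returns a hash reference: motif => number of occurrences.
sub count_motifs {
    my ($gene) = @_;
    my %count;
    my $lgene = length $gene;
    for my $start (0 .. $lgene - 1) {
        # every substring that begins at $start
        for my $len (1 .. $lgene - $start) {
            $count{ substr($gene, $start, $len) }++;
        }
    }
    return \%count;
}

my @genes = qw(TTTTTT ACGACG TTACGT);

# hash of hashes: gene => { motif => count }
my %motiffs_genes;
$motiffs_genes{$_} = count_motifs($_) for @genes;

for my $gene (@genes) {
    for my $motif (sort keys %{ $motiffs_genes{$gene} }) {
        print "$gene $motif multiplicity=$motiffs_genes{$gene}{$motif}\n";
    }
}
```

With this layout the count for one pair is read as $motiffs_genes{$gene}{$motif}, and all motifs of one gene can be listed with keys %{ $motiffs_genes{$gene} }.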