[Bioperl-l] Re: Motif finding
Boris Lenhard
Boris.Lenhard@cgb.ki.se
20 Feb 2002 19:41:19 +0100
>
> Is there a way of finding motifs in a strand of DNA?
> The thing is, I want to find exact matches and some fuzzy ones (i.e. 80% exact). Is
> there a Perl module to do it?
>
> Thanks.
>
> Desmond
>
Try TFBS at http://forkhead.cgb.ki.se/TFBS/.
But please wait a few hours; I am uploading the 0.3 release tonight.
TFBS represents DNA patterns as matrices, and it includes modules for
converting a set of DNA motifs to a matrix representation. In your case,
if you have, e.g., the motif "ACATTAGATTT", you would do:

    use TFBS::PatternGen::SimplePFM;

    # build a frequency matrix from the motif, then convert it to a weight matrix
    my $patterngen =
        TFBS::PatternGen::SimplePFM->new(-sequences => ["ACATTAGATTT"]);
    my $frequency_matrix = $patterngen->pattern;
    my $weight_matrix    = $frequency_matrix->to_PWM;

    # suppose you want to scan a sequence in a Bio::Seq object
    # called $seqobj, with 80% score threshold
    my $binding_site_set = $weight_matrix->search_seq(-seqobj    => $seqobj,
                                                      -threshold => "80%");

    # to loop through the $binding_site_set, do
    my $iterator = $binding_site_set->iterator;
    while (my $binding_site = $iterator->next) {
        # do whatever you want with $binding_site;
        # $binding_site is a TFBS::Site object,
        # which is a subclass of Bio::SeqFeature::Generic
        # and has all its functionality
    }

There are other ways to go, too.
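For instance, if "80% exact" is read literally as "at least 80% of the positions identical", one alternative can be sketched in plain Perl without TFBS. This is only a minimal sketch under that assumption: the helper name find_fuzzy_matches and the test sequence are made up for illustration and are not part of TFBS or BioPerl. It scans one strand only and does not handle gaps. Counting identical positions is also not the same thing as the matrix score threshold used by search_seq, so the two approaches can report different sites.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of TFBS or BioPerl): slide the motif along
# the sequence and keep every window whose fraction of identical positions
# is at least $min_identity (e.g. 0.8 for "80% exact").
sub find_fuzzy_matches {
    my ($motif, $sequence, $min_identity) = @_;
    $motif    = uc $motif;
    $sequence = uc $sequence;
    my $motif_len = length $motif;
    return if $motif_len == 0;

    my @hits;
    # the range is empty when the sequence is shorter than the motif
    for my $start (0 .. length($sequence) - $motif_len) {
        my $window = substr($sequence, $start, $motif_len);

        # count the positions at which window and motif carry the same base
        my $matches = grep {
            substr($window, $_, 1) eq substr($motif, $_, 1)
        } 0 .. $motif_len - 1;

        my $identity = $matches / $motif_len;
        push @hits, {
            start    => $start + 1,    # 1-based, as in sequence features
            site     => $window,
            identity => $identity,
        } if $identity >= $min_identity;
    }
    return @hits;
}

# Made-up test sequence: one exact copy of the motif and one copy
# with a single substitution.
my @hits = find_fuzzy_matches("ACATTAGATTT",
                              "GGACATTAGATTTCCACATTCGATTTGG", 0.8);
printf "%d\t%s\t%.2f\n", $_->{start}, $_->{site}, $_->{identity} for @hits;
```

Setting the last argument to 1 restricts the result to exact matches. To cover the other strand, the same call can be repeated with the reverse complement of the motif.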
Cheers,
Boris
#####################################
Boris Lenhard, Ph.D.
Center for Genomics and Bioinformatics
Karolinska Institutet
Berzelius väg 35, B322
171 77 Stockholm, SWEDEN
Phone: +46 (0)8 728 6142
FAX: +46 (0)8 32 48 26
E-mail: Boris.Lenhard@cgb.ki.se
#####################################