[Bioperl-l] Sequence matching problem!
James Smith
js5 at sanger.ac.uk
Fri Feb 23 11:34:37 UTC 2007
On Fri, 23 Feb 2007, Albert Vilella wrote:
> now that we are at this pattern matching thread, I was wondering if
> any perl guru could enlighten me on the issue of matching exact
> sequence patterns on a gapped target sequence. E.g.:
>
> my $seq = "CGATCAACGAATCGTACGTACTC";
> my $gapped_seq =
> "GGGGGGCG-------A---TC---AACGA-----ATC---GTA---CGTACTCTACTCGGGGG";
>
> and one would like to get as a result:
>
> "CG-------A---TC---AACGA-----ATC---GTA---CGTACTCTACTC"
>
> which is the match of $seq but in $gapped_seq.
Try...
my $seq = "CGATCAACGAATCGTACGTACTC";
my $gapped_seq =
  "GGGGGGCG-------A---TC---AACGA-----ATC---GTA---CGTACTCTACTCGGGGG";
# Put an optional run of gap characters between each pair of bases in $seq,
# and capture the whole gapped match in $1.
my $regexp = '('.join('-*?',split//,$seq).')';
if( $gapped_seq =~ /$regexp/ ) {
  print "Match is $1\n";
} else {
  print "No match\n";
}
(Not sure how efficient this is if $seq is long, though.)
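
If the size of the generated regexp is a worry, one alternative is to search an ungapped copy of the target with index() and map the hit back to gapped coordinates. The following is only a sketch of that idea: find_in_gapped is a hypothetical helper name, not a BioPerl function, and it assumes '-' is the only gap character.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of BioPerl): find $seq in $gapped_seq,
# ignoring '-' gap characters, without building a long regexp.
sub find_in_gapped {
    my ($seq, $gapped_seq) = @_;
    return if length($seq) == 0;

    # Record the gapped-string offset of every non-gap character
    # while building an ungapped copy of the target.
    my @pos;
    my $ungapped = '';
    for my $i (0 .. length($gapped_seq) - 1) {
        my $ch = substr($gapped_seq, $i, 1);
        next if $ch eq '-';
        push @pos, $i;
        $ungapped .= $ch;
    }

    # Plain substring search on the ungapped copy.
    my $start = index($ungapped, $seq);
    return if $start < 0;

    # Map the first and last matched bases back to gapped coordinates.
    my $from = $pos[$start];
    my $to   = $pos[$start + length($seq) - 1];
    return substr($gapped_seq, $from, $to - $from + 1);
}

my $seq = "CGATCAACGAATCGTACGTACTC";
my $gapped_seq =
    "GGGGGGCG-------A---TC---AACGA-----ATC---GTA---CGTACTCTACTCGGGGG";

my $match = find_in_gapped($seq, $gapped_seq);
print defined $match ? "Match is $match\n" : "No match\n";
```

On the example data this should report the same span as the regexp version, which ends at the last base of $seq (...CGTACTC) and so is slightly shorter than the string given in the question.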
James