[Bioperl-l] Sequence matching problem!

James Smith js5 at sanger.ac.uk
Fri Feb 23 11:34:37 UTC 2007


On Fri, 23 Feb 2007, Albert Vilella wrote:

> Now that we are on this pattern-matching thread, I was wondering if
> any Perl guru could enlighten me on the issue of matching exact
> sequence patterns against a gapped target sequence. E.g.:
>
> my $seq = "CGATCAACGAATCGTACGTACTC";
> my $gapped_seq =
> "GGGGGGCG-------A---TC---AACGA-----ATC---GTA---CGTACTCTACTCGGGGG";
>
> and one would like to get as a result:
>
> "CG-------A---TC---AACGA-----ATC---GTA---CGTACTC"
>
> which is the match of $seq but in $gapped_seq.

Try...

 my $seq = "CGATCAACGAATCGTACGTACTC";
 my $gapped_seq =
   "GGGGGGCG-------A---TC---AACGA-----ATC---GTA---CGTACTCTACTCGGGGG";

 # Put an optional run of gap characters between consecutive bases of $seq
 my $regexp = '('.join('-*?',split//,$seq).')';

 if( $gapped_seq =~ /$regexp/ ) {
   print "Match is $1\n";
 } else {
   print "No match\n";
 }

 (I'm not sure how efficient this is if $seq is long, though.)
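
On the efficiency point, one possible alternative that avoids building a
long regexp is sketched below. It assumes '-' is the only gap character,
and gapped_match is a made-up helper name, not a BioPerl function. The
idea is to strip the gaps while remembering where each base sat in the
gapped string, search the ungapped copy with index(), and then map the
hit back to gapped coordinates:

```perl
use strict;
use warnings;

# Hypothetical helper (not part of BioPerl): find $seq in $gapped_seq,
# ignoring '-' characters. Returns the gapped substring, or nothing
# (undef in scalar context) if there is no match.
sub gapped_match {
    my ($seq, $gapped_seq) = @_;
    return unless length $seq;

    # For every non-gap character, remember its offset in the gapped string
    my @pos;
    my $ungapped = '';
    for my $i (0 .. length($gapped_seq) - 1) {
        my $c = substr($gapped_seq, $i, 1);
        next if $c eq '-';
        push @pos, $i;
        $ungapped .= $c;
    }

    # Plain substring search on the ungapped copy
    my $start = index($ungapped, $seq);
    return if $start < 0;

    # Map the first and last matched base back to gapped offsets
    my $from = $pos[$start];
    my $to   = $pos[$start + length($seq) - 1];
    return substr($gapped_seq, $from, $to - $from + 1);
}

my $seq = "CGATCAACGAATCGTACGTACTC";
my $gapped_seq =
  "GGGGGGCG-------A---TC---AACGA-----ATC---GTA---CGTACTCTACTCGGGGG";

my $match = gapped_match($seq, $gapped_seq);
print defined $match ? "Match is $match\n" : "No match\n";
# prints: Match is CG-------A---TC---AACGA-----ATC---GTA---CGTACTC
```

Like the regexp version, this reports the leftmost match, and the result
ends at the last base of $seq. The extra @pos array costs one integer per
base, which is the price for getting the gapped coordinates back.
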
James

>
> Cheers,


