[Bioperl-l] Sequence matching problem!

Heikki Lehvaslaiho heikki at sanbi.ac.za
Fri Feb 23 08:25:39 UTC 2007


Kurt,

There are a few things in your code to note:

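- The regexp /C*T/ matches any T preceded by zero or more Cs,
  so it matches the first T in the sequence on its own. That is
  not what you meant: a C followed by anything up to a T is
  written /C.*T/.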
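- $-[0] and $+[0] (the @- and @+ arrays) do give match offsets,
  but you do not need them here: pos() and length() are enough.
  The really "expensive" Perl match variables are $&, $` and $':
  using any of them once in your code slows execution down
  considerably. There is always another way.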
- Keep in mind what you want to use the match positions for: 
  Human-readable locations usually start counting at 1, but
  Perl code uses 0 as the first location. The code below assumes
  you want to print the locations out, so it reports 1-based
  positions.
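
To make the first and third points concrete, here is a minimal sketch that runs both patterns against the sequence from the question and reports the first match in both counting conventions. The helper name first_match is made up for this illustration and is not part of Perl or BioPerl.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not from the original post): returns the first match
# of $pattern in $seq, its 0-based start offset, and its 1-based
# (human-readable) start and end positions.
sub first_match {
    my ($seq, $pattern) = @_;
    return unless $seq =~ m/($pattern)/gi;   # scalar /g sets pos($seq)
    my $match  = $1;
    my $end    = pos($seq);                  # offset just past the match
    my $start0 = $end - length($match);      # 0-based start offset
    return ($match, $start0, $start0 + 1, $end);
}

for my $pattern ('C*T', 'C.*T') {
    my ($match, $start0, $start1, $end1) = first_match('GATCAAT', $pattern);
    printf "%-4s => '%s' (0-based start %d; 1-based %d - %d)\n",
        $pattern, $match, $start0, $start1, $end1;
}
```

With 'C*T' the match is the lone T at 0-based offset 2; with 'C.*T' it is CAAT at 0-based offset 3, which is the "3" the question expected, or position 4 when counting from 1.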

Study my example code below.

Yours,
	-Heikki

###################################################################
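#!/usr/bin/perl
use strict;
use warnings;

my $seq = "GATCAAT";
#my $pattern = 'C*T';   # a T preceded by zero or more Cs
my $pattern = 'C.*T';   # a C, then any characters, then a T (greedy)

# /g in scalar context finds one match per iteration; /i ignores case
while ($seq =~ m/($pattern)/gi) {

    my $match = $1;
    my $end   = pos($seq);                   # 0-based offset just past the match,
                                             # which equals the 1-based end position
    my $start = $end - length($match) + 1;   # 1-based start position

    print "$match : $start - $end\n";
}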

###################################################################
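
One thing to watch with 'C.*T' on longer sequences: .* is greedy, so a match runs from the first C to the last T it can reach. If you would rather have each shortest C...T stretch, the non-greedy form 'C.*?T' is one option. The sketch below compares the two; the helper all_matches and the longer test sequence are made up for this illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not from the original post): collects every match of
# $pattern in $seq as "match:start-end", using 1-based positions.
sub all_matches {
    my ($seq, $pattern) = @_;
    my @hits;
    while ($seq =~ m/($pattern)/gi) {
        my $end   = pos($seq);               # offset just past the match
        my $start = $end - length($1) + 1;   # 1-based start position
        push @hits, sprintf("%s:%d-%d", $1, $start, $end);
    }
    return @hits;
}

my $long_seq = 'GATCAATCGT';   # made-up sequence with two C...T stretches
print "greedy:     @{[ all_matches($long_seq, 'C.*T') ]}\n";
print "non-greedy: @{[ all_matches($long_seq, 'C.*?T') ]}\n";
```

The greedy pattern reports a single match, CAATCGT at 4-10, while the non-greedy one reports CAAT at 4-7 and CGT at 8-10. For GATCAAT itself both forms give the same answer.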


On Thursday 22 February 2007 22:41:37 Kurt Gobain wrote:
> Hi every1..
> I m facing a great deal of problem in simple pattern matching between
> sequence & a pattern ..Program shod be designed such a way that it shod be
> able do two things 1) normal matching...For eg: GATCAAT....if TC is
> entered... output shod be 2...2) matching using spl character..In same
> example if C*T value is entered It shod give o/p as 3 & seq to b displayed
> is CAAT..I m easily getting 1st part...But in 2nd part Its giving sum
> problem..output I m gettin as 1 instead of 3...Code is really simple!
>
> #!/usr/bin/perl
> $alphabet = "GATCAAT";
> $pattern=  "C*T ";
>
> $alphabet =~ /($pattern)/i;
>
> print "The entire '$pattern' match began at $-[0] and ended at $+[0]\n";
>
> ====================
> OUTPUT!
> The entire C*T match began at 1 and ended at 2
> ====================
>
> but the o/p shod be 3????
> & Is there n e chance I can get seq too..I mean instead of C*T'' i need
> 'CAAT'...????
>
> Well..Its not compulsion to use regex....But I find it quite simple..can
> there be n e other method??
>
> Thanx in advance!
> Kurt!



-- 
______ _/      _/_____________________________________________________
      _/      _/
     _/  _/  _/  Heikki Lehvaslaiho    heikki at_sanbi _ac _za
    _/_/_/_/_/  Associate Professor    skype: heikki_lehvaslaiho
   _/  _/  _/  SANBI, South African National Bioinformatics Institute
  _/  _/  _/  University of Western Cape, South Africa
     _/      Phone: +27 21 959 2096   FAX: +27 21 959 2512
___ _/_/_/_/_/________________________________________________________


