[Bioperl-l] regular expression help!

Peter Robinson Peter.Robinson at t-online.de
Mon Jan 17 14:21:44 EST 2005


Just a suggestion, but I don't think regular expressions are the best
way to do this. You might want to take a look at some of the programs
at www.emboss.org, which can find repeats and inverted repeats
(palindromes) in DNA sequences. The EMBOSS programs are open-source, easy
to use and quite useful, although the EMBOSS group is unfortunately now
having difficulties with funding.
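
If you would rather stay in Perl, here is a rough sketch of what I mean by
not using a regex. It is not how the EMBOSS programs work: revcomp() and
find_inverted_repeats() are names I made up, the test sequence is a toy
example, and it only reports exact arms of one fixed length.

```perl
use strict;
use warnings;

# Reverse complement of a lower-case DNA string.
sub revcomp {
    my ($seq) = @_;
    my $rc = reverse $seq;      # scalar assignment, so the string is reversed
    $rc =~ tr/acgt/tgca/;
    return $rc;
}

# Report every pair of exact inverted-repeat arms of length $k.
# Returns [left_start, right_start] pairs (0-based); the right arm must
# begin at least $min_gap bases after the left arm ends.
sub find_inverted_repeats {
    my ($seq, $k, $min_gap) = @_;
    $min_gap = 0 unless defined $min_gap;
    $seq = lc $seq;

    # Index every k-mer by the positions where it starts.
    my %positions;
    for my $i (0 .. length($seq) - $k) {
        push @{ $positions{ substr($seq, $i, $k) } }, $i;
    }

    # For each k-mer, look up where its reverse complement occurs downstream.
    my @hits;
    for my $i (0 .. length($seq) - $k) {
        my $partner = revcomp(substr($seq, $i, $k));
        for my $j (@{ $positions{$partner} || [] }) {
            push @hits, [$i, $j] if $j >= $i + $k + $min_gap;
        }
    }
    return @hits;
}

# Toy sequence (made up): a 12-base arm, a 5-base spacer, and the arm's
# reverse complement, with two flanking bases on each side.
my $toy = 'ttgggaaaggatatcacacatatcctttcccgg';
for my $hit (find_inverted_repeats($toy, 12, 0)) {
    my ($left, $right) = @$hit;
    printf "left arm at %d (%s), right arm at %d (%s)\n",
        $left,  substr($toy, $left,  12),
        $right, substr($toy, $right, 12);
}
```

Indexing the k-mers in a hash keeps this close to linear in the sequence
length for one value of $k. Arms of unknown length or with mismatches need
more work, which is where the EMBOSS tools come in.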

-peter

On Mon, 2005-01-17 at 17:17, Guojun Yang wrote:
> Thanks for everybody's comments. The only thing I am interested in is a regular expression to recognize the pattern (it should not be confined to certain sequences, as some have suggested). For example: in tttaatatcaaAGCATgggaaaggatat....atatcctttcccGCATacatataccata, the regex should recognize AGCATgggaaaggatat....atatcctttcccGCAT. The problem is not the direct repeat AGCAT, but how to match the atatcctttccc with the gggaaaggatat. I guess there must be a way to do it. I tried the following and obtained weird results:
> /.*(\S+)(\S)(\S)(\S)(\S)(\S)(\S)(\S)(\S)(\S)(\S).*(??{convert(\11);})(??{convert(\10);})(??{convert(\9);})(??{convert(\8);})(??{convert(\7);})(??{convert(\6);})(??{convert(\5);})(??{convert(\4);})(??{convert(\3);})(??{convert(\2);})\1.*/i
> ...
> 
> sub convert{
> my $return=$_[0];
> $return =~ tr/ATCG/TAGC/;
> $return =reverse($return);
> return $return;
> }
> 
> Can anybody give me a hint on the /e modifier when using Perl code inside a regex?
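
As far as I know, /e is only accepted on s///, where it makes Perl evaluate
the replacement as code; a plain match has no such modifier. Inside a
(??{ ... }) block you are writing ordinary Perl, so the captures are $1, $2
and so on. There \11 is a reference to the number 11 rather than a
backreference, which probably explains the weird results, and reversing a
single base does nothing. A minimal sketch, assuming a fixed arm length of 12,
an invented toy sequence, and a helper I have called revcomp():

```perl
use strict;
use warnings;

# Reverse complement of a lower-case DNA string.
sub revcomp {
    my ($seq) = @_;
    my $rc = reverse $seq;      # scalar assignment, so the string is reversed
    $rc =~ tr/acgt/tgca/;
    return $rc;
}

# Toy sequence (made up), lower-cased so that no /i flag is needed.
my $seq = lc 'ttgggaaaggatatcacacatatcctttcccgg';

# ([acgt]{12})  captures the left arm as $1
# ([acgt]*?)    captures the spacer, as short as possible
# (??{ ... })   runs Perl code during the match and uses the returned
#               string as the next piece of the pattern
my ($left, $gap, $right) =
    $seq =~ /([acgt]{12})([acgt]*?)((??{ revcomp($1) }))/;

if (defined $left) {
    print "left arm:  $left\n";
    print "spacer:    $gap\n";
    print "right arm: $right\n";
}
else {
    print "no inverted repeat with 12-base arms found\n";
}
```

The arm is captured as one group and reverse-complemented as a whole, instead
of capturing eleven separate bases. The regex engine still has to try every
start position, so this will be slow on long sequences.
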
> 
> Yang
> 
> 
> 
> 
> 
> ----- Original Message -----
> From: Willy West <corenth at gmail.com>
> To: Jan.Aerts at wur.nl, bioperl-l at portal.open-bio.org
> Sent: Sun, 16 Jan 2005 09:53:55 -0500
> Subject: Re: [Bioperl-l] regular expression help!
> 
> 
> > oops- i'd forgotten to "reply to all" with this... i apologize.
> > 
> > 
> > On Sun, 16 Jan 2005 11:13:45 +0100, Aerts, Jan <Jan.Aerts at wur.nl> wrote:
> > > The problem is (or I might be missing something here) that he wants to
> > > _test_ a regex. It's not possible to write something like
> > > $_ =~ /(.*)(.*)foo(\2)(.*)/e
> > > I think...
> > > 
> > > jan.
> > 
> > now i'm trying to do this with the test regex and am not successful :(
> >   this is an interesting problem and i really would love to find a
> > way..
> > 
> > one solution would be to explode the whole thing in another subroutine...
> > but if it's not what you want, i'm not yet sure how to do it.
> > 
> > good challenge though.....
> > 
> > :)
> > 
> > > 
> > > 
> > > -----Original Message-----
> > > From:   Willy West [mailto:corenth at gmail.com]
> > > Sent:   Sun 16-Jan-05 00:09
> > > To:     Aerts, Jan
> > > Cc:
> > > Subject:        Re: [Bioperl-l] regular expression help!
> > > On Sat, 15 Jan 2005 15:17:28 +0100, Aerts, Jan <Jan.Aerts at wur.nl> wrote:
> > > > You're right... Should have looked at the actual expression.
> > > > Idea: is it possible in this case to call subroutines from within a
> > > > regex and evaluate them using the 'e' switch?
> > > 
> > > if i recall::
> > > 
> > > sub foo {
> > >            return 'hello genome';
> > > }
> > > 
> > > $data = "ih ho hum bababa";
> > > 
> > > $data =~ s/ih/foo()/e; # one way to do it: /e evaluates the replacement as Perl code
> > > 
> > > print "$data\n";
> > > 
> > > seems to work..
> > 
> > 
> > -- 
> > Willy
> > http://www.hackswell.com/corenth
> > _______________________________________________
> > Bioperl-l mailing list
> > Bioperl-l at portal.open-bio.org
> > http://portal.open-bio.org/mailman/listinfo/bioperl-l
> > 
-- 
Peter N. Robinson
peter.robinson at t-online.de
peter.robinson at charite.de
http://www.charite.de/ch/medgen/robinson/


