Bioperl: expert at reg. expressions: some patterns, thanks

dfield@biomail.ucsd.edu
Thu, 8 Oct 1998 14:46:30 -0700 (PDT)


Thanks to everyone for the TREMENDOUS response I got after posting the
following message.  

>Could I solicit the expertise of anyone highly (and creatively) skilled in
>constructing regular expressions?  I have some patterns that I can't solve
>the regular expressions for and I could use some good ideas....
>dawn

I've had so many offers for help from generous or curious people looking
for 'puzzles' and it's been requested that I post some patterns to the list
to see how different people provide a solution. I hope no one minds that I
posted these to the list, and special thanks to Andrew Dalke and Gustavo
Glusman for the solutions I have gotten so far...

I study repetitive DNA so I'm very interested in patterns.  I've written
programs to look for these patterns before, but not in Perl, and I'm just
learning the power of regular expressions.


So, for example, I need to match:
1.  pattern:
>how to find QAQAQAQAQAQA in a protein sequence -- it's like finding an
>iteration of "QA", but can I make a regular expression that doesn't need
>a motif like "QA" specified?

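offered solution:

Try  /(..)\1+/  (inside the pattern the backreference is written \1; a $1
there would be interpolated before matching. /(..){2,}/ will not do either,
because it matches any four or more residues, not the same pair repeated.)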

$1 will tell you what the dipeptide was. length($&)/2 will tell you the
number of copies.
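
A minimal sketch of the backreference idea, written with Python's re module
for illustration (an assumption; the pattern syntax used here is the same in
Perl). The function name find_dipeptide_repeats and the min_copies parameter
are hypothetical:

```python
import re

def find_dipeptide_repeats(seq, min_copies=2):
    """Return (unit, copies, start) for each tandem repeat of a 2-residue unit."""
    # (..) captures any two residues; \1 requires that exact pair again.
    pattern = re.compile(r"(..)\1{%d,}" % (min_copies - 1))
    hits = []
    for m in pattern.finditer(seq):
        unit = m.group(1)                # the repeated dipeptide
        copies = len(m.group(0)) // 2    # whole match length / unit length
        hits.append((unit, copies, m.start()))
    return hits

if __name__ == "__main__":
    print(find_dipeptide_repeats("MK" + "QA" * 6 + "LL"))
```

Note that a homopolymer run such as QQQQ also matches, with "QQ" reported as
the unit, so those hits may need filtering afterwards.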

2.  pattern:

>I understand (R|H){6,} finds all tracts made of R and H of length 6 or
>greater.  But if I want only "combination" tracts that contain BOTH R and
>H, how do I write an RE to exclude tracts of ONLY R, (R)n, and ONLY H, (H)n?
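
One possible approach, not one of the solutions received so far, is a
lookahead that insists on a switch from one residue to the other before the
tract is consumed. This is a sketch in Python's re module (the same pattern
works in Perl); the names MIXED_RH and find_mixed_rh_tracts are hypothetical:

```python
import re

# The lookahead requires the tract to begin with R's followed by an H, or
# H's followed by an R, so both residues must be present. [RH]{6,} then
# consumes the whole tract, which must be at least 6 residues long.
MIXED_RH = re.compile(r"(?=R+H|H+R)[RH]{6,}")

def find_mixed_rh_tracts(seq):
    """Return every R/H tract of length >= 6 that contains both R and H."""
    return [m.group(0) for m in MIXED_RH.finditer(seq)]

if __name__ == "__main__":
    seq = "AA" + "R" * 7 + "AA" + "H" * 7 + "AA" + "RRHHRHRR" + "AA"
    print(find_mixed_rh_tracts(seq))
```

The pure (R)n and (H)n tracts are skipped because the lookahead fails at
every position inside them.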

3. can I find a tract of Q (of minimum length N) followed by no more than X
amino acids before another tract of Q (of minimum length N) is found again?
 For example, to find:

AGTWRWDFDQQQQQQQQFAFCRCFCFAFAFCRFQQQQQQQQQQQQQ
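
A sketch of one way to express this, in Python's re module (the pattern
itself is also valid Perl). It assumes the spacer is at least one residue
long and contains no Q at all, which is a simplification of the question.
The function name and the min_len / max_gap parameters (N and X above) are
hypothetical:

```python
import re

def find_paired_q_tracts(seq, min_len, max_gap):
    """Find two Q tracts (each >= min_len) separated by 1..max_gap non-Q residues."""
    pattern = re.compile(
        r"(Q{%d,})([^Q]{1,%d})(Q{%d,})" % (min_len, max_gap, min_len)
    )
    # Each hit is (first tract, spacer, second tract).
    return [(m.group(1), m.group(2), m.group(3)) for m in pattern.finditer(seq)]

if __name__ == "__main__":
    seq = "AGTW" + "Q" * 8 + "FAFCR" + "Q" * 10 + "A"
    print(find_paired_q_tracts(seq, min_len=6, max_gap=10))
```

Using [^Q] for the spacer keeps one long Q tract from being split into two
"neighbouring" tracts with an empty spacer between them.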

4. how do I find tracts of an identical amino acid that are flanked at
both ends by the same amino acid...
Good matches: HTTTTTTTTTTH or  TGGGGGGGGGGGT
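
A possible sketch using two backreferences, in Python's re module (the
pattern carries over to Perl unchanged). It assumes the flanking residue
must differ from the residue in the tract; the function name and the min_len
parameter are hypothetical:

```python
import re

def find_flanked_tracts(seq, min_len=2):
    """Find X Y...Y X: a run of one residue flanked by the same other residue."""
    # (.)      the flanking residue
    # (?!\1)   the tract residue must differ from the flank
    # (.)\2{k,} the tract: one residue repeated, min_len in total
    # \1       the same flanking residue again
    pattern = re.compile(r"(.)(?!\1)(.)\2{%d,}\1" % (min_len - 1))
    return [m.group(0) for m in pattern.finditer(seq)]

if __name__ == "__main__":
    print(find_flanked_tracts("AAHTTTTTTTTTTHAA", min_len=3))
    print(find_flanked_tracts("TGGGGGGGGGGGT", min_len=3))
```

Matches do not overlap, so two tracts that share a flanking residue (for
example HTTTHGGGH) would report only the first one.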


 
Dawn

** ** ** ** ** ** ** ** ** ** ** ** ** **
***************************************
Dawn Field
University of California, San Diego
Department of Biology
Rm #3165, Muir Biology
9500 Gilman Drive
La Jolla, CA 92093-0116

e-mail dfield@ucsd.edu
Tel  (619) 534-5474
Fax  (619) 534-7108
***************************************
** ** ** ** ** ** ** ** ** ** ** ** ** **

=========== Bioperl Project Mailing List Message Footer =======
Project URL: http://bio.perl.org/
For info about how to (un)subscribe, where messages are archived, etc:
http://www.techfak.uni-bielefeld.de/bcd/Perl/Bio/vsns-bcd-perl.html
====================================================================