Bioperl: expert at reg. expressions: some patterns, thanks
James Freeman
jfreeman@darwin.bu.edu
Thu, 8 Oct 1998 18:32:23 -0400 (EDT)
> Thanks to everyone for the TREMENDOUS response I got after posting the
> following message.
>
> >Could I solicit the expertise of anyone highly (and creatively) skilled in
> >constructing regular expressions? I have some patterns that I can't solve
> >the regular expressions for and I could use some good ideas....
> >dawn
>
> I've had so many offers for help from generous or curious people looking
> for 'puzzles' and it's been requested that I post some patterns to the list
> to see how different people provide a solution. I hope no one minds that I
> posted these to the list, and special thanks to Andrew Dalke and Gustavo
> Glusman for the solutions I have gotten so far...
>
> I study repetitive DNA so I'm very interested in patterns. I've written
> programs to look for these patterns before but not in perl and I'm just
> learning the power of reg expressions.
>
>
> so for example, I need to match:
> 1. pattern:
> >how to find QAQAQAQAQAQA in a protein sequences -- it's like finding an iteration of "QA", but
> >can I make a regular expression that doesn't need a motif like "QA"
> >specified?
>
> offered solution
>
> Try /(..)\1{2,}/ or /(..)\1+/
>
> $1 will tell you what the dipeptide was. length($&)/2 will tell you the
> number of copies.
Also try:
/(.)(.)(\1\2){2,}/
with the same length formula; here the dipeptide is $1 followed by $2.
This is probably inferior to the above regular expression.
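To collect every such repeat in a sequence rather than only the first, something like the following might work. It is only a sketch: find_dipeptide_repeats is a made-up helper name, the test sequence is invented, and the minimum of three copies is an assumption you may want to change. It uses a backreference (\1) inside the pattern and the @- / @+ offset arrays instead of $&.

```perl
use strict;
use warnings;

# Hypothetical helper: returns [unit, copies, offset] for every run of
# three or more tandem copies of any two-residue unit.
# Note that (..) also matches homopolymer units such as "AA".
sub find_dipeptide_repeats {
    my ($seq) = @_;
    my @hits;
    while ($seq =~ /(..)\1{2,}/g) {
        my $run_length = $+[0] - $-[0];    # length of the whole match
        push @hits, [ $1, $run_length / 2, $-[0] ];
    }
    return @hits;
}

# Invented example sequence, for illustration only.
for my $hit (find_dipeptide_repeats("MKQAQAQAQAQAQALLGSGSGSP")) {
    printf "%s x %d at offset %d\n", @$hit;
}
```

Because of /g the scan resumes after each match, so the reported runs never overlap.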
>
> 2. pattern:
>
> >I understand (R|H){6,} finds all combinations of tracts of R and H of
> >lengths 6 or greater. But if I want only "combination" tracts that are
> >made of a combination of BOTH R and H, how do I write an RE to exclude
> >tracts of ONLY R (R)n and ONLY H (H)n?
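In your if statement put the following:
if ($foo =~ /[RH]{6,}/ && $& =~ /RH|HR/) {
}
The second test requires an R next to an H somewhere in the matched tract,
which rules out tracts of only R or only H.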
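The test above only inspects the first (leftmost) tract in $foo, so a pure R tract early in the sequence would hide a mixed tract later on. A possible way around that, as a sketch only, is to walk over all tracts with /g and keep the mixed ones. mixed_rh_tracts is a hypothetical helper name, the default minimum of 6 is taken from your question, and the test sequence is invented.

```perl
use strict;
use warnings;

# Hypothetical helper: returns every tract of R/H of at least $min residues
# that contains both R and H. $min defaults to 6 (an assumption).
sub mixed_rh_tracts {
    my ($seq, $min) = @_;
    $min = 6 unless defined $min;
    my @tracts;
    while ($seq =~ /([RH]{$min,})/g) {
        my $tract = $1;    # copy before running further matches
        # Keep the tract only if it has at least one R and at least one H.
        push @tracts, $tract if $tract =~ /R/ && $tract =~ /H/;
    }
    return @tracts;
}

# Invented example: a pure R tract, a pure H tract, then a mixed tract.
print join(",", mixed_rh_tracts("AARRRRRRRGGHHHHHHHKKRHRRHHRHLL")), "\n";
```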
>
> 3. can I find a tract of Q (of minimum length N) followed by no more than X
> amino acids before another tract of Q (of minimum length N) is found again?
> For example, to find:
>
> AGTWRWDFDQQQQQQQQFAFCRCFCFAFAFCRFQQQQQQQQQQQQQ
With N = 5 and X = 16:
if ($foo =~ /Q{5,}[^Q]{1,16}Q{5,}/) {
}
The {1,16} allows any spacer of up to 16 residues, not just exactly 16.
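If N and X need to vary, the pattern can be built from variables. This is a rough sketch, not a tested tool: paired_q_tracts is a hypothetical name, and it assumes the spacer contains no Q at all and is at least one residue long (otherwise the two tracts would simply be one longer tract).

```perl
use strict;
use warnings;

# Hypothetical helper: looks for a Q tract of at least $n residues, then a
# spacer of 1..$x non-Q residues, then another Q tract of at least $n.
# Returns (tract1 length, spacer length, tract2 length), or an empty list.
sub paired_q_tracts {
    my ($seq, $n, $x) = @_;
    my $re = qr/(Q{$n,})([^Q]{1,$x})(Q{$n,})/;
    return $seq =~ $re ? (length($1), length($2), length($3)) : ();
}

my @lens = paired_q_tracts(
    "AGTWRWDFDQQQQQQQQFAFCRCFCFAFAFCRFQQQQQQQQQQQQQ", 5, 20);
print "tract1=$lens[0] spacer=$lens[1] tract2=$lens[2]\n" if @lens;
```

Only the first (leftmost) pair is reported; a while loop with /g would be needed to list them all.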
>
> 4. how do I find tracts of an identical amino acid that are flanked at
> either end with the same amino acid...
> Good at: HTTTTTTTTTTH or TGGGGGGGGGGGT
>
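if ($foo =~ /(.)(?!\1)(.)\2+\1/) {
}
The (?!\1) keeps the tract residue different from the flanking residue, so
the match does not stop at a plain run of one amino acid.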
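To list every flanked tract in a sequence, a loop along these lines might do. It is a sketch under assumptions: flanked_tracts is a hypothetical name, the tract must be at least two residues long, and the test sequence is invented around your two examples.

```perl
use strict;
use warnings;

# Hypothetical helper: returns every substring made of a flanking residue,
# a tract of two or more copies of a different residue, and the same
# flanking residue again.
sub flanked_tracts {
    my ($seq) = @_;
    my @hits;
    # (.)     flanking residue
    # (?!\1)  the next residue must differ from the flank
    # (.)\2+  a tract of two or more identical residues
    # \1      the same flanking residue closes the tract
    while ($seq =~ /(.)(?!\1)(.)\2+\1/g) {
        push @hits, substr($seq, $-[0], $+[0] - $-[0]);
    }
    return @hits;
}

# Invented example containing both tracts from the question.
print join(" ", flanked_tracts("AAAAHTTTTTTTTTTHKKTGGGGGGGGGGGTA")), "\n";
```

Matches found with /g do not overlap, so when two tracts share a flanking residue (as in HTTTHGGGH) only the first is reported.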
I hope this helps,
Jim Freeman
>
>
> Dawn
>
> ** ** ** ** ** ** ** ** ** ** ** ** ** **
> ***************************************
> Dawn Field
> University of California, San Diego
> Department of Biology
> Rm #3165, Muir Biology
> 9500 Gilman Drive
> La Jolla, CA 92093-0116
>
> e-mail dfield@ucsd.edu
> Tel (619) 534-5474
> Fax (619) 534-7108
> ***************************************
> ** ** ** ** ** ** ** ** ** ** ** ** ** **
>
> =========== Bioperl Project Mailing List Message Footer =======
> Project URL: http://bio.perl.org/
> For info about how to (un)subscribe, where messages are archived, etc:
> http://www.techfak.uni-bielefeld.de/bcd/Perl/Bio/vsns-bcd-perl.html
> ====================================================================
>
--
Jim Freeman P: mammon@tiac.net W: jfreeman@darwin.bu.edu
Programmer/Analyst at Bio-Molecular Engineering Center at BU.
Enjoy yourself, it's later than you think.
http://www.tiac.net/users/mammon/index.html