[Bioperl-l] A perl regex query
Benno Puetz
puetz at mpipsykl.mpg.de
Tue Sep 18 14:12:47 UTC 2007
James Smith wrote:
>
> Neeti,
>
> This isn't really a bioperl query - but I will try and explain a simple
> solution...
>
> warn simplify( 'Cyclic-2,3-bisphospho-D-glycerate' );
>
> sub simplify {
>   local $_ = "-$_[0]-";
>   ## Quick hack: add -'s at start and end, so "-string-" always matches
>   s/-(
>     Cyclic  | # The prefix "cyclic"
>     \d+     | # a single number between two "-"s
>     \d+,\d+ | # number,number between two "-"s
>     \w        # a single word character (letter, digit or _) between two "-"s
>   )(?=-)//ixg; ## case-insensitive, commented, multiple matches!
>   ## 0-width +ve lookahead assertion - so can match
>   ## multiple consecutive -x- constructions in same regexp!
>   s/-//g;
>   ## remove remaining "-"s from string...
>   return $_;  ## return the string, not the substitution count from s///g
> }
>
> Not sure what other test strings you may want - but most should be able to
> fit in the () brackets in the first regexp of simplify.
>
> James
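To see why the (?=-) lookahead in simplify matters, here is a small stand-alone sketch. The padded input string and the variable names are made up for illustration and do not come from the thread. It compares a pattern that consumes the trailing '-' with one that only looks ahead at it, on a name with two consecutive single-letter terms:

```perl
use strict;
use warnings;

# Hypothetical padded input, "-...-" as in simplify(), with two
# consecutive single-letter terms (-D- and -L-).
my $padded = '-pho-D-L-glycerate-';

# Consuming version: each match eats the trailing '-', so the next
# single letter has no leading '-' left and is skipped.
(my $consuming = $padded) =~ s/-\w-/-/g;

# Lookahead version: the trailing '-' is only checked, not consumed,
# so every consecutive single-letter term is removed.
(my $lookahead = $padded) =~ s/-\w(?=-)//g;

print "$consuming\n";
print "$lookahead\n";
```

In the consuming version "-L-" survives ("-pho-L-glycerate-"), because the '-' in front of it was already used up by the "-D-" match. The lookahead version removes both ("-pho-glycerate-").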
Along the same lines:
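use strict;
use warnings;

# a test string exercising most of the removals below
my $string = "Alpha-Cyclic-2,3-bi-sphos-1,2,5-pho-D-beta-glycerate";
my @ra_bad_terms = ( '-?(D|R|S)-',
                     '-?([aA]lpha|[bB]eta|[gG]amma)-',
                     '-?([cC]is|[tT]rans)-',
                     '-?[cC]yclic-',
                   # '-?\d+(,\d+)+-',   # uncomment to remove number lists (e.g. 2,3), too
                     '(?<!\d)-' );      # any '-' NOT following a digit
print "$string\n";
foreach ( @ra_bad_terms ){
    eval { $string =~ s/$_//g; };
    warn "bad pattern '$_': $@" if $@;  # report regex compile errors instead of ignoring them
    print "$_:$string\n";               # for feedback only
}
#$string =~ s/<@ra_bad_terms>//g;      # wishful one-pass form; Perl 5 interpolates the array joined by spaces, not as alternatives
print lc($string),"\n";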
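The same idea can also be packaged as a reusable sub. This minimal sketch assumes precompiled qr// patterns are acceptable in place of plain strings. The names strip_terms and @bad_terms are hypothetical. There is one deliberate difference: the /i modifier replaces the [aA]lpha-style classes, so it also matches e.g. ALPHA. The patterns are applied in list order, and that order matters because the last pattern strips every '-' that is not preceded by a digit.

```perl
use strict;
use warnings;

# Hypothetical list of precompiled patterns, applied in order.
# /i stands in for the [aA]lpha-style classes (slightly broader).
my @bad_terms = (
    qr/-?(?:D|R|S)-/,
    qr/-?(?:alpha|beta|gamma)-/i,
    qr/-?(?:cis|trans)-/i,
    qr/-?cyclic-/i,
    qr/(?<!\d)-/,          # must run last: drops '-' not after a digit
);

# Hypothetical helper: remove each pattern globally, then lowercase.
sub strip_terms {
    my ($name, @patterns) = @_;
    $name =~ s/$_//g for @patterns;
    return lc $name;
}

my $cleaned = strip_terms('Alpha-Cyclic-2,3-bi-sphos-1,2,5-pho-D-beta-glycerate',
                          @bad_terms);
print "$cleaned\n";
```

On the test string above, this leaves only the dashes that follow the number lists: "2,3-bisphos1,2,5-phoglycerate".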
--
Benno Pütz
Statistische Genetik
Max-Planck-Institut f. Psychiatrie Tel.: +49-89-30622-222
Kraepelinstr. 10 Fax : +49-89-30622-601
80804 München, Germany