[Bioperl-l] A perl regex query
James Smith
js5 at sanger.ac.uk
Tue Sep 18 15:37:57 UTC 2007
On Tue, 18 Sep 2007, Spiros Denaxas wrote:
> On 9/18/07, Benno Puetz <puetz at mpipsykl.mpg.de> wrote:
> James Smith wrote:
> >
> > Neeti,
> >
> > This isn't really a bioperl query - but I will try and explain a simple
> > solution...
> >
> > warn simplify( 'Cyclic-2,3-bisphospho-D-glycerate' );
> >
> > sub simplify {
> > local $_ = "-$_[0]-";
> > ## Quick hack: add -'s at start and end, as we always match "-string-"
> > s/-(
> > Cyclic | # The prefix "cyclic"
> > \d+ | # a single number between two "-"s
> > \d+,\d+| # number,number between two "-"s
> > \w # a single letter between two "-"s
> > )(?=-)//ixg; ## case-insensitive, commented, multiple matches!
> > ## 0-width +ve lookahead assertion - so can match
> > ## multiple consecutive -x- constructions in same regexp!
> > s/-//g;
> > ## remove remaining "-"s from string...
> > }
> >
> > Not sure what other test strings you may want - but most should be
> > able to
> > fit in the () brackets in the first regexp of simplify
> >
> > James
> Along the same line
>
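But the point is that you don't need to loop over things: because the
lookahead leaves each trailing "-" in place, a single s///g pass catches
consecutive prefixes.
Updated regexp: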
sub simplify {
  local $_ = "-$_[0]-";               # Add '-' at start and end!
  s{-(
      [cC]yclic                     | # The prefix "cyclic"
      [aA]lpha | [bB]eta | [gG]amma | # Alpha/beta/gamma
      [tT]rans | [cC]is             | # Trans/cis
      [DRS]                           # Single letter "D", "R" or "S"
    # | \d+(,\d+)*                    # list of 1 or more "," separated nos
  )(?=-)}{}xg;                        # No. list currently commented out!
  s/-//g;                             # remove all "-"
  s/([^\d,])([\d,])/$1-$2/g;          # re-introduce "-" between number/
  s/([\d,])([^\d,])/$1-$2/g;          # comma and letters
  s/--/-/g;                           # remove duplicate "-" signs..
  return $_;
}
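
To see why the lookahead matters, here is a minimal sketch comparing a pattern that consumes the trailing "-" with one that only looks ahead for it. The test string '-D-R-S-' and the variable names are made up for illustration; they are not from the original question.

```perl
use strict;
use warnings;

# Hypothetical test string: three single-letter tokens in a row.
my $input = '-D-R-S-';

# Consuming the trailing "-" eats the leading "-" of the next token,
# so every second token is skipped.
(my $consumed = $input) =~ s/-[DRS]-//g;

# The lookahead only checks for the trailing "-" and leaves it in place,
# so the next match can start on it.
(my $lookahead = $input) =~ s/-[DRS](?=-)//g;

print "consuming: $consumed\n";    # prints "consuming: R"
print "lookahead: $lookahead\n";   # prints "lookahead: -"
```

The consuming version leaves the middle "R" behind, which is the case that would otherwise need a loop; the lookahead version strips all three tokens in one pass.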