[Bioperl-l] A perl regex query

James Smith js5 at sanger.ac.uk
Tue Sep 18 15:37:57 UTC 2007



On Tue, 18 Sep 2007, Spiros Denaxas wrote:

> On 9/18/07, Benno Puetz <puetz at mpipsykl.mpg.de> wrote:
> James Smith wrote:
> >
> > Neeti,
> >
> > This isn't really a bioperl query - but I will try and explain a simple
> > solution...
> >
> > warn simplify( 'Cyclic-2,3-bisphospho-D-glycerate' );
> >
> > sub simplify {
> >   local $_ = "-$_[0]-";
> >         ## Quick hack: add '-'s at start and end so we always match "-string-"
> >   s/-(
> >     Cyclic | # The prefix "cyclic"
> >     \d+    | # a single number between two "-"s
> >     \d+,\d+| # number,number between two "-"s
> >     \w       # a single letter between two "-"s
> >   )(?=-)//ixg;  ## case-insensitive, commented, multiple matches!
> >         ## 0-width +ve lookahead assertion - so can match
> >         ## multiple consecutive -x- constructions in same regexp!
> >   s/-//g;
> >         ## remove remaining "-"s from string...
> >   return $_;    ## return the string itself, not the s/// count
> > }
> >
> > Not sure what other test strings you may want, but most should fit as
> > extra alternatives in the () brackets of the first regexp in simplify.
> >
> > James
> Along the same line
>

But the point is that you don't need to loop over the terms: a single
substitution with a lookahead strips all of them in one pass.
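To see why one pass is enough, here is a minimal sketch comparing a pattern that consumes the trailing "-" with one that only looks ahead for it. The input string "-D-R-glucose-" is a hypothetical example, not one from the thread.

```perl
use strict;
use warnings;

# Hypothetical padded input with two consecutive single-letter tokens.
my $padded = "-D-R-glucose-";

# Consuming version: each match eats the trailing "-", so the next
# token ("R") has lost its leading "-" and is skipped.
(my $consuming = $padded) =~ s/-(?:[DRS])-/-/g;

# Lookahead version: (?=-) only checks for the trailing "-" without
# consuming it, so consecutive tokens are all removed in one pass.
(my $lookahead = $padded) =~ s/-(?:[DRS])(?=-)//g;

print "consuming: $consuming\n";   # consuming: -R-glucose-
print "lookahead: $lookahead\n";   # lookahead: -glucose-
```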


Updated regexp...


sub simplify {
   local $_ = "-$_[0]-";               # Add '-' at start and end!
   s{-(
     [cC]yclic                       | # The prefix "cyclic"
     [aA]lpha | [bB]eta | [gG]amma   | # Alpha/beta/gamma
     [tT]rans | [cC]is               | # Trans/cis
     [DRS]                             # Single letter "D", "R" or "S"
#    | \d+(?:,\d+)*                    # List of 1 or more ","-separated numbers
   )(?=-)}{}xg;                        # Number list currently commented out!
   s/-//g;                             # Remove all "-"
   s/([^\d,])([\d,])/$1-$2/g;          # Re-introduce "-" between numbers/
   s/([\d,])([^\d,])/$1-$2/g;          #  commas and letters
   s/--/-/g;                           # Remove duplicate "-" signs
   return $_;
}
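A small driver for the updated simplify() might look like this. The sub is repeated here, with the commented-out number list left out, so the script runs on its own. The first name is the example used earlier in this thread; the other two are illustrative inputs chosen for this sketch.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Copy of the updated simplify(), number-list alternative omitted.
sub simplify {
   local $_ = "-$_[0]-";               # Pad so every token looks like "-token-"
   s{-(
     [cC]yclic                       |
     [aA]lpha | [bB]eta | [gG]amma   |
     [tT]rans | [cC]is               |
     [DRS]
   )(?=-)}{}xg;                        # Strip prefix tokens in one pass
   s/-//g;                             # Remove all "-"
   s/([^\d,])([\d,])/$1-$2/g;          # "-" between letters and numbers/commas
   s/([\d,])([^\d,])/$1-$2/g;          # "-" between numbers/commas and letters
   s/--/-/g;                           # Collapse any doubled "-"
   return $_;
}

for my $name ('Cyclic-2,3-bisphospho-D-glycerate',
              'alpha-D-glucose',
              'beta-D-fructose-1,6-bisphosphate') {
    printf "%-36s => %s\n", $name, simplify($name);
}
```

Run as-is, it should map the three names to 2,3-bisphosphoglycerate, glucose and fructose-1,6-bisphosphate. Because [DRS] is case-sensitive, a lowercase single-letter token such as "d" is not stripped.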






