[Bioperl-l] A perl regex query

Shameer Khadar shameer at ncbs.res.in
Tue Sep 18 14:57:55 UTC 2007


I have used this module for simple chemoinformatics tasks:

http://www.perlmol.org/ - PerlMol - Perl Modules for Molecular Chemistry

It is worth exploring; you may find something useful there.

> James Smith wrote:
>>
>> Neeti,
>>
>> This isn't really a bioperl query - but I will try and explain a simple
>> solution...
>>
>> warn simplify( 'Cyclic-2,3-bisphospho-D-glycerate' );
>>
>> sub simplify {
>>   local $_ = "-$_[0]-";
>>         ## Quick hack: add "-"s at start and end, so that we always
>>         ## match "-string-"
>>   s/-(
>>     Cyclic | # The prefix "cyclic"
>>     \d+    | # a single number between two "-"s
>>     \d+,\d+| # number,number between two "-"s
>>     \w       # a single letter between two "-"s
>>   )(?=-)//ixg;  ## case-insensitive, commented, multiple matches!
>>         ## 0-width +ve lookahead assertion - so can match
>>         ## multiple consecutive -x- constructions in same regexp!
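>>   s/-//g;
>>         ## remove remaining "-"s from string...
>>   return $_;
>>         ## return the cleaned string; without this the sub returns
>>         ## the substitution count from the last s///
>> }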
>>
>> Not sure what other test strings you may want - but most should fit
>> in the () brackets in the first regexp of simplify.
>>
>> James
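
For anyone who wants to try the lookahead idea as a standalone script, here is a minimal sketch. The sub name simplify_name is only illustrative, and merging the two number alternatives into one pattern is my own assumption, not part of James's code.

```perl
use strict;
use warnings;

# Hypothetical helper (the name is illustrative): strip short tokens that
# sit between hyphens, then drop the hyphens that are left over.
sub simplify_name {
    my ($name) = @_;
    my $s = "-$name-";            # pad so every token is bracketed by "-"
    $s =~ s/-(?:
        cyclic          |         # the prefix "cyclic"
        \d+(?:,\d+)*    |         # one or more comma-separated numbers
        [a-z]                     # a single letter
    )(?=-)//ixg;                  # the lookahead does not consume the next "-"
    $s =~ s/-//g;                 # remove the remaining hyphens
    return $s;                    # return the string, not the s/// count
}

print simplify_name('Cyclic-2,3-bisphospho-D-glycerate'), "\n";
```

Because the lookahead leaves the trailing "-" unconsumed, that same hyphen can start the next match, which is why consecutive tokens such as "-Cyclic-2,3-" are all removed in one pass. For the test string above this should print "bisphosphoglycerate".
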
> Along the same lines:
>
> # some test for most of the removals below
> my $string = "Alpha-Cyclic-2,3-bi-sphos-1,2,5-pho-D-beta-glycerate";
> my @ra_bad_terms = (  '-?(D|R|S)-',
>                       '-?([aA]lpha|[bB]eta|[gG]amma)-',
>                       '-?([cC]is|[tT]rans)-',
>                       '-?[cC]yclic-',
>                     # '-?\d+(,\d+)+-',   # uncomment to remove numbers, too
>                       '(?<!\d)-' );          # '-' not preceded by a digit
> print "$string\n";
> foreach ( @ra_bad_terms ){
>
>   eval { $string =~ s/$_//g; };
>   print "$_:$string\n";   # for feedback only
> }
> #$string =~ s/<@ra_bad_terms>//g;
>
> print lc($string),"\n";
>
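
The same pattern-list approach can be wrapped in a reusable sub. This is only a sketch under a few assumptions of mine: the name strip_terms is hypothetical, the patterns are precompiled with qr//, and the number pattern is switched on so that the test string reduces completely.

```perl
use strict;
use warnings;

# Hypothetical pattern list, modelled on the terms in the quoted code.
# Order matters: each pattern sees the result of the previous ones.
my @bad_terms = (
    qr/-?[DRS]-/,                      # stereodescriptors D-, R-, S-
    qr/-?(?:alpha|beta|gamma)-/i,      # Greek-letter prefixes
    qr/-?(?:cis|trans)-/i,             # cis/trans prefixes
    qr/-?cyclic-/i,                    # the prefix "cyclic"
    qr/-?\d+(?:,\d+)+-/,               # comma-separated numbers (enabled here)
    qr/(?<!\d)-/,                      # any "-" not preceded by a digit
);

sub strip_terms {
    my ($name) = @_;
    $name =~ s/$_//g for @bad_terms;   # apply every pattern in turn
    return lc $name;                   # lower-case result, as in the original
}

print strip_terms('Alpha-Cyclic-2,3-bi-sphos-1,2,5-pho-D-beta-glycerate'), "\n";
```

With the number pattern enabled, the test string should come out as "bisphosphoglycerate". Note that the patterns are not anchored to word boundaries, so a term like "beta-" is also removed when it ends up glued to a preceding fragment.
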
>
> --
> Benno Pütz
> Statistische Genetik
> Max-Planck-Institut f. Psychiatrie            Tel.: +49-89-30622-222
> Kraepelinstr. 10                              Fax : +49-89-30622-601
> 80804 München, Germany
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>


-- 
Shameer Khadar
Prof. R. Sowdhamini's Lab (# 25) The Computational Biology Group
National Centre for Biological Sciences (TIFR)
GKVK Campus, Bellary Road, Bangalore - 65, Karnataka - India
T - 91-080-23666001 EXT - 6251
W - http://www.ncbs.res.in



