[Bioperl-l] A perl regex query
Shameer Khadar
shameer at ncbs.res.in
Tue Sep 18 14:57:55 UTC 2007
I have used PerlMol for some simple chemoinformatics tasks:
http://www.perlmol.org/ - PerlMol - Perl Modules for Molecular Chemistry
Please explore it; you may find something useful.
--
> James Smith wrote:
>>
>> Neeti,
>>
>> This isn't really a BioPerl query, but I will try to explain a simple
>> solution...
>>
>> warn simplify( 'Cyclic-2,3-bisphospho-D-glycerate' ), "\n";
>>
>> sub simplify {
>>   local $_ = "-$_[0]-";
>>   ## Quick hack: add "-"s at the start and end, so every term is
>>   ## always matched as "-term-"
>>   s/-(
>>     Cyclic  | # the prefix "cyclic"
>>     \d+     | # a single number between two "-"s
>>     \d+,\d+ | # number,number between two "-"s
>>     \w        # a single letter, e.g. D, between two "-"s
>>   )(?=-)//ixg; ## case-insensitive, commented, multiple matches!
>>   ## 0-width positive lookahead assertion: the trailing "-" is not
>>   ## consumed, so multiple consecutive -x- constructions can match
>>   ## in the same regexp!
>>   s/-//g;      ## remove the remaining "-"s from the string...
>>   return $_;   ## return the string; s/// alone would return a count
>> }
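The lookahead comment is the key trick here. A minimal sketch comparing a pattern that consumes the trailing "-" with one that only looks ahead for it; the test string `-a-b-c-` is made up purely for illustration:

```perl
use strict;
use warnings;

# Hypothetical test string: single-letter terms between hyphens.
my $s = '-a-b-c-';

# Consuming the trailing "-": each match eats the separator that the next
# term needs, so only every other term is removed.
(my $consuming = $s) =~ s/-\w-//g;

# Zero-width lookahead: the trailing "-" is checked but left in place,
# so every consecutive -x- term is removed in a single pass.
(my $lookahead = $s) =~ s/-\w(?=-)//g;

print "consuming: $consuming\n";   # consuming: b
print "lookahead: $lookahead\n";   # lookahead: -
```

The leftover "-" in the second case is what the follow-up `s/-//g` in `simplify` cleans up.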
>>
>> I'm not sure what other terms you may need to remove, but most of them
>> should fit inside the () brackets of the first regexp in simplify.
>>
>> James
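Putting extra terms inside the () brackets, as suggested above, might look like the sketch below. The added prefixes (alpha, beta, gamma, cis, trans) and the name `simplify_more` are hypothetical choices for illustration, not a curated list of chemical-name prefixes; `\d+(?:,\d+)*` replaces the two number alternatives so that lists such as 1,2,5 are caught as well.

```perl
use strict;
use warnings;

# Hypothetical extension of simplify(): same padding/lookahead trick,
# with a few more (illustrative) prefixes in the alternation.
sub simplify_more {
    my ($name) = @_;
    local $_ = "-$name-";     # pad so every term looks like "-term-"
    s/-(
        Cyclic           |    # the prefix "cyclic"
        alpha|beta|gamma |    # Greek-letter prefixes
        cis|trans        |    # geometric-isomer prefixes
        \d+(?:,\d+)*     |    # one or more comma-separated numbers
        \w                    # a single letter, e.g. D
    )(?=-)//ixg;
    s/-//g;                   # drop the remaining hyphens
    return $_;
}

print simplify_more('Alpha-Cyclic-2,3-bi-sphos-1,2,5-pho-D-beta-glycerate'), "\n";
```

With the test string from the reply below, this prints `bisphosphoglycerate`.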
> Along the same lines:
>
> # test string exercising most of the removals below
> my $string = "Alpha-Cyclic-2,3-bi-sphos-1,2,5-pho-D-beta-glycerate";
> my @ra_bad_terms = ( '-?(D|R|S)-',
>                      '-?([aA]lpha|[bB]eta|[gG]amma)-',
>                      '-?([cC]is|[tT]rans)-',
>                      '-?[cC]yclic-',
>                    # '-?\d+(,\d+)+-',  # uncomment to remove numbers, too
>                      '(?<!\d)-' );     # '-' NOT preceded by a number
> print "$string\n";
> foreach ( @ra_bad_terms ){
>
>     eval { $string =~ s/$_//g; };  # eval traps an invalid pattern
>     print "$_:$string\n";          # for feedback only
> }
> # one-pass variant: my $re = join '|', @ra_bad_terms; $string =~ s/$re//g;
>
> print lc($string),"\n";
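The commented-out one-liner near the end can be approximated by joining the patterns into a single alternation and compiling it once with `qr//`. This is a sketch under a few assumptions: it leaves out the final `(?<!\d)-` pattern, uses a shortened hypothetical test string, and switches the groups to non-capturing `(?:...)`. It also makes one pass over the string, whereas the loop applies each pattern to the output of the previous one, so the two can disagree (with these four patterns, `Cyclic-S-x` becomes `Cyclicx` in the loop but `x` in one pass).

```perl
use strict;
use warnings;

# The first four patterns from the list above, with non-capturing groups.
my @ra_bad_terms = (
    '-?(?:D|R|S)-',
    '-?(?:[aA]lpha|[bB]eta|[gG]amma)-',
    '-?(?:[cC]is|[tT]rans)-',
    '-?[cC]yclic-',
);

# Each element is a complete pattern, so a top-level "|" join is safe.
my $alternation = join '|', @ra_bad_terms;
my $bad_re      = qr/$alternation/;

# Hypothetical shortened test string.
my $string = 'Alpha-Cyclic-D-beta-glycerate';
(my $cleaned = $string) =~ s/$bad_re//g;    # single pass, leftmost match wins
print lc($cleaned), "\n";                   # glycerate
```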
>
>
> --
> Benno Pütz
> Statistische Genetik
> Max-Planck-Institut f. Psychiatrie Tel.: +49-89-30622-222
> Kraepelinstr. 10 Fax : +49-89-30622-601
> 80804 München, Germany
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>
--
Shameer Khadar
Prof. R. Sowdhamini's Lab (# 25) The Computational Biology Group
National Centre for Biological Sciences (TIFR)
GKVK Campus, Bellary Road, Bangalore - 65, Karnataka - India
T - 91-080-23666001 EXT - 6251
W - http://www.ncbs.res.in