[Bioperl-l] A pattern problem in Perl

Peter Wilkinson pwilk at videotron.ca
Thu May 1 11:39:25 EDT 2003



Here ya go,

(?:^.*?=\s)(\d+)(?:\/.*?\()(\d+)

This will do the following:

(?:^.*?=\s) matches (but does not capture) 'Identities = '
(\d+) matches the numerator of the fraction and puts it into $1
(?:\/.*?\() matches (but does not capture) '/135 ('
(\d+) matches the percentage and puts it into $2
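
A minimal sketch of how this pattern might be dropped into a script. The sub name parse_identities is made up for illustration, and the two sample lines are the ones from the original question; the pattern itself is unchanged. Because both .*? are non-greedy, the match stops at the first '= ' and the first '(', so the Gaps field is never reached.

```perl
use strict;
use warnings;

# Hypothetical helper: returns (numerator, percent) for the Identities
# field, or an empty list if the line does not match.
sub parse_identities {
    my ($line) = @_;
    if ($line =~ /(?:^.*?=\s)(\d+)(?:\/.*?\()(\d+)/) {
        return ($1, $2);
    }
    return;
}

# The two line types from the original question.
my @lines = (
    'Identities = 124/135 (91%), Gaps = 2/135 (1%)',
    'Identities = 124/135 (91%)',
);

for my $line (@lines) {
    my ($num, $pct) = parse_identities($line);
    print "numerator=$num percent=$pct\n";
}
```

Both sample lines should give numerator 124 and percent 91.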

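Here is another, shorter one. Note that nothing anchors it to the numerator, so on this line $1 ends up as the denominator (135) and $2 as the percentage: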

(\d+)(?:\s\()(\d+)


I like the first one because it handles the line in its entirety. It's 
stricter and easy to read. The non-capturing parts I don't want are clear, 
and so are the parts I do want.
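
For comparison, a quick check of what the shorter pattern captures. This is only a sketch; the variable names are made up. Since the pattern is not tied to 'Identities = ', the first run of digits followed by ' (' is the denominator, so the first capture is 135 here, not 124.

```perl
use strict;
use warnings;

my $line = 'Identities = 124/135 (91%), Gaps = 2/135 (1%)';

# Shorter pattern: digits, then ' (', then digits.
my ($before_paren, $percent) = $line =~ /(\d+)(?:\s\()(\d+)/;

print "before paren=$before_paren percent=$percent\n";
```

That is fine if only the percentage is needed, but the first pattern is the one to use for the numerator.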



If you wanted to have some fun and get the rest of the data, then

(?:^.*?=\s)(\d+)(?:\/)(\d+)(?:.*?\()(\d+)(?:.*?\s)(\d+)(?:\/)(\d+)(?:.*?\()(\d+)

you get:

$1 = 124 # num
$2 = 135 # denom
$3 = 91  # percent
$4 = 2   # num
$5 = 135 # denom
$6 = 1   # percent
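
A sketch of the longer pattern in use, written with the /x modifier so it can be spread out and commented. One caveat: as posted, the long pattern requires the Gaps field, so it will not match the type B lines (Identities only). The variant below wraps the Gaps part in an optional group. That is an adaptation for illustration, not the pattern as posted, and parse_stats and the hash keys are made-up names.

```perl
use strict;
use warnings;

# Hypothetical helper: returns a hash ref of the captured fields,
# or undef (in scalar context) if the line has no Identities field.
sub parse_stats {
    my ($line) = @_;
    return unless $line =~ m{
        ^.*?=\s (\d+) / (\d+) .*?\( (\d+)        # Identities: num, denom, percent
        (?: .*?\s (\d+) / (\d+) .*?\( (\d+) )?   # Gaps, optional: num, denom, percent
    }x;
    return {
        id_num  => $1, id_denom  => $2, id_pct  => $3,
        gap_num => $4, gap_denom => $5, gap_pct => $6,   # undef when Gaps is absent
    };
}

for my $line ('Identities = 124/135 (91%), Gaps = 2/135 (1%)',
              'Identities = 124/135 (91%)') {
    my $s = parse_stats($line) or next;
    my $gaps = defined $s->{gap_num} ? "$s->{gap_num}/$s->{gap_denom}" : 'none';
    print "identities=$s->{id_num}/$s->{id_denom} ($s->{id_pct}%) gaps=$gaps\n";
}
```

Returning named fields instead of relying on $1 through $6 keeps the calling code readable and makes it obvious when the Gaps values are missing.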





Peter W


At 11:12 AM 01/05/2003 +0800, you wrote:
>Hello,
>      I encountered a problem. I am trying to grab the numbers from txt files
>(bl2seq reports).
>Those lines read as A:"Identities = 124/135 (91%), Gaps = 2/135 (1%)" or
>just B:"Identities = 124/135 (91%)". Both types coexist.
>I wrote the pattern matching script as "$judgecont=~m/Identities = (.*)\/.*?
>\((.*)%\)/;"
>My purpose is to grab $1 (the numerator of the fraction) and $2 (the
>percentage of Identities, not Gaps).
>However, because the matching is greedy, I always get the "1" (percentage
>of gaps) in situation A.
>Any suggestion or guide?  Thank you very much!
>                       Regards
>                                                Darson 2003/05/01
>
>
>_______________________________________________
>Bioperl-l mailing list
>Bioperl-l at bioperl.org
>http://bioperl.org/mailman/listinfo/bioperl-l


-------------------------------------
Peter Wilkinson
Bioinformatics Consultant

-------------------------------------  



