[Bioperl-l] A perl regex query

Tue Sep 18 15:24:52 UTC 2007

On Sep 18, 2007, at 8:26 AM, Roy Chaudhuri wrote:

>> My actual problem is a bit more complicated.
>> It is not just one string, but lakhs of them; they are actually names
>> of chemical compounds.
>>
>> The problem is that there are 2 different data sources and I need to
>> match the compound names between them. Though a compound may be the
>> same in the two, they use different naming formats for it.
>
> Unless you can define in simple and precise terms exactly which parts
> of the string you need, there is no way that you will be able to code
> a solution in Perl.
>
> Maybe you could look for a database that contains the synonyms for
> each molecule? A quick Google finds ChEBI (http://www.ebi.ac.uk/chebi),
> which is available to download as flat files.
>
> Roy.
> --
> Dr. Roy Chaudhuri
> Department of Veterinary Medicine
> University of Cambridge, U.K.

D'oh!  Roy beat me to it; that's what I was going to suggest.  I
agree: don't trust simple word munging to always get you the correct
answer in this case.  It's just too complicated to try to catch every
case.
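
To make the synonym-table idea concrete, here is a minimal sketch of
matching names through shared compound IDs rather than through string
surgery. It is written in Python for brevity; the same logic maps onto
a Perl hash of arrays. Everything named here is an assumption for
illustration: the (compound_id, synonym) row layout, the IDs "C1" and
"C2", and the helper names are hypothetical and are not ChEBI's actual
file format or any library's API.

```python
import re
from collections import defaultdict


def normalize(name):
    """Lowercase and collapse whitespace. Deliberately conservative:
    no attempt is made to rewrite the chemistry in the name."""
    return re.sub(r"\s+", " ", name.strip().lower())


def build_index(synonym_rows):
    """Map each normalized synonym to the set of compound IDs using it.

    synonym_rows is an iterable of (compound_id, synonym) pairs; this
    layout is an assumption, not the format of any real download.
    """
    index = defaultdict(set)
    for compound_id, synonym in synonym_rows:
        index[normalize(synonym)].add(compound_id)
    return index


def match_names(names_a, names_b, index):
    """Pair names from two sources that resolve to the same compound ID."""
    by_id_b = defaultdict(list)
    for name in names_b:
        for cid in index.get(normalize(name), ()):
            by_id_b[cid].append(name)

    pairs = []
    for name in names_a:
        for cid in sorted(index.get(normalize(name), ())):
            for other in by_id_b.get(cid, ()):
                pairs.append((name, other, cid))
    return pairs


# Made-up synonym rows standing in for a real synonym table.
rows = [
    ("C1", "acetylsalicylic acid"),
    ("C1", "aspirin"),
    ("C1", "2-acetoxybenzoic acid"),
    ("C2", "ethanol"),
    ("C2", "ethyl alcohol"),
]
index = build_index(rows)

source_a = ["Aspirin", "Ethanol", "unobtainium"]
source_b = ["2-Acetoxybenzoic acid", "ethyl  alcohol"]
pairs = match_names(source_a, source_b, index)
for name_a, name_b, cid in pairs:
    print(f"{name_a!r} <-> {name_b!r} ({cid})")
```

Names that are missing from the synonym table (like "unobtainium"
above) simply produce no pair, so they can be collected separately and
checked by hand instead of being guessed at with a regex.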

ChEBI is a good choice; Stefan's suggestion of OpenBabel is also a  
good one.  I would also try not to reinvent the wheel; there may be  
some modules available via CPAN which do what you need, such as these:

http://search.cpan.org/search?query=chem&mode=module

or this:

http://search.cpan.org/~ghutchis/Chemistry-OpenBabel-1.2.0/

chris