[Bioperl-l] A perl regex query
Chris Fields
cjfields at uiuc.edu
Tue Sep 18 15:24:52 UTC 2007
On Sep 18, 2007, at 8:26 AM, Roy Chaudhuri wrote:
>> My actual problem is a bit more complicated.
>> It is not just one string but lakhs of them; they are actually names
>> of chemical compounds.
>>
>> The problem is that there are 2 different data sources and I need to
>> match the compound names between them. Although a compound may be the
>> same in the two, they use different naming formats for it.
>
> Unless you can define in simple and precise terms exactly which parts
> of the string you need then there is no way that you will be able to
> code a solution in Perl.
>
> Maybe you could look for a database that contains the synonyms for
> each molecule? A quick Google finds ChEBI (http://www.ebi.ac.uk/chebi),
> which is available to download as flat files.
>
> Roy.
> --
> Dr. Roy Chaudhuri
> Department of Veterinary Medicine
> University of Cambridge, U.K.
D'oh! Roy beat me to it; that's what I was going to suggest. I
agree: don't trust simple word munging to always get you the correct
answer in this case. Compound naming is just too complicated to try
and catch every case.
ChEBI is a good choice; Stefan's suggestion of OpenBabel is also a
good one. I would also try not to reinvent the wheel; there may be
some modules available via CPAN which do what you need, such as these:
http://search.cpan.org/search?query=chem&mode=module
or this:
http://search.cpan.org/~ghutchis/Chemistry-OpenBabel-1.2.0/
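To make the synonym-lookup idea a bit more concrete, here is a minimal sketch (in Python for brevity; a Perl hash would do the same job). Everything in it is an assumption for illustration: the SYNONYMS table, the made-up compound IDs, and the helper names normalize, compound_id and match_sources are all hypothetical. A real table would be built from a curated synonym source such as the ChEBI flat files, not typed in by hand.

```python
# Hypothetical synonym table: compound name -> compound ID.
# The IDs are invented for this example; they are not real ChEBI IDs.
SYNONYMS = {
    "aspirin": "cmpd-1",
    "acetylsalicylic acid": "cmpd-1",
    "ethanol": "cmpd-2",
    "ethyl alcohol": "cmpd-2",
}


def normalize(name):
    """Lowercase and collapse whitespace; deliberately nothing cleverer."""
    return " ".join(name.lower().split())


def compound_id(name):
    """Return the ID for a name, or None if the table does not know it."""
    return SYNONYMS.get(normalize(name))


def match_sources(names_a, names_b):
    """Pair names from two sources that resolve to the same compound ID."""
    # Index the second source by compound ID, skipping unknown names.
    by_id = {}
    for b in names_b:
        cid = compound_id(b)
        if cid is not None:
            by_id.setdefault(cid, []).append(b)

    # Look up each name from the first source in that index.
    pairs = []
    for a in names_a:
        cid = compound_id(a)
        for b in by_id.get(cid, []):
            pairs.append((a, b))
    return pairs


if __name__ == "__main__":
    source_a = ["Aspirin", "Ethyl  alcohol", "mystery compound"]
    source_b = ["acetylsalicylic acid", "Ethanol"]
    for a, b in match_sources(source_a, source_b):
        print(a, "<->", b)
```

Names that are missing from the table are simply left unmatched rather than guessed at, which is the point: the matching is only as good as the synonym data, not the string munging.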
chris