[BioPython] SD/MDL file parser
Andrew Dalke
dalke at dalkescientific.com
Mon Sep 26 09:13:27 EDT 2005
Hi Noel,
> I've just been through the documentation and site-packages on my
> computer, and I cannot find a parser for SD (or MDL) files. This is the
> most common file format for chemical structures in databases of
> chemicals (as used by pharmaceutical companies, for example).
>
> Did I miss this parser? I know that Andrew Dalke (through PyDaylight)
> has an interest in chemistry, so I was expecting to find this parser...
As Jerome mentioned, frowns includes an MDL parser.
>>> from frowns import MDL
>>> filename = "/usr/local/openeye/python/examples/oechem/examples/drugs.sdf"
>>> for mol, error, text in MDL.sdin(open(filename)):
...     print mol.cansmiles(), mol.fields
...
C(c1c(OC(=O)C)cccc1)(=O)O {'Color': 'red', 'Energy': '1'}
c12C(=O)NC(=Nc1[n](cn2)COCCO)N {'Color': 'blue', 'Energy': '2'}
c1(c(cccc1)CC=C)OCC(O)CNC(C)C {'Color': 'green', 'Energy': '3'}
C1(C(N(c2ccccc2)N(C=1C)C)=O)N(C)C {'Energy': '4.5'}
c1(OCC(O)CNC(C)C)ccc(cc1)CC(=O)N {'Color': 'purple', 'Energy': '-3.5'}
C1(c2c(N(C(N1C)=O)C)nc[n]2C)=O {'Color': 'black', 'Energy': '0'}
>>>
It converts the connection table data into a Frowns data
structure. That should preserve the chemistry as written in
the file, because frowns.Molecule doesn't do any perception,
but calling something like cansmiles() will likely change things.
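For readers new to the format, here is a minimal sketch of what "converting the connection table" involves, assuming a V2000 molfile block and using only the Python 3.7+ standard library. It is not how frowns does it; `parse_ctab`, `Atom`, `Bond` and the sample block are hypothetical. Like frowns.Molecule, it does no perception: atoms and bond codes are kept exactly as written.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Atom:
    symbol: str
    x: float
    y: float
    z: float


@dataclass
class Bond:
    a1: int     # 1-based atom index, exactly as written in the file
    a2: int
    order: int  # MDL bond type code: 1 single, 2 double, 3 triple, 4 aromatic


def parse_ctab(molblock: str) -> Tuple[str, List[Atom], List[Bond]]:
    """Hypothetical helper: parse a V2000 molfile block into atom/bond records.

    Lines 1-3 are the header; line 4 is the counts line. V2000 fields are
    fixed-width, so columns are sliced rather than split on whitespace.
    No perception is done: atoms and bond codes are kept as written.
    """
    lines = molblock.splitlines()
    name = lines[0].strip()
    n_atoms = int(lines[3][0:3])
    n_bonds = int(lines[3][3:6])

    atoms = []
    for line in lines[4:4 + n_atoms]:
        atoms.append(Atom(symbol=line[31:34].strip(),
                          x=float(line[0:10]),
                          y=float(line[10:20]),
                          z=float(line[20:30])))

    bonds = []
    first_bond = 4 + n_atoms
    for line in lines[first_bond:first_bond + n_bonds]:
        bonds.append(Bond(a1=int(line[0:3]), a2=int(line[3:6]),
                          order=int(line[6:9])))
    return name, atoms, bonds


# Hypothetical two-atom example (the heavy atoms of methanol).
MOLBLOCK = "\n".join([
    "methanol",
    "  hypothetical",
    "",
    "  2  1  0  0  0  0  0  0  0  0999 V2000",
    "    0.0000    0.0000    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0",
    "    1.4300    0.0000    0.0000 O   0  0  0  0  0  0  0  0  0  0  0  0",
    "  1  2  1  0  0  0  0",
    "M  END",
])

name, atoms, bonds = parse_ctab(MOLBLOCK)
print(name, [a.symbol for a in atoms], bonds)
```

Slicing by column rather than calling split() matters because V2000 fields can run together (for example, atom counts of 100 or more fill all three columns).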
If you have SD fields that repeat the same key, then there
will be a problem, because the parser expects to store the
data in a dictionary. OEChem has a dictionary-like data
structure which also allows list-like iteration for this case.
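The repeated-key problem is easy to see with a stdlib-only sketch of the data-item section of an SD record. Each item starts with a header line beginning with `>` that names the field in angle brackets, its value runs until a blank line, and the record ends with `$$$$`. `parse_sd_fields`, the `Synonym` field and the sample record are hypothetical, not part of frowns or OEChem. Collecting (name, value) pairs keeps every entry; turning them into a plain dict keeps only the last value for a repeated key.

```python
import re
from typing import List, Tuple

# A data header looks like "> <Energy>" or ">  <Energy> (1)"; group 1 is the name.
FIELD_HEADER = re.compile(r"^>.*?<([^>]*)>")


def parse_sd_fields(record: str) -> List[Tuple[str, str]]:
    """Hypothetical helper: one SD record's data items as (name, value) pairs."""
    lines = record.splitlines()

    # Skip the molfile block: data items start after the "M  END" line.
    i = 0
    for n, line in enumerate(lines):
        if line.startswith("M  END"):
            i = n + 1
            break

    pairs = []
    while i < len(lines):
        line = lines[i]
        if line.startswith("$$$$"):  # end of this record
            break
        match = FIELD_HEADER.match(line)
        i += 1
        if match:
            values = []
            # A value runs until the next blank line (or the end of the record).
            while (i < len(lines) and lines[i].strip()
                   and not lines[i].startswith("$$$$")):
                values.append(lines[i])
                i += 1
            pairs.append((match.group(1), "\n".join(values)))
    return pairs


RECORD = "\n".join([
    "methane",
    "  hypothetical",
    "",
    "  1  0  0  0  0  0  0  0  0  0999 V2000",
    "    0.0000    0.0000    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0",
    "M  END",
    "> <Energy>", "1", "",
    "> <Synonym>", "methane", "",
    "> <Synonym>", "marsh gas", "",
    "$$$$",
])

pairs = parse_sd_fields(RECORD)
print(pairs)        # every entry survives, including both Synonym values
print(dict(pairs))  # {'Energy': '1', 'Synonym': 'marsh gas'}: first Synonym lost
```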
If I had the time (okay, and if someone was willing to pay
for me to do this :) I would probably use something like my
MultiDict class instead.
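The MultiDict class itself isn't shown in this message, so the following is only a hypothetical sketch of the general idea, not that class: a mapping that keeps every value for a key, answers `d[key]` like a dict, and still allows list-like iteration over all (key, value) pairs in file order. Returning the first value from `d[key]` is an arbitrary design choice made here.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


class MultiDict:
    """Hypothetical multi-valued mapping (not the MultiDict from the email)."""

    def __init__(self, pairs: Iterable[Tuple[str, str]] = ()):
        self._pairs: List[Tuple[str, str]] = []                 # all entries, in order
        self._index: Dict[str, List[str]] = defaultdict(list)  # key -> all its values
        for key, value in pairs:
            self.add(key, value)

    def add(self, key: str, value: str) -> None:
        self._pairs.append((key, value))
        self._index[key].append(value)

    def __getitem__(self, key: str) -> str:
        # Dict-like lookup returns the first value; KeyError if the key is absent.
        if key not in self._index:
            raise KeyError(key)
        return self._index[key][0]

    def getall(self, key: str) -> List[str]:
        # .get() does not trigger defaultdict's factory, so no empty entry is created.
        return list(self._index.get(key, []))

    def __contains__(self, key: object) -> bool:
        return key in self._index

    def __iter__(self) -> Iterator[str]:
        # Like a dict: distinct keys, in first-seen order.
        return iter(self._index)

    def __len__(self) -> int:
        return len(self._index)

    def items(self) -> List[Tuple[str, str]]:
        # List-like view: every (key, value) pair, repeats included.
        return list(self._pairs)


fields = MultiDict([("Energy", "1"), ("Synonym", "methane"), ("Synonym", "marsh gas")])
print(fields["Synonym"])          # methane
print(fields.getall("Synonym"))   # ['methane', 'marsh gas']
print(list(fields))               # ['Energy', 'Synonym']
```

Code that only needs dict-style access keeps working, while code that cares about repeats can call `getall()` or walk `items()`.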
Your email says you're in Cambridge, eh? I'll be there
in a couple of weeks for the EuroMUG conference, staying
there for a week to also visit EBI and Sanger.
Andrew
dalke at dalkescientific.com