[Biojava-dev] mmCIF parser development

Tue Oct 27 22:24:37 UTC 2015

@Andreas, will do.

@Peter, we have been using the createAtomBonds property in FileParsingParameters. We observed that the ligand bonds are referencing the “new” chain ID from the mmCIF file rather than the “legacy” chain ID (also included in the record); we currently rely on the legacy ID. Bond objects for the hemes in hemoglobin (4HHB) are clearly being created. Currently, we are looking into whether BioJava uses different chain IDs during different parsing workflows or whether our existing glue code for interacting with BioJava contains incorrect chain references.
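For context, every mmCIF _atom_site record carries two chain identifiers. label_asym_id is assigned by the PDB, and ligands and waters receive their own IDs. auth_asym_id is the author-provided ID that matches the legacy PDB chain ID. If the bonds turn out to be keyed by label IDs, a lookup built from the atom-site rows can translate them back. The sketch below is plain Java and does not use BioJava. The ChainIdMap class and the pre-tokenized row maps are hypothetical, and the example rows are illustrative only.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/**
 * Hypothetical helper (not part of BioJava): maps the mmCIF "label" chain ID
 * (_atom_site.label_asym_id) to the "author" chain ID (_atom_site.auth_asym_id),
 * which is the chain ID used in legacy PDB files.
 */
class ChainIdMap {

    /** Each atom-site row is assumed to be pre-tokenized into column name -> value. */
    static Map<String, String> labelToAuth(List<Map<String, String>> atomSiteRows) {
        Map<String, String> map = new HashMap<>();
        for (Map<String, String> row : atomSiteRows) {
            String label = row.get("label_asym_id");
            String auth = row.get("auth_asym_id");
            if (label == null || auth == null) {
                continue; // skip incomplete rows
            }
            // Several label IDs may share one author chain, but a label ID
            // should never point at two different author chains.
            String previous = map.putIfAbsent(label, auth);
            if (previous != null && !previous.equals(auth)) {
                throw new IllegalStateException(
                        "label_asym_id " + label + " maps to both " + previous + " and " + auth);
            }
        }
        return map;
    }

    public static void main(String[] args) {
        // Illustrative rows: a protein chain and a ligand with its own label ID
        // that belongs to the same author chain.
        List<Map<String, String>> rows = List.of(
                Map.of("label_asym_id", "A", "auth_asym_id", "A"),
                Map.of("label_asym_id", "E", "auth_asym_id", "A"));
        System.out.println(labelToAuth(rows).get("E")); // prints A
    }
}
```

The mapping is many-to-one, so translating label IDs to author IDs is safe, while going the other way needs the residue as well as the chain.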

From: andreas.prlic at gmail.com [mailto:andreas.prlic at gmail.com] On Behalf Of Andreas Prlic
Sent: Tuesday, October 27, 2015 5:16 PM
To: Steve Darnell
Cc: biojava-dev at mailman.open-bio.org
Subject: Re: [Biojava-dev] mmCIF parser development

Hi Steve,

I'd say these are excellent feature requests for the current mmCIF parsing framework.

We definitely want to enable all of the categories you mentioned. The data categories need to be added to either the SimpleMMcifParser/SimpleMMcifConsumer or the ChemCompConsumer.

If you want to provide a patch for parsing ligand information, sites, and ssbonds, that would be welcome. I'd say these should all go into the SimpleMMcifConsumer.
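Whether the new categories live in the existing consumer or in a separate one is largely a routing question. The following is a minimal, self-contained sketch of category-based routing. CategoryConsumer and CategoryDispatcher are hypothetical names and do not reflect BioJava's actual consumer interface; the row values are illustrative.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical callback; BioJava's real consumer interface differs. */
interface CategoryConsumer {
    void row(String category, Map<String, String> values);
}

/**
 * Hypothetical dispatcher: the parser hands each tokenized row to every
 * consumer registered for that category. Categories with no registered
 * consumer are silently ignored.
 */
class CategoryDispatcher {
    private final Map<String, List<CategoryConsumer>> consumers = new HashMap<>();

    void register(String category, CategoryConsumer consumer) {
        consumers.computeIfAbsent(category, k -> new ArrayList<>()).add(consumer);
    }

    /** Delivers the row and returns how many consumers received it. */
    int dispatch(String category, Map<String, String> values) {
        List<CategoryConsumer> targets = consumers.getOrDefault(category, List.of());
        for (CategoryConsumer c : targets) {
            c.row(category, values);
        }
        return targets.size();
    }

    public static void main(String[] args) {
        CategoryDispatcher d = new CategoryDispatcher();
        List<Map<String, String>> siteRows = new ArrayList<>();
        d.register("_struct_site_gen", (cat, row) -> siteRows.add(row));
        d.dispatch("_struct_site_gen", Map.of("site_id", "AC1", "auth_seq_id", "87"));
        d.dispatch("_struct_conn", Map.of("conn_type_id", "disulf")); // no consumer registered
        System.out.println(siteRows.size()); // prints 1
    }
}
```

With routing by category, the parser itself stays unchanged whichever consumer ends up owning _struct_site or _struct_conn.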

Having said that, we need to double-check that some of your requests are not already implemented. For example, there is already the BondMaker class, which assigns bonds based on mmCIF definitions (see the shouldCreateAtomBonds flag in FileParsingParameters).
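Conceptually, assigning bonds from component definitions amounts to matching template atom-name pairs against the atoms actually modeled in a residue. The sketch below illustrates only that idea. The TemplateBond record and TemplateBonds class are hypothetical, and this is not BioJava's BondMaker implementation, which may handle additional cases.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

/** Hypothetical: one bond from a chemical component template, by atom name. */
record TemplateBond(String atom1, String atom2, String order) {}

class TemplateBonds {
    /**
     * Keeps only the template bonds whose two atoms are both present in the
     * modeled residue. Atoms absent from the coordinates (for example
     * unmodeled hydrogens) produce no bond.
     */
    static List<TemplateBond> applicable(List<TemplateBond> template, Set<String> atomsPresent) {
        List<TemplateBond> out = new ArrayList<>();
        for (TemplateBond b : template) {
            if (atomsPresent.contains(b.atom1()) && atomsPresent.contains(b.atom2())) {
                out.add(b);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // Illustrative template: C1-C2, C2-O, O-HO; the hydrogen is not modeled.
        List<TemplateBond> template = List.of(
                new TemplateBond("C1", "C2", "SING"),
                new TemplateBond("C2", "O", "SING"),
                new TemplateBond("O", "HO", "SING"));
        System.out.println(applicable(template, Set.of("C1", "C2", "O")).size()); // prints 2
    }
}
```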

Could you open (multiple) tickets on GitHub, one per requested data category? Then we can discuss them in detail there.

Thanks,

Andreas

On Tue, Oct 27, 2015 at 1:11 PM, Steve Darnell <darnells at dnastar.com> wrote:
Greetings,

Our group uses the BioJava mmCIF parser to read structural information from mmCIF files. The current parsing and writing cover some of the important data categories, but we would like to extend them to provide more detail. In particular, we would like to add the ability to parse ligand bond information, the _struct_site records for sites, and the _struct_conn records for disulfide bonds. We hope to contribute any parsing implementations back to the BioJava code base.
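In _struct_conn, each row describes one connection, and the conn_type_id item distinguishes disulfides ("disulf") from other types such as "covale", "metalc", or "hydrog". A minimal sketch of extracting disulfides, assuming rows are already tokenized and keyed by item names without the "_struct_conn." prefix. SsBond and StructConnReader are hypothetical names, and the example rows are illustrative rather than taken from a real entry.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

/** Hypothetical value type: one disulfide, by author chain ID and residue number. */
record SsBond(String chain1, String seq1, String chain2, String seq2) {}

class StructConnReader {
    /**
     * Selects rows whose conn_type_id is "disulf" and keeps the author
     * (legacy PDB) chain IDs and residue numbers of both partners.
     */
    static List<SsBond> disulfides(List<Map<String, String>> structConnRows) {
        List<SsBond> bonds = new ArrayList<>();
        for (Map<String, String> row : structConnRows) {
            if (!"disulf".equals(row.get("conn_type_id"))) {
                continue; // covale, metalc, hydrog, ... are not SS bonds
            }
            bonds.add(new SsBond(
                    row.get("ptnr1_auth_asym_id"), row.get("ptnr1_auth_seq_id"),
                    row.get("ptnr2_auth_asym_id"), row.get("ptnr2_auth_seq_id")));
        }
        return bonds;
    }

    public static void main(String[] args) {
        List<Map<String, String>> rows = List.of(
                Map.of("conn_type_id", "disulf",
                        "ptnr1_auth_asym_id", "A", "ptnr1_auth_seq_id", "26",
                        "ptnr2_auth_asym_id", "A", "ptnr2_auth_seq_id", "84"),
                Map.of("conn_type_id", "metalc",
                        "ptnr1_auth_asym_id", "A", "ptnr1_auth_seq_id", "87",
                        "ptnr2_auth_asym_id", "A", "ptnr2_auth_seq_id", "142"));
        System.out.println(disulfides(rows).size()); // prints 1
    }
}
```

Site residues could plausibly be collected the same way from _struct_site_gen rows, grouped by their site_id.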

For the ligand bond information, we could derive connectivity from the Chemical Component library, since mmCIF files do not contain bond records equivalent to the CONECT records in PDB files.
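In the Chemical Component Dictionary, the bonds within a component are listed in the _chem_comp_bond category (comp_id, atom_id_1, atom_id_2, value_order). A minimal sketch of turning those rows into a CONECT-like adjacency list, assuming the rows are already tokenized and keyed without the "_chem_comp_bond." prefix. ChemCompConnectivity is a hypothetical name.

```java
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

class ChemCompConnectivity {
    /**
     * Builds an adjacency list (atom name -> bonded atom names) for one
     * component from its _chem_comp_bond rows, ignoring rows for other components.
     */
    static Map<String, Set<String>> adjacency(String compId, List<Map<String, String>> chemCompBondRows) {
        Map<String, Set<String>> adj = new TreeMap<>();
        for (Map<String, String> row : chemCompBondRows) {
            if (!compId.equals(row.get("comp_id"))) {
                continue;
            }
            String a = row.get("atom_id_1");
            String b = row.get("atom_id_2");
            // Bonds are undirected, so record both directions, as CONECT does.
            adj.computeIfAbsent(a, k -> new TreeSet<>()).add(b);
            adj.computeIfAbsent(b, k -> new TreeSet<>()).add(a);
        }
        return adj;
    }

    public static void main(String[] args) {
        // Water (HOH): O is bonded to H1 and H2.
        List<Map<String, String>> rows = List.of(
                Map.of("comp_id", "HOH", "atom_id_1", "O", "atom_id_2", "H1", "value_order", "SING"),
                Map.of("comp_id", "HOH", "atom_id_1", "O", "atom_id_2", "H2", "value_order", "SING"));
        System.out.println(adjacency("HOH", rows)); // prints {H1=[O], H2=[O], O=[H1, H2]}
    }
}
```

This only covers bonds inside a component; links between a ligand and the polymer, where deposited, would come from the entry's own _struct_conn records instead.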

Is there planned development to provide the same bond information for ligands in the SimpleMMcifParser as is available from CONECT records when parsing PDB files? If we were to implement parsing of ligand information, sites, and ssbonds, would it be best to add this to the SimpleMMcifConsumer or to a separate consumer?

Is this discussion better suited to the GitHub issue tracker? If so, we will start a discussion there.

--
-----------------------------------------------------------------------
Dr. Andreas Prlic
RCSB PDB Protein Data Bank
Technical & Scientific Team Lead
University of California, San Diego

Editor Software Section
PLOS Computational Biology

BioJava Project Lead
-----------------------------------------------------------------------