[Biojava-dev] CONECT records, issue 330, and the refactoring of bonds.

Matt Larson larsonmattr at gmail.com
Fri Dec 2 03:39:23 UTC 2016


Jose & Spencer,

I would be concerned if information that an author provides in the CONECT
and/or LINK records is now lost.

Maybe these records should still be parsed and used to create Bond
instances (like SSBOND records) if the atoms are present in the structure.
One suggestion is to have a field/enum in the Bond instance that records
the source of a bond: CONECT, LINK, SSBOND, standard peptide/nucleotide
bonds, or chemical components.  Then it would be possible to use or filter
types of Bonds depending on the user's needs.
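
A rough sketch of what I mean, in plain Java.  BondSource, TaggedBond and
filterBySource are hypothetical names for illustration only, not existing
BioJava API; in practice the source would presumably be a field on the real
Bond class.

```java
import java.util.ArrayList;
import java.util.EnumSet;
import java.util.List;
import java.util.Set;

/** Hypothetical sketch; none of these names exist in BioJava. */
final class BondSourceSketch {

    /** Where a bond came from (values mirror the list in the text above). */
    enum BondSource { CONECT, LINK, SSBOND, POLYMER, CHEM_COMP }

    /** Stand-in for a bond; atoms are identified by PDB serial number here. */
    static final class TaggedBond {
        final int serialA;
        final int serialB;
        final BondSource source;

        TaggedBond(int serialA, int serialB, BondSource source) {
            this.serialA = serialA;
            this.serialB = serialB;
            this.source = source;
        }
    }

    /** Returns only the bonds whose source is in the wanted set. */
    static List<TaggedBond> filterBySource(List<TaggedBond> bonds,
                                           Set<BondSource> wanted) {
        List<TaggedBond> kept = new ArrayList<>();
        for (TaggedBond bond : bonds) {
            if (wanted.contains(bond.source)) {
                kept.add(bond);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        List<TaggedBond> bonds = new ArrayList<>();
        bonds.add(new TaggedBond(10, 25, BondSource.SSBOND));
        bonds.add(new TaggedBond(301, 412, BondSource.CONECT));
        bonds.add(new TaggedBond(5, 6, BondSource.POLYMER));

        // Keep only the bonds that the author declared explicitly.
        List<TaggedBond> authorBonds =
                filterBySource(bonds, EnumSet.of(BondSource.CONECT, BondSource.LINK));
        System.out.println(authorBonds.size());
    }
}
```

A user who only trusts dictionary-derived bonds could pass
EnumSet.of(BondSource.CHEM_COMP) instead, so nothing needs to be dropped at
parse time.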

I am not yet sure what the impact of dropping CONECT is; what could be lost
are unusual connections such as isopeptide bonds or crosslinks between
polymers and ligands.
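
If CONECT is kept, the short-line problem from issue 330 could be handled
with a length-tolerant reader along the lines of the sketch below.  It
assumes the fixed-column PDB layout (source atom serial in columns 7-11,
bonded atom serials in the following 5-column fields up to column 31).
ConectLineSketch and parseSerials are hypothetical names, not the current
parser code.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch of a CONECT reader that tolerates short lines. */
final class ConectLineSketch {

    /**
     * Returns the atom serials found on a CONECT line: the source atom
     * first, then its bonded atoms. Returns an empty list for lines that
     * are not CONECT records or that contain a non-numeric field.
     */
    static List<Integer> parseSerials(String line) {
        List<Integer> serials = new ArrayList<>();
        if (line == null || !line.startsWith("CONECT")) {
            return serials;
        }
        // Never read past the end of the line or past column 31.
        int limit = Math.min(line.length(), 31);
        for (int start = 6; start < limit; start += 5) {
            int end = Math.min(start + 5, limit);
            String field = line.substring(start, end).trim();
            if (field.isEmpty()) {
                continue; // blank field, nothing bonded here
            }
            try {
                serials.add(Integer.parseInt(field));
            } catch (NumberFormatException e) {
                // Malformed record: skip the whole line instead of failing.
                return new ArrayList<>();
            }
        }
        return serials;
    }

    public static void main(String[] args) {
        // A CONECT line with only two atoms and no trailing padding.
        System.out.println(parseSerials("CONECT  413  412"));
    }
}
```

The caller can log a warning whenever the returned list has fewer than two
entries and move on to the next line, which is the behaviour I would want
from the parser in general.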

Best regards,
  Matt



On Thu, Dec 1, 2016 at 10:34 AM, Spencer Bliven <spencer.bliven at gmail.com>
wrote:

> I guess that if you have a novel ligand in a non-deposited file then it
> wouldn't have a chemical component and so CONECT would be the only place to
> find those bonds. Is that an important enough use case to warrant the
> development effort?
>
> Matt, your comments re #330 are well taken. +1 to a high-level try-catch
> to handle IndexOutOfBounds, NullPointer, and other nonspecific parsing
> errors.
>
> -Spencer
>
> On Thu, Dec 1, 2016 at 12:24 AM, Jose Duarte <jose.duarte at rcsb.org> wrote:
>
>> In the move towards version 5.0 (still in development), bonds were
>> unified by using the Bond class to represent them. The Bond objects are a
>> better representation and provide easier access since they are referenced
>> directly from Atom objects.
>>
>> It seems that CONECT records in current HEAD are indeed not used at all
>> (they are parsed but nothing is done with them). In any case Bonds are
>> created for these cases: SSBOND records, peptide bonds (inferred from chain
>> geometry) and intra-residue bonds (by getting information from chemical
>> component dictionary). The question here would be: are the CONECT records
>> adding anything on top of that? What kind of other bonds do we miss that
>> CONECT records have?
>>
>> If the CONECT records are providing extra info that we don't have
>> elsewhere then this would be an issue in 5.0-SNAPSHOT that would need to be
>> solved. If they don't provide extra info, then we'd better get rid of all
>> code dealing with CONECT records to be sure we don't have unnecessary
>> parsing problems.
>>
>> Jose
>>
>>
>>
>> On Wed, Nov 30, 2016 at 1:44 PM, Matt Larson <larsonmattr at gmail.com>
>> wrote:
>>
>>> Hi,
>>>
>>> I was looking back at issue 330 (short lines when parsing PDB), but there
>>> have been some large API changes around 4.2.x to make BioJava more closely
>>> resemble the mmCIF structure format.  These API changes dropped LINK (and
>>> possibly CONECT records?).
>>>
>>> The question I have is whether CONECT records should now be used to
>>> create Bond(s), like what happens for SSBonds?  Is BioJava only going to
>>> use Bond instances to describe bond-like interactions?  CONECT records are
>>> still being parsed but they may no longer be stored or accessible from a
>>> Structure, since they are not creating Bond instances and getConnections()
>>> was deprecated.
>>>
>>> It would be helpful to still provide CONECT information from PDB files
>>> to describe bonding between atoms especially for ligands.  If not, then
>>> CONECT records should not be parsed.
>>>
>>> For issue 330:
>>> Parsing CONECT records can cause string out-of-bounds exceptions when
>>> only 2 atoms are present.  Besides implementing line-length checks when
>>> parsing CONECTs, adding a try/catch block for string out-of-bounds
>>> exceptions around the parsePDBFile(..) handler blocks, so that invalid
>>> lines are skipped and a warning is logged rather than parsing being
>>> aborted, would make the parser more robust.
>>>
>>> --
>>> Matt Larson, PhD
>>> Madison, WI  53705 U.S.A.
>>>
>>> _______________________________________________
>>> biojava-dev mailing list
>>> biojava-dev at mailman.open-bio.org
>>> http://mailman.open-bio.org/mailman/listinfo/biojava-dev
>>>
>>
>>
>
>


-- 
Matt Larson, PhD
Madison, WI  53705 U.S.A.