[Biojava-l] PDBFileParser and identifying atoms in ligands

Tue May 18 02:04:00 UTC 2010

Hi Andy,

- There are a few things to discuss about the 193D example. This is a
special case: if you look at the details, the NBU appears to be
covalently bound to chain C rather than being a free ligand. It is
one of those cases where it is difficult to draw the line between
a ligand and a chemically modified peptide (oh joy).

> * There does not appear to be a way to determine which groups are in the sequence, and which are ligands outside the sequence. The method
> Chain.getAtomGroups(GroupType.HETATM) provides neither.

- The best way to determine ligands is to use the Chemical Component
Dictionary. The BioJava PDB parser does not use it yet. It contains
a lot of additional info for modified residues and ligands (e.g. see
http://www.rcsb.org/pdb/files/ligand/NBU.cif for the data on the NBU
group). I will add support for this to the parser in the next couple
of days (e.g. the group type can be used to distinguish chemically
modified residues from other ligands). I did some initial work on
this in the past, but it is not hooked up to the PDB parser at
present.
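
To illustrate how the group type could separate the two cases, here is
a minimal sketch in plain Java. It does not use any BioJava classes.
The names ChemCompTypeSketch, classify and extractType are made up for
this example, and the CIF fragment in main is an illustrative stand-in,
not the actual NBU record. It assumes the dictionary's _chem_comp.type
field holds values such as "L-PEPTIDE LINKING" for components that are
part of a polymer and "NON-POLYMER" for free ligands.

```java
import java.util.Locale;

class ChemCompTypeSketch {

    enum Kind { POLYMER_LINKING, NON_POLYMER, OTHER }

    // Classify a _chem_comp.type value (assumed vocabulary, see lead-in).
    static Kind classify(String chemCompType) {
        String t = chemCompType.trim()
                .replace("\"", "")
                .replace("'", "")
                .toUpperCase(Locale.ROOT);
        if (t.equals("NON-POLYMER")) {
            return Kind.NON_POLYMER;
        }
        if (t.contains("LINKING")) {
            return Kind.POLYMER_LINKING;
        }
        return Kind.OTHER;
    }

    // Return the raw value of the first "_chem_comp.type" line, or null.
    static String extractType(String cif) {
        String key = "_chem_comp.type";
        for (String line : cif.split("\\R")) {
            String s = line.trim();
            if (s.startsWith(key) && s.length() > key.length()
                    && Character.isWhitespace(s.charAt(key.length()))) {
                return s.substring(key.length()).trim();
            }
        }
        return null;
    }

    public static void main(String[] args) {
        // Illustrative fragment only; not copied from a real dictionary entry.
        String cif = "data_XXX\n"
                + "_chem_comp.id     XXX\n"
                + "_chem_comp.type   \"L-PEPTIDE LINKING\"\n";
        System.out.println(classify(extractType(cif)));
    }
}
```

With something like this, a HETATM group whose component type is a
"linking" type would be treated as a modified residue, and a
non-polymer group as a ligand. Covalently attached groups like the NBU
in 193D would still need a closer look.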

- Another way to determine ligands is to investigate the various bonds
within the protein. BioJava currently can't do that either, but we
would like to add this at some point in the future...
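
As a rough idea of what a bond-based check could look like, the sketch
below flags two groups as possibly covalently linked when any pair of
their atoms is closer than a cutoff. This is a crude heuristic, not
something BioJava provides: the class name BondDistanceSketch, the
coordinates in main and the 2.0 Angstrom cutoff are assumptions chosen
for illustration only. A real implementation would more likely use
element-specific covalent radii or the bond records in the file.

```java
class BondDistanceSketch {

    // Euclidean distance between two atoms given as {x, y, z}.
    static double distance(double[] a, double[] b) {
        double dx = a[0] - b[0];
        double dy = a[1] - b[1];
        double dz = a[2] - b[2];
        return Math.sqrt(dx * dx + dy * dy + dz * dz);
    }

    // True if any atom of groupA lies within cutoff of any atom of groupB.
    static boolean anyPairWithin(double[][] groupA, double[][] groupB, double cutoff) {
        for (double[] a : groupA) {
            for (double[] b : groupB) {
                if (distance(a, b) <= cutoff) {
                    return true;
                }
            }
        }
        return false;
    }

    public static void main(String[] args) {
        // Made-up coordinates, in Angstrom, for two small groups.
        double[][] residue = { {0.0, 0.0, 0.0}, {1.5, 0.0, 0.0} };
        double[][] hetGroup = { {3.0, 0.0, 0.0}, {4.5, 0.0, 0.0} };
        double cutoff = 2.0; // assumed threshold, not a validated value
        System.out.println(anyPairWithin(residue, hetGroup, cutoff));
    }
}
```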

- Just to repeat myself: TER is not a good criterion for determining
ligands. I have seen cases in the past where authors used it to
indicate an interruption in the main chain, since they could not
experimentally observe the position of a loop region. The main chain
did continue after the TER...

> * In chain C only, BioJava omits the actual residue NCY in getSeqResGroups().
> * In chain C only, BioJava includes the outside-the-sequence ligand NBU in getSeqResGroups().
> * Sequences C and D are identical in the PDB file, but BioJava's getSeqResGroups() reports two different results.

- All these points are actually caused by the same issue: the attempt
to match up the ATOM and SEQRES sequences. The chains contain mostly
hetatoms, which are represented as "X" in the alignment, and this
makes it difficult to align them correctly. I will investigate whether
using the chem. comp. dictionary one_letter_code, or the mmCIF group
parent->one_letter_code, will make the alignment more useful here...
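
To show why the Xs are a problem, here is a small sketch in plain Java
(no BioJava classes; HetatmAlignmentSketch and its methods are
hypothetical names). It turns the chain C groups into one-letter
strings, using "X" for anything without a known code, and counts how
many ways the extra ATOM group can be dropped to reproduce the SEQRES
string. The ATOM order used here (the SEQRES groups followed by NBU)
is an assumption for illustration, and only ALA is given a one-letter
code.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class HetatmAlignmentSketch {

    // Map three-letter group names to one-letter codes; unknown groups become 'X'.
    static String toOneLetter(List<String> groups, Map<String, Character> known) {
        StringBuilder sb = new StringBuilder();
        for (String g : groups) {
            char c = known.getOrDefault(g, 'X');
            sb.append(c);
        }
        return sb.toString();
    }

    // Count the positions in 'longer' whose removal yields exactly 'shorter'.
    static int countSingleDeletionMatches(String longer, String shorter) {
        if (longer.length() != shorter.length() + 1) {
            return 0;
        }
        int count = 0;
        for (int i = 0; i < longer.length(); i++) {
            String candidate = longer.substring(0, i) + longer.substring(i + 1);
            if (candidate.equals(shorter)) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        Map<String, Character> known = new HashMap<>();
        known.put("ALA", 'A');

        List<String> seqres = Arrays.asList("HQU", "DSN", "ALA", "NCY", "CPC");
        // Assumed ATOM order for chain C: the SEQRES groups plus the NBU group.
        List<String> atom = Arrays.asList("HQU", "DSN", "ALA", "NCY", "CPC", "NBU");

        String s = toOneLetter(seqres, known);
        String a = toOneLetter(atom, known);
        System.out.println(s + " vs " + a + ": "
                + countSingleDeletionMatches(a, s) + " equally good placements");
    }
}
```

Under these assumptions the strings are XXAXX and XXAXXX, and dropping
any one of the last three Xs gives the same match, so the aligner has
no way to tell that NBU is the group that does not belong. Real
one-letter codes for the modified residues would remove most of that
ambiguity.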

Andreas

On Mon, May 17, 2010 at 3:46 PM, Andy Thomas-Cramer
<thomascramera at dnastar.com> wrote:
>
> Hi Andreas.
>
> I tried the new code. Although it allows the alignment to complete, it provides a result different than for an identical sequence without an associated ligand. See results below.
>
> I have tried Chain.getAtomGroups(GroupType.HETATM). However, it provides the set of het atom groups -- which includes both modified residues in the chain and ligands outside the chain. I need either the latter only, or the chain only.
>
> For example, 193D includes these two identical sequences:
>
> SEQRES   1 C    5  HQU DSN ALA NCY CPC
> SEQRES   1 D    5  HQU DSN ALA NCY CPC
>
> And chain C has an associated ligand, NBU, which is not part of the sequence.
>
> Let:
> * "BioJava SEQRES" = chain.getSeqResGroups()
> * "BioJava HETATM" = chain.getAtomGroups(GroupType.HETATM):
>
> Then I get the following results with the new code:
>
> Chain: C
>  Actual ligands:  NBU
>  Actual SEQRES:   HQU DSN ALA NCY CPC
>  BioJava SEQRES:  HQU DSN ALA CPC NBU <--
>                               ^^^
>  BioJava HETATM:  HQU DSN NCY CPC NBU
>
> Chain: D
>   Actual ligands:  None
>   Actual SEQRES:   HQU DSN ALA NCY CPC
>   BioJava SEQRES:  HQU DSN ALA NCY CPC
>   BioJava HETATM:  HQU DSN NCY CPC
>
> I'm looking for the "Actual" lines above.
>
> Issues:
> * In chain C only, BioJava omits the actual residue NCY in getSeqResGroups().
> * In chain C only, BioJava includes the outside-the-sequence ligand NBU in getSeqResGroups().
> * Sequences C and D are identical in the PDB file, but BioJava's getSeqResGroups() reports two different results.
> * There does not appear to be a way to determine which groups are in the sequence, and which are ligands outside the sequence. The method Chain.getAtomGroups(GroupType.HETATM) provides neither.
>
>
> -----Original Message-----
> From: andreas.prlic at gmail.com [mailto:andreas.prlic at gmail.com] On Behalf Of Andreas Prlic
> Sent: Friday, May 07, 2010 5:28 PM
> To: Andy Thomas-Cramer
> Cc: biojava-l at lists.open-bio.org
> Subject: Re: [Biojava-l] PDBFileParser and identifying atoms in ligands
>
> Hi Andy,
>
> I see what you intend to do.  If you want just the HETATOM groups, you
> can request them with
>
> Chain.getAtomGroups(GroupType.HETATM);
>
> or, if you want just the amino acid groups, you can request them with
>
> Chain.getAtomGroups(GroupType.AMINOACID);
>
> The same would work for getSeqResGroups(...) as well, but then your two
> examples are quite specific:
>
> 193D is an antibiotic/DNA complex.
> 7EST chain I is a
> TRIFLUOROACETYL-*L-*LEUCYL-*L-*ALANYL-P-TRIFLUOROMETHYLPHENYLANILIDE
>
> Hetatoms are represented as Xs during the sequence alignments. I can
> easily fix the "failing" alignment in this case, by ignoring the
> wrongly aligned Hetatom Xs  (patch just committed to SVN...). Not sure
> if it makes any biological difference in your two examples.
>
> Andreas
>
>
>
>
> On Fri, May 7, 2010 at 10:55 AM, Andy Thomas-Cramer
> <thomascramera at dnastar.com> wrote:
>>
>> Hetatom groups are also used to represent modified residues in chains. I would like to obtain either the ligand atoms/groups without the sequence, or the sequence atoms/groups without the ligands.
>>
>> Chain.getSeqResGroups() reliably returns an empty list, when alignment fails and for non-amino sequences.
>>
>> Examples of the former include 193D (chain C) and 7EST (chain I). Both of these contain HETATMs both as modified residues and as ligands. Alignment fails in both.
>>
>> Interestingly, 193D's chain D is identical to chain C -- but its alignment succeeds. One difference is that C has an associated ligand and D does not. Are the ligand atom groups associated with a chain considered during alignment?
>>
>>
>> -----Original Message-----
>> From: andreas.prlic at gmail.com [mailto:andreas.prlic at gmail.com] On Behalf Of Andreas Prlic
>> Sent: Thursday, May 06, 2010 3:52 PM
>> To: Andy Thomas-Cramer
>> Cc: biojava-l at lists.open-bio.org
>> Subject: Re: [Biojava-l] PDBFileParser and identifying atoms in ligands
>>
>> Hi Andy,
>>
>> You don't need to process TERs to build up the representation of a
>> structure.  The BioJava data model will work fine even if the file
>> does not contain any amino acids. (e.g.  check 2KQO )
>>
>> Ligands will get represented as Hetatom groups in the datamodel.
>> Check the Hetatom or Group javadocs for how to access their atoms.
>>
>> For your last question: Check out the Chain.getAtomGroups() and
>> Chain.getSeqResGroups() methods...
>>
>> If it does not work the way you expect for a particular PDB ID, please
>> let me know the ID, so I can take a look at the details.
>>
>> Andreas
>>
>>
>> On Thu, May 6, 2010 at 9:50 AM, Andy Thomas-Cramer
>> <thomascramera at dnastar.com> wrote:
>>> From a PDB file, I can identify which atoms are in ligands, and which
>>> are in residues in the chain. The chain atoms end with the TER record.
>>>
>>>
>>>
>>> From the BioJava API, I can distinguish as well -- if it's an amino
>>> sequence and the automatic alignment between SEQRES and ATOM sequences
>>> is successful.
>>>
>>>
>>>
>>> Is there a way through the API to identify atoms in ligands, when the
>>> chain is not an amino sequence or alignment fails? It looks like the TER
>>> record is ignored by PDBFileParser.
>>>
>>>
>>>
>>>
>>>
>>
>
>