[Biojava-l] PDBFileParser and identifying atoms in ligands

Andy Thomas-Cramer thomascramera at dnastar.com
Mon May 17 22:46:54 UTC 2010


Hi Andreas. 

I tried the new code. The alignment now completes, but the result differs from the one for an identical sequence that has no associated ligand. See the results below.

I have tried Chain.getAtomGroups(GroupType.HETATM). However, it returns all HETATM groups -- both the modified residues in the chain and the ligands outside the chain. I need either the ligands only, or the chain only.

For example, 193D includes these two identical sequences:

SEQRES   1 C    5  HQU DSN ALA NCY CPC                                          
SEQRES   1 D    5  HQU DSN ALA NCY CPC                                          

And chain C has an associated ligand, NBU, which is not part of the sequence. 
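
The claim that the two sequences are identical can be checked independently of BioJava by reading the SEQRES records directly. The following is a minimal sketch in plain Java, assuming whitespace-separated fields and a non-blank chain ID. The class and method names (SeqresReader, parse) are hypothetical and not part of BioJava.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper, not part of BioJava: collects SEQRES residue names per chain.
class SeqresReader {

    // Assumes whitespace-separated fields and a non-blank chain ID:
    // SEQRES <serial> <chain> <residue count> <res1> <res2> ...
    static Map<String, List<String>> parse(List<String> pdbLines) {
        Map<String, List<String>> byChain = new LinkedHashMap<>();
        for (String line : pdbLines) {
            if (!line.startsWith("SEQRES")) {
                continue; // ignore every other record type
            }
            String[] f = line.trim().split("\\s+");
            if (f.length < 5) {
                continue; // no residue names on this record
            }
            // f[2] is the chain ID; residue names start at f[4].
            List<String> residues = byChain.computeIfAbsent(f[2], k -> new ArrayList<>());
            residues.addAll(Arrays.asList(f).subList(4, f.length));
        }
        return byChain;
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList(
            "SEQRES   1 C    5  HQU DSN ALA NCY CPC",
            "SEQRES   1 D    5  HQU DSN ALA NCY CPC");
        Map<String, List<String>> seq = parse(lines);
        // Compare the two chains residue by residue.
        System.out.println(seq.get("C").equals(seq.get("D"))); // prints true
    }
}
```

A stricter reader would use the fixed column positions of the PDB format instead of splitting on whitespace.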

Let:
* "BioJava SEQRES" = chain.getSeqResGroups() 
* "BioJava HETATM" = chain.getAtomGroups(GroupType.HETATM):

Then I get the following results with the new code:

Chain: C                       
  Actual ligands:  NBU
  Actual SEQRES:   HQU DSN ALA NCY CPC                                          
  BioJava SEQRES:  HQU DSN ALA CPC NBU <-- 
                               ^^^
  BioJava HETATM:  HQU DSN NCY CPC NBU
                       
Chain: D
  Actual ligands:  None
  Actual SEQRES:   HQU DSN ALA NCY CPC
  BioJava SEQRES:  HQU DSN ALA NCY CPC
  BioJava HETATM:  HQU DSN NCY CPC

I'm looking for the "Actual" lines above.

Issues:
* In chain C only, BioJava omits the actual residue NCY in getSeqResGroups().
* In chain C only, BioJava includes the outside-the-sequence ligand NBU in getSeqResGroups().
* Sequences C and D are identical in the PDB file, but BioJava's getSeqResGroups() reports two different results.
* There does not appear to be a way to determine which groups are in the sequence, and which are ligands outside the sequence. The method Chain.getAtomGroups(GroupType.HETATM) provides neither.
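
The original question in this thread notes that, in the PDB file itself, the chain atoms end with the TER record. A possible workaround outside the BioJava API is to split the coordinate records on TER directly. The sketch below is plain Java and makes several assumptions: fixed-column ATOM/HETATM records, a bare "TER" line belonging to the chain of the preceding atom record, and made-up residue numbers in the demo input (they are not taken from the actual 193D entry). The names TerSplitter, split, and rec are hypothetical. Waters and ions that follow TER would land in the "after TER" bucket as well.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical helper, not part of BioJava.
class TerSplitter {
    // Groups ("resName:resSeq") per chain ID, seen before / after that chain's TER record.
    final Map<String, Set<String>> beforeTer = new LinkedHashMap<>();
    final Map<String, Set<String>> afterTer = new LinkedHashMap<>();

    static TerSplitter split(List<String> pdbLines) {
        TerSplitter out = new TerSplitter();
        Set<String> terminated = new HashSet<>();
        String lastChain = null;
        for (String line : pdbLines) {
            if (line.startsWith("TER")) {
                // Assumption: a short "TER" line closes the chain of the previous atom record.
                String chain = line.length() > 21 ? line.substring(21, 22) : lastChain;
                if (chain != null) {
                    terminated.add(chain);
                }
                continue;
            }
            boolean coord = line.startsWith("ATOM  ") || line.startsWith("HETATM");
            if (!coord || line.length() < 26) {
                continue;
            }
            String resName = line.substring(17, 20).trim(); // columns 18-20
            String chain = line.substring(21, 22);          // column 22
            String resSeq = line.substring(22, 26).trim();  // columns 23-26
            lastChain = chain;
            Map<String, Set<String>> target = terminated.contains(chain) ? out.afterTer : out.beforeTer;
            target.computeIfAbsent(chain, k -> new LinkedHashSet<>()).add(resName + ":" + resSeq);
        }
        return out;
    }

    // Builds the first 26 columns of a coordinate record for demo input only.
    static String rec(String name, int serial, String atom, String res, String chain, int resSeq) {
        return String.format("%-6s%5d %-4s %3s %s%4d", name, serial, atom, res, chain, resSeq);
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList(
            rec("HETATM", 1, "N", "HQU", "C", 1),
            rec("ATOM", 2, "N", "ALA", "C", 3),
            rec("HETATM", 3, "N", "NCY", "C", 4),
            rec("TER", 4, "", "CPC", "C", 5),
            rec("HETATM", 5, "C1", "NBU", "C", 6));
        TerSplitter r = split(lines);
        System.out.println("before TER: " + r.beforeTer);
        System.out.println("after TER:  " + r.afterTer);
    }
}
```

With input shaped like this, the modified residues stay in the "before TER" bucket even though they are HETATM records, and the ligand ends up in the "after TER" bucket.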


-----Original Message-----
From: andreas.prlic at gmail.com [mailto:andreas.prlic at gmail.com] On Behalf Of Andreas Prlic
Sent: Friday, May 07, 2010 5:28 PM
To: Andy Thomas-Cramer
Cc: biojava-l at lists.open-bio.org
Subject: Re: [Biojava-l] PDBFileParser and identifying atoms in ligands

Hi Andy,

I see what you intend to do.  If you want just the HETATM groups, you
can request them with

Chain.getAtomGroups(GroupType.HETATM);

or, if you want just the amino acid groups, you can request them with

Chain.getAtomGroups(GroupType.AMINOACID);

The same would work for getSeqResGroups(...) as well, but then your two
examples are quite specific:

193D is an antibiotic/DNA complex.
7EST chain I is a
TRIFLUOROACETYL-*L-*LEUCYL-*L-*ALANYL-P-TRIFLUOROMETHYLPHENYLANILIDE

Hetatoms are represented as Xs during the sequence alignments. I can
easily fix the "failing" alignment in this case by ignoring the
wrongly aligned Hetatom Xs (patch just committed to SVN...). I am not
sure whether it makes any biological difference in your two examples.
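
To illustrate why Xs make this alignment ambiguous, here is a minimal sketch in plain Java that maps three-letter residue names to one-letter codes and writes X for anything that is not one of the 20 standard amino acids. It is not BioJava's actual implementation; the class and method names (OneLetter, encode) are hypothetical.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper, not BioJava's implementation.
class OneLetter {
    private static final Map<String, Character> STANDARD = new HashMap<>();
    static {
        String[] three = {"ALA", "ARG", "ASN", "ASP", "CYS", "GLN", "GLU", "GLY", "HIS", "ILE",
                          "LEU", "LYS", "MET", "PHE", "PRO", "SER", "THR", "TRP", "TYR", "VAL"};
        String one = "ARNDCQEGHILKMFPSTWYV";
        for (int i = 0; i < three.length; i++) {
            STANDARD.put(three[i], one.charAt(i));
        }
    }

    // Every name outside the 20 standard amino acids becomes 'X'.
    static String encode(List<String> residues) {
        StringBuilder sb = new StringBuilder();
        for (String r : residues) {
            sb.append(STANDARD.getOrDefault(r, 'X'));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // SEQRES side of 193D chain C.
        System.out.println(encode(Arrays.asList("HQU", "DSN", "ALA", "NCY", "CPC")));        // prints XXAXX
        // Coordinate side with the ligand NBU appended.
        System.out.println(encode(Arrays.asList("HQU", "DSN", "ALA", "NCY", "CPC", "NBU"))); // prints XXAXXX
    }
}
```

Because every X looks the same to the aligner, XXAXX can be placed against XXAXXX in more than one equally scoring way, which is consistent with a ligand X being matched in place of a residue X.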

Andreas




On Fri, May 7, 2010 at 10:55 AM, Andy Thomas-Cramer
<thomascramera at dnastar.com> wrote:
>
> Hetatom groups are also used to represent modified residues in chains. I would like to obtain either the ligand atoms/groups without the sequence, or the sequence atoms/groups without the ligands.
>
> Chain.getSeqResGroups() reliably returns an empty list when alignment fails and for non-amino-acid sequences.
>
> Examples of the former include 193D (chain C) and 7EST (chain I). Both of these contain HETATMs both as modified residues and as ligands. Alignment fails in both.
>
> Interestingly, 193D's chain D is identical to chain C -- but its alignment succeeds. One difference is that C has an associated ligand and D does not. Are the ligand atom groups associated with a chain considered during alignment?
>
>
> -----Original Message-----
> From: andreas.prlic at gmail.com [mailto:andreas.prlic at gmail.com] On Behalf Of Andreas Prlic
> Sent: Thursday, May 06, 2010 3:52 PM
> To: Andy Thomas-Cramer
> Cc: biojava-l at lists.open-bio.org
> Subject: Re: [Biojava-l] PDBFileParser and identifying atoms in ligands
>
> Hi Andy,
>
> You don't need to process TERs to build up the representation of a
> structure.  The BioJava data model will work fine even if the file
> does not contain any amino acids (e.g. check 2KQO).
>
> Ligands will get represented as Hetatom groups in the data model.
> Check the Hetatom or Group javadocs for how to access their atoms.
>
> For your last question: Check out the Chain.getAtomGroups() and
> Chain.getSeqResGroups() methods...
>
> If it does not work the way you expect for a particular PDB ID, please
> let me know the ID, so I can take a look at the details.
>
> Andreas
>
>
> On Thu, May 6, 2010 at 9:50 AM, Andy Thomas-Cramer
> <thomascramera at dnastar.com> wrote:
>> From a PDB file, I can identify which atoms are in ligands, and which
>> are in residues in the chain. The chain atoms end with the TER record.
>>
>>
>>
>> From the BioJava API, I can distinguish as well -- if it's an amino
>> sequence and the automatic alignment between SEQRES and ATOM sequences
>> is successful.
>>
>>
>>
>> Is there a way through the API to identify atoms in ligands, when the
>> chain is not an amino sequence or alignment fails? It looks like the TER
>> record is ignored by PDBFileParser.
>>
>>
>>
>>
>



-- 
-----------------------------------------------------------------------
Dr. Andreas Prlic
Senior Scientist, RCSB PDB Protein Data Bank
University of California, San Diego
(+1) 858.246.0526
-----------------------------------------------------------------------



