[Biojava-l] Biojava Post translational modifications

Thu Sep 9 22:24:35 UTC 2010

Hi Jay,

Is this from the latest svn trunk? It sounds like this was produced
with BioJava 1.7. Over the last few months there have been several
improvements to the handling of chemically modified groups. In the
current code base, if you call
FileParsingParameters.setLoadChemCompInfo(true), you will get the
chemically correct representation for all groups. I suggest trying
out the code below (using a checkout from biojava-svn).

Andreas

 public void basicLoad(String pdbId){
      try {

         PDBFileReader reader = new PDBFileReader();

         // the path to the local PDB installation
         reader.setPath("/tmp");

         // are all files in one directory, or are the files split,
         // as on the PDB ftp servers?
         reader.setPdbDirectorySplit(true);

         // should a missing PDB id be fetched automatically from the
         // FTP servers?
         reader.setAutoFetch(true);

         // configure the parameters of file parsing

         FileParsingParameters params = new FileParsingParameters();

         // should the ATOM and SEQRES residues be aligned when
         // creating the internal data model?
         params.setAlignSeqRes(true);

         // should secondary structure get parsed from the file
         params.setParseSecStruc(false);

         // this tells the code to fetch the chemical definitions for
         // all groups
         params.setLoadChemCompInfo(true);

         reader.setFileParsingParameters(params);

         Structure structure = reader.getStructureById(pdbId);

         System.out.println(structure);

         for (Chain c: structure.getChains()){
            System.out.println("Chain " + c.getName() + " details:");
            System.out.println("Atom ligands: " + c.getAtomLigands());
            System.out.println(c.getSeqResGroups());
         }

      } catch (Exception e){
         e.printStackTrace();
      }

   }
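
To illustrate the motif problem described in the quoted message below,
here is a minimal sketch in plain Java (no BioJava calls) of why the
motif is found in the canonical sequence but not once the modified
residues are dropped. The class name, the helper methods, and the tiny
lookup table are hypothetical and only for illustration; the single
mapping MLZ -> K is the one stated in the quoted message, and any
unmapped code becomes X, as the mmCIF excerpt shows for ACE.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ModifiedResidueMotifDemo {

    // Hypothetical lookup table: three-letter code -> parent amino acid.
    // Only the MLZ entry is taken from the thread; a real table would
    // come from the chemical component definitions.
    private static final Map<String, Character> PARENT = new HashMap<>();
    static {
        PARENT.put("MLZ", 'K'); // N-methyl-lysine, a K before modification
    }

    // Matches either "(XXX)" or a single one-letter code.
    private static final Pattern TOKEN =
            Pattern.compile("\\(([A-Z0-9]{1,3})\\)|([A-Z])");

    // Convert e.g. "(ACE)SKS(MLZ)DRKYTL" into a canonical one-letter string.
    public static String toCanonical(String seq) {
        StringBuilder sb = new StringBuilder();
        Matcher m = TOKEN.matcher(seq);
        while (m.find()) {
            if (m.group(1) != null) {
                Character parent = PARENT.get(m.group(1));
                // Codes without a known parent become 'X'.
                sb.append(parent != null ? parent.charValue() : 'X');
            } else {
                sb.append(m.group(2));
            }
        }
        return sb.toString();
    }

    // Mimic the reported behaviour: modified residues are simply dropped.
    public static String dropModified(String seq) {
        return seq.replaceAll("\\([A-Z0-9]{1,3}\\)", "");
    }

    public static boolean containsMotif(String seq, String motifRegex) {
        return Pattern.compile(motifRegex).matcher(seq).find();
    }

    public static void main(String[] args) {
        String raw = "(ACE)SKS(MLZ)DRKYTL";
        String motif = "[KR][AST]K[DNQK]";

        String canonical = toCanonical(raw);   // XSKSKDRKYTL
        String truncated = dropModified(raw);  // SKSDRKYTL

        System.out.println(canonical + " -> " + containsMotif(canonical, motif)); // true
        System.out.println(truncated + " -> " + containsMotif(truncated, motif)); // false
    }
}
```

The match in the canonical string is KSKD, where the second K is the
parent of MLZ; once that residue is dropped, no window of the sequence
satisfies the pattern.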

On Thu, Sep 9, 2010 at 3:13 PM, JAX <jayunit100 at gmail.com> wrote:
> Hi Andreas, some of my collaborators could not get post-translational
> modifications from PDB files using BioJava's structure API.  Do you have any
> thoughts on this?
>
> Jay Vyas
> MMSB
> UCHC
> Begin forwarded message:
>
> From: Patrick Gradie <pgradie at gmail.com>
> Date: September 9, 2010 5:23:10 PM EDT
> To: biotoolkit at googlegroups.com
> Subject: Re: problems with biojava
> Reply-To: biotoolkit at googlegroups.com
>
> The issue that I found with the BioJava PDB utility is as follows:
> BioJava takes a PDB File xxxx.cif.gz and then populates a Structure variable
> in memory that you can pull from.
> You are able to get things like header, dbref, model, chain, residue, and
> atom info. That was good to have, however, I found that when I tried
> searching for motifs I could not find any of the ones that had required
> modifications.
> This is because when BioJava parses (ACE)SKS(MLZ)DRKYTL it simply drops
> the (ACE) and (MLZ).  However, the important thing here is that MLZ is
> N-METHYL-LYSINE, i.e. a K before modification.
> So the database would hold SKSDRKY (there is no atom data for T or L in the
> example string, only sequence information).
> The motif [KR][AST]K[DNQK] would not be found in that truncated sequence
> because the K in the center is required to be in the sequence.
> I am not sure why BioJava would just drop these modified residues,
> ESPECIALLY because the PDB file itself contains the following section in
> every single file except around 15 out of the 64k:
>
> loop_
> _entity_poly.entity_id
> _entity_poly.type
> _entity_poly.nstd_linkage
> _entity_poly.nstd_monomer
> _entity_poly.pdbx_seq_one_letter_code
> _entity_poly.pdbx_seq_one_letter_code_can
> _entity_poly.pdbx_strand_id
> 1 'polypeptide(L)' no no
> ;GAMGYKDNIRHGVCWIYYPDGGSLVGEVNEDGEMTGEKIAYVYPDERTALYGKFIDGEMIEGKLATLMSTEEGRPHFELM
> PGNSVYHFDKSTSSCISTNALLPDPYESERVYVAESLISSAGEGLFSKVAVGPNTVMSFYNGVRITHQEVDSRDWALNGN
> TLSLDEETVIDVPEPYNHVSKYCASLGHKANHSFTPNCIYDMFVHPRFGPIKCIRTLRAVEADEELTVAYGYDHSPPGKS
> GPEAPEWYQVELKAFQATQQK
> ;
> ;GAMGYKDNIRHGVCWIYYPDGGSLVGEVNEDGEMTGEKIAYVYPDERTALYGKFIDGEMIEGKLATLMSTEEGRPHFELM
> PGNSVYHFDKSTSSCISTNALLPDPYESERVYVAESLISSAGEGLFSKVAVGPNTVMSFYNGVRITHQEVDSRDWALNGN
> TLSLDEETVIDVPEPYNHVSKYCASLGHKANHSFTPNCIYDMFVHPRFGPIKCIRTLRAVEADEELTVAYGYDHSPPGKS
> GPEAPEWYQVELKAFQATQQK
> ;
> A
> 2 'polypeptide(L)' no yes '(ACE)SKS(MLZ)DRKYTL'
> XSKSKDRKYTL
> B
>
> As you can see above, the sequence XSKSKDRKYTL is given in full.  The ACE is
> turned into an X because it doesn't map to a regular amino acid.  So the PDB
> files hold both the modified and unmodified versions of the sequence in this
> special section. Given that information, it is possible to create a database
> in which motifs can be searched.
> BioJava will print a bunch of messages like "WARNING: unknown group name MLZ"
> for residues it doesn't interpret as regular amino acids.
> I am not sure, though, whether the BioJava 3 release fixes this problem.
> -Patrick
>
> On Thu, Sep 9, 2010 at 1:47 PM, Jay Vyas <jayunit100 at gmail.com> wrote:
>>
>> Hi guys, does anyone want to tell me about the issues regarding the
>> PDB utilities in BioJava?  I am interested in knowing what they were.
>>
>> --
>> Jay Vyas
>> MMSB/UCHC
>
>

-- 
-----------------------------------------------------------------------
Dr. Andreas Prlic
Senior Scientist, RCSB PDB Protein Data Bank
University of California, San Diego
(+1) 858.246.0526
-----------------------------------------------------------------------