[Biojava-l] PDBFileParser question using PDBID 470D

Sat Dec 4 17:46:08 UTC 2010

Hi Steve,

> I've been using biojava to gather sequence data from structure files for an internal project.  My intent was to test the limits of my work (hence files similar to 470D), but I came across this behavior in biojava.

ok

> It is not critical to obtain this particular mapping since it can be derived from the atom records.  However, I didn't understand why the SEQRES list would be empty and was looking for clarification.  Is it because the chain is RNA and the empty list prevents the unsupported alignment of RNA records?

When parsing PDB files, the SEQRES group list is only built up after
the SEQRES records have been aligned to the ATOM records. Since the
parser does not support aligning RNA, the list never gets created.
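
If only the SEQRES residue names are needed, they can be read straight
from the fixed-column SEQRES records without any alignment. The
following is a minimal sketch, not BioJava code: the class
SeqresSketch, its methods, and the four-entry lookup table are
hypothetical, and the table is inferred only from the ATOM sequence
quoted further down in this thread. It assumes the standard PDB
layout (chain ID in column 12, residue names from column 20 onward).

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class SeqresSketch {

    // Hypothetical lookup table, inferred only from the ATOM sequence quoted
    // in this thread (C43 G48 C43 ... -> CGCGAAUUCGCG). Not a BioJava API.
    static final Map<String, Character> ONE_LETTER =
            Map.of("C43", 'C', "G48", 'G', "A44", 'A', "U36", 'U');

    /** Collects residue names per chain from fixed-column SEQRES records. */
    static Map<String, List<String>> readSeqres(List<String> pdbLines) {
        Map<String, List<String>> byChain = new LinkedHashMap<>();
        for (String line : pdbLines) {
            if (!line.startsWith("SEQRES") || line.length() < 20) {
                continue; // not a SEQRES record, or too short to hold residues
            }
            String chainId = line.substring(11, 12);   // column 12
            String names = line.substring(19).trim();  // columns 20 onward
            if (names.isEmpty()) {
                continue;
            }
            List<String> residues =
                    byChain.computeIfAbsent(chainId, k -> new ArrayList<>());
            for (String name : names.split("\\s+")) {
                residues.add(name);
            }
        }
        return byChain;
    }

    /** Translates residue names to one-letter codes; unknown names become 'X'. */
    static String oneLetter(List<String> residueNames) {
        StringBuilder sb = new StringBuilder();
        for (String name : residueNames) {
            sb.append(ONE_LETTER.getOrDefault(name, 'X'));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // The two SEQRES records of 470D as quoted in this thread.
        List<String> lines = List.of(
                "SEQRES   1 A   12  C43 G48 C43 G48 A44 A44 U36 U36 C43 G48 C43 G48",
                "SEQRES   1 B   12  C43 G48 C43 G48 A44 A44 U36 U36 C43 G48 C43 G48");
        readSeqres(lines).forEach((chain, residues) ->
                System.out.println(chain + " " + residues.size() + " " + oneLetter(residues)));
    }
}
```

This only recovers the SEQRES residue names per chain; it does not
link them to the ATOM groups, which is the step that needs the
alignment in the parser.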

In principle, mmCIF files contain the information needed to join the
SEQRES and ATOM groups correctly, so no alignment is needed.  I will
take another look at how this works in this case...

Andreas

> -----Original Message-----
> From: andreas.prlic at gmail.com [mailto:andreas.prlic at gmail.com] On Behalf Of Andreas Prlic
> Sent: Monday, November 29, 2010 6:36 PM
> To: Steve Darnell
> Cc: biojava-l at lists.open-bio.org
> Subject: Re: [Biojava-l] PDBFileParser question using PDBID 470D
>
> Hi Steve,
>
> As you already say, this is an "exotic" sequence, in the sense
> that it is RNA. Aligning the SEQRES records for RNA is not yet
> supported. Can you explain a bit more about what you are doing
> and why you need this mapping in this case?
>
> Thanks,
> Andreas
>
> On Mon, Nov 29, 2010 at 12:51 PM, Steve Darnell <darnells at dnastar.com> wrote:
>> Greetings,
>>
>> After parsing PDBID 470D with biojava-3.0-alpha5, Chain A returns an
>> empty SEQRES sequence (Chain.getSeqResSequence) and empty SEQRES group
>> list (Chain.getSeqResGroups) but the one-letter ATOM sequence is
>> properly translated and the ATOM group list contains the appropriate
>> number of groups (LoadChemCompInfo set to true).
>>
>> This is an exotic sequence, but my expectation is that the SEQRES group
>> list would have members in it (and one-letter sequence translated if
>> LoadChemCompInfo is true).  Am I mistaken and the current behavior is
>> the intended result?
>>
>> Best regards,
>> Steve Darnell
>>
>> --
>> SEQRES records exist in 470D:
>>
>> SEQRES   1 A   12  C43 G48 C43 G48 A44 A44 U36 U36 C43 G48 C43 G48
>> SEQRES   1 B   12  C43 G48 C43 G48 A44 A44 U36 U36 C43 G48 C43 G48
>>
>> Sample println output (ln 1 record type, ln 2 get${TYPE}Sequence, ln 3
>> get${TYPE}Groups):
>>
>> SEQRES
>> ''
>> []
>>
>> ATOM
>> 'CGCGAAUUCGCG'
>> [PDB: C43 1 trueatoms: 21, PDB: G48 2 trueatoms: 27, PDB: C43 3
>> trueatoms: 24, ...]
>>
>> _______________________________________________
>> Biojava-l mailing list  -  Biojava-l at lists.open-bio.org
>> http://lists.open-bio.org/mailman/listinfo/biojava-l
>>
>
>
>
> --
> -----------------------------------------------------------------------
> Dr. Andreas Prlic
> Senior Scientist, RCSB PDB Protein Data Bank
> University of California, San Diego
> (+1) 858.246.0526
> -----------------------------------------------------------------------
>
