[Biojava-l] Question about StructureTools and PDBFileReader
Andreas Prlic
ap3 at sanger.ac.uk
Sat Jan 12 00:32:30 UTC 2008
Hi Martin,
I am not sure what you mean by "not counted". When I test
the PDB file you posted below, it parses all 3 amino acids
into Groups, which is the intended behaviour.
// inStream is an InputStream over the PDB file
PDBFileParser pdbpars = new PDBFileParser();
Structure structure = pdbpars.parsePDBFile(inStream);
System.out.println(structure);
// list the groups parsed from the ATOM records of chain A
Chain c = structure.getChainByPDB("A");
List<Group> groups = c.getAtomGroups();
for (Group g : groups) {
    System.out.println(g);
}
System.out.println("sequence: " + c.getAtomSequence());
gives an output of:
structure null DepDate: Thu Jan 01 01:00:00 GMT 1970 Resolution: 0.0
ModDate: Thu Jan 01 01:00:00 GMT 1970 chains:
chain: >A<
length SEQRES: 0 length ATOM: 3 aminos: 3 hetatms: 0 nucleotides: 0
DBRefs: 0
Molecules:
AminoAcid ATOM:GLN Q 27 true ATOMatoms: 7
AminoAcid ATOM:SER S 1027 true ATOMatoms: 6
AminoAcid ATOM:LEU L 2027 true ATOMatoms: 8
sequence: QSL
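The residue numbers 27, 1027 and 2027 in this output follow from the fixed-column layout of ATOM records: the chain ID sits in column 22 and the residue number in columns 23-26, so "A1027" is chain A, residue 1027, with no space in between. The following is only a minimal sketch of that column rule in plain Java, not the BioJava parser itself. The class and method names are made up for illustration, and the sample lines are truncated after the residue number, with their spacing reconstructed from the PDB format rather than copied from the mail.

```java
// Hypothetical helper, not part of BioJava: reads residue fields of an
// ATOM record by fixed column position (PDB columns are 1-based).
class AtomColumns {
    // columns 18-20: residue name
    static String residueName(String line) {
        return line.substring(17, 20).trim();
    }

    // column 22: chain identifier
    static char chainId(String line) {
        return line.charAt(21);
    }

    // columns 23-26: residue sequence number, right-justified
    static int residueNumber(String line) {
        return Integer.parseInt(line.substring(22, 26).trim());
    }

    public static void main(String[] args) {
        // Sample lines cut off after column 26; spacing is an assumption
        // based on the PDB fixed-column layout.
        String[] lines = {
            "ATOM   3505  N   GLN A  27",
            "ATOM   3514  N   SER A1027",
            "ATOM   3520  N   LEU A2027",
        };
        for (String l : lines) {
            System.out.println(residueName(l) + " " + chainId(l) + " " + residueNumber(l));
        }
    }
}
```

Splitting on whitespace instead would treat "A1027" as a single token, which is why column positions are used here.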
If you want to access the groups as SEQRES groups, these residues
need to be specified in the corresponding SEQRES header lines of the
file. See also http://biojava.org/wiki/BioJava:CookBook:PDB:seqres
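To illustrate what such a header line carries, here is a rough sketch that reads the chain ID and residue names from one SEQRES record by column position (chain ID in column 12, residue names in three-letter fields starting at column 20). The record shown is hypothetical, written for the three residues above and not taken from any real PDB entry, and the class and method names are made up for illustration. This is not how BioJava does it internally.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of BioJava: reads one SEQRES record
// by fixed column position (PDB columns are 1-based).
class SeqresSketch {
    // column 12: chain identifier
    static char chainId(String line) {
        return line.charAt(11);
    }

    // residue names occupy columns 20-22, 24-26, ... up to 68-70
    static List<String> residueNames(String line) {
        List<String> names = new ArrayList<>();
        for (int start = 19; start < 70 && start + 3 <= line.length(); start += 4) {
            String name = line.substring(start, start + 3).trim();
            if (!name.isEmpty()) {
                names.add(name);
            }
        }
        return names;
    }

    public static void main(String[] args) {
        // Made-up record for a three-residue chain A (an assumption,
        // not from a real file).
        String line = "SEQRES   1 A    3  GLN SER LEU";
        System.out.println(chainId(line) + " " + residueNames(line));
    }
}
```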
Does that help?
Andreas
--------------------------------------------------
Andreas Prlic Wellcome Trust Sanger Institute
Hinxton, Cambridge CB10 1SA, UK
On Fri, 11 Jan 2008, Martin Heusel wrote:
> Hi,
>
> i read a PDB file with PDBFileReader.getStructure and want to extract
> the backbone of a chain with StructureTools. Now i have seen that for
> entries e.g.
>
> ATOM 3505 N GLN A 27 32.144 27.054 0.696 1.00 47.70 N
> ATOM 3506 CA GLN A 27 32.507 26.162 -0.401 1.00 42.73 C
> ATOM 3507 C GLN A 27 31.388 26.137 -1.437 1.00 40.44 C
> ATOM 3508 O GLN A 27 30.205 26.248 -1.096 1.00 41.47 O
> ATOM 3509 CB GLN A 27 32.729 24.738 0.121 1.00 42.51 C
> ATOM 3510 CG GLN A 27 34.124 24.449 0.611 1.00 39.02 C
> ATOM 3511 CD GLN A 27 34.158 23.301 1.593 1.00 41.90 C
> ATOM 3512 OE1 GLN A 27 33.982 22.143 1.214 1.00 39.58 O
> ATOM 3513 NE2 GLN A 27 34.386 23.615 2.869 1.00 43.85 N
> ATOM 3514 N SER A1027 31.762 25.988 -2.703 1.00 33.42 N
> ATOM 3515 CA SER A1027 30.776 25.929 -3.769 1.00 31.11 C
> ATOM 3516 C SER A1027 29.915 24.723 -3.462 1.00 27.99 C
> ATOM 3517 O SER A1027 30.418 23.706 -2.991 1.00 29.25 O
> ATOM 3518 CB SER A1027 31.449 25.746 -5.130 1.00 22.71 C
> ATOM 3519 OG SER A1027 30.542 25.185 -6.056 1.00 28.95 O
> ATOM 3520 N LEU A2027 28.619 24.838 -3.718 1.00 25.68 N
> ATOM 3521 CA LEU A2027 27.714 23.743 -3.444 1.00 23.42 C
> ATOM 3522 C LEU A2027 27.489 22.933 -4.694 1.00 24.27 C
> ATOM 3523 O LEU A2027 26.750 21.950 -4.675 1.00 28.95 O
> ATOM 3524 CB LEU A2027 26.391 24.278 -2.906 1.00 23.54 C
> ATOM 3525 CG LEU A2027 26.547 25.090 -1.619 1.00 22.93 C
> ATOM 3526 CD1 LEU A2027 25.179 25.430 -1.056 1.00 26.74 C
> ATOM 3527 CD2 LEU A2027 27.361 24.285 -0.603 1.00 19.54 C
>
> the two residues SER and LEU are not counted. However, the fasta file
> from pdb.org website shows both residues for that chain. I wonder how
> the two entries A1027 and A2027 are interpreted by StructureTools.
>
> Thanks for any hints
>
> Martin