[Bioperl-l] PDB ATOM records: name, segid, etc.
Kris Boulez
kris.boulez@algonomics.com
Mon, 15 Jul 2002 10:20:44 +0200
[ It's nice to see that people are using these modules and have comments
on them. I'm aware that PDB writing isn't perfect at the moment. ]
I'll give a lightning talk about Bio::Structure at BOSC 02. We might
also discuss this in more depth there.
Quoting Andrew Dalke (adalke@mindspring.com):
> Joe Krahn:
> > Although SEGID is deprecated by the official PDB standard,
> > it is useful to me because I work with CNS files. Would people
> > be opposed to supporting it in BioPerl? (Note- most crystallographers
> > want to keep the SEGID. It is a useful thing, especially now
> > that PDB disallows CHAINID for ligands.)
Forgive my ignorance (I did NMR at university and left for the IT world
more than ten years ago), but what are these CNS files? Is this a
PDB-derived structure format?
>
> I say keep it. You'll need it for old PDB files as well. 'Course,
> you'll also need the logic to distinguish between the different
> versions.
>
Does someone know where I could find descriptions of 'older' PDB
formats? The current parser is written based on a document titled
'Protein Data Bank Contents Guide: version 2.1 (October 25, 1996)'.
If so, I would certainly add other versions.
> > Another useful but non-standard optional feature is a 4th residue
> > character. It can be useful for designating variants of a residue,
> > like HISD for HIS protonated at ND.
>
> Also very important. Great for things like "TIP3" waters. (BTW,
> I have an XPLOR background. :)
>
(see below)
> > The first two letters are always the element. The alignment seems
> > strange until you realize this. Think of carbon being represented
> > by " C". A leading non-letter character is allowed for atom names
> > that are too long, mostly hydrogens. The current pdb.pm shifts
> > <number>H correctly (a good guess) but will get all 2-letter elements
> > wrong. "CA " for calcium will become " CA ", a carbon atom.
>
> Um. The letters of the first two characters are always the element.
> Unless they aren't. I've seen "U" used for "Unknown". And then there's
> dealing with all the programs which have a different interpretation of that
> field. Roger Sayle gave a talk about this early last year.
>
> http://www.daylight.com/meetings/mug01/Sayle/m4xbondage.html
> ] 2c. All atom records containing " Q" as the first two characters of the
> ] atom name were ignored. The element " Q" is commonly used in NMR processing
> ] to represent pseudo atoms used in refinement
>
> > So, if pdb.pm is going to remove the leading space on
> > atom names (technically wrong, but probably desirable for many people)
>
> That isn't technically wrong. It depends on the context. " C" is the
> representation of "carbon" in the PDB file. Internally, bioperl could store
> it as "carbon" or "12" or "mixelja" so long as it is consistent and
> captures the data model correctly.
>
At the moment the Atom object is purely a container for info in the PDB
file. It only knows its id ('CZ2'); it does not know that it is a
carbon.
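For illustration, here is a minimal sketch of how an element could be
guessed from the raw atom name, following the column rule Joe describes
above (element right-justified in the first two characters, with a
leading non-letter allowed for long names). The helper name
guess_element is hypothetical and is not part of Bio::Structure. It is
only a heuristic: it does not handle the "U" or " Q" cases Andrew
mentions, and it will give wrong answers for files with improper atom
alignment.

```perl
use strict;
use warnings;

# Hypothetical helper, NOT part of Bio::Structure.
# Takes the raw 4-character atom name field (PDB columns 13-16) and
# returns a guessed element symbol, or undef if no guess is possible.
sub guess_element {
    my ($raw) = @_;
    return unless defined $raw && length($raw) == 4;

    my $field = substr($raw, 0, 2);    # columns 13-14
    $field =~ s/[^A-Za-z]//g;          # drop spaces and leading digits
    return if $field eq '';

    return ucfirst lc $field;          # "CA" becomes "Ca", "C" stays "C"
}

for my $name (' CA ', 'CA  ', '1HB ', ' CZ2') {
    my $el = guess_element($name);
    printf "%-6s => %s\n", "'$name'", defined $el ? $el : '(unknown)';
}
```

Note that this only works on the untrimmed field: once the leading
space is removed, " CA " (alpha carbon) and "CA  " (calcium) can no
longer be told apart.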
The problem with these spaces is that people want to be able to say
    if ( $atom->id eq "CZ2" ) {
without bothering about the spaces and/or rearrangement of the name in
the PDB files.
A ->display_id() method, which would give the name as it was in the PDB
file, might help.
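A rough sketch of what I mean, assuming the object keeps the raw
4-character name from the file. The package My::Atom and the raw_name
argument are made up for this example; this is not the current
Bio::Structure::Atom code. It only strips the padding and does not
attempt any rearrangement of the name (e.g. "1HB " to "HB1").

```perl
use strict;
use warnings;

# Hypothetical stand-in for an atom object; NOT Bio::Structure::Atom.
package My::Atom;

sub new {
    my ($class, %args) = @_;
    # raw_name is the 4-character field exactly as read from the file
    my $self = { raw_name => $args{raw_name} };
    return bless $self, $class;
}

# Name with surrounding whitespace removed, for easy comparison.
sub id {
    my ($self) = @_;
    (my $id = $self->{raw_name}) =~ s/^\s+|\s+$//g;
    return $id;
}

# Name exactly as it appeared in the PDB file, padding included.
sub display_id {
    my ($self) = @_;
    return $self->{raw_name};
}

package main;

my $atom = My::Atom->new(raw_name => ' CZ2');
if ( $atom->id eq 'CZ2' ) {
    printf "id='%s' display_id='%s'\n", $atom->id, $atom->display_id;
}
```

Keeping the raw name around also means the PDB writer could put the
name back in the same columns it was read from.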
> > then reading an ATOM needs to generate the element entry when an ATOM
> > doesn't include it. This can also be a problem - a PDB file with
> > no element entries and improper atom alignment will generate bad
> > element entries, but at least it works for all single-letter elements.
>
> That's a hard task. See section 3 in Roger's paper.
>
> A question I have is: Do you want a faithful representation of the data in the
> PDB (in which case a missing element field is left missing) or do you want
> a translation to another model of chemistry (as Roger does for SMILES)?
>
Which other format would people be interested in?
Kris,
--
Kris Boulez Tel: +32-9-241.11.00
AlgoNomics NV Fax: +32-9-241.11.02
Technologiepark 4 email: kris.boulez@algonomics.com
B 9052 Zwijnaarde http://www.algonomics.com/