[BioRuby] Bug in writing PDB ATOM

Fri Feb 9 01:44:59 UTC 2007

On 2/8/07, Alex Gutteridge <alexg at kuicr.kyoto-u.ac.jp> wrote:
> On 9 Feb 2007, at 07:54, Yen-Ju Chen wrote:
>
> > In bio/db/pdb/pdb.rb line 1019,
> > the ATOM entry is written as:
> >
> >           sprintf("%-6s%5d %-4s%-1s%3s %-1s%4d%-1s
> >
> > It results in an ATOM entry like this:
> > ATOM     61 OD1  ASN A   8     102.025  27.929 144.984  1.00 88.56           O
> >
> > But the right ATOM entry should be
> > ATOM     61  OD1 ASN A   8     102.025  27.929 144.984  1.00 88.56           O
> >
> > Note that there are two spaces after '61' and one space before 'ASN'.
> > I changed this line to:
> >
> >           sprintf("%-6s%5d  %-3s%-1s%3s %-1s%4d%-1s
> >
> > and it works fine now.
> > But I am new to Ruby and not familiar with the format yet.
> >
> > Yen-Ju
>
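
The difference between the two format strings quoted above can be checked
with a short script. This is only a sketch: the format strings in the
message are truncated, so just the leading fields are formatted here, and
the empty strings for the alternate location and insertion code are
assumptions chosen to match the example record.

```ruby
# Leading fields of the example record: record name, serial, atom name,
# alternate location, residue name, chain ID, residue number, insertion code.
# The empty strings are assumed values (no altLoc, no insertion code).
fields = ["ATOM", 61, "OD1", "", "ASN", "A", 8, ""]

# Format string as quoted from pdb.rb (truncated in the message).
original = sprintf("%-6s%5d %-4s%-1s%3s %-1s%4d%-1s", *fields)

# Format string with the proposed change.
modified = sprintf("%-6s%5d  %-3s%-1s%3s %-1s%4d%-1s", *fields)

puts original   # atom name starts in column 13
puts modified   # atom name starts in column 14
```

Both strings have the same total width for a three-character atom name;
only the position of the name inside columns 13-16 changes.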
> Hi Yen-Ju,
>
> Thanks for your bug report. In fact (as far as I can tell) the PDB
> format (http://www.wwpdb.org/documentation/format23/sect9.html) is
> ambiguous in this case. Columns 13-16 are specified for the 'Atom
> name' ('OD1' in the case you mention), but the justification of the
> field is not specified. Note that the field requires four columns so
> your fix (which reduces it to three) may break if you encounter an
> atom name with 4 characters.
>
> However, you are quite correct that the convention in most PDB files
> is that when fewer than four characters are used for the atom name,
> the field is aligned as you show. In summary, any of the following is a
> valid name according to my reading of the specifications, but the
> convention in many files is to use the form shown in the third and
> fourth examples rather than the first and second. Note that the fifth
> example is also a valid atom name and may break your fix:
>
> OD1
> N
>   OD1
>   N
> OD12
>
> I will change the code to use the conventional form where possible,
> but be careful with your fix because it may break on some (rare) PDB
> files.
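
One way to follow the convention while keeping the full four-column field
is sketched below. The helper name atom_name_field is hypothetical and is
not part of BioRuby. It also ignores the further convention that
two-letter element symbols (for example FE) start in column 13, so treat
it as an illustration of the padding idea only.

```ruby
# Hypothetical helper (not BioRuby API): lay out an atom name in the
# four-column field (columns 13-16) using the common convention.
def atom_name_field(name)
  name = name.to_s.strip
  raise ArgumentError, "atom name longer than 4 characters" if name.length > 4
  # Four-character names fill the field; shorter names start in column 14.
  name.length == 4 ? name : format(" %-3s", name)
end

puts atom_name_field("OD1").inspect    # " OD1"
puts atom_name_field("N").inspect      # " N  "
puts atom_name_field("OD12").inspect   # "OD12"

# The field is always four columns wide, so later fields stay aligned.
with_helper = sprintf("%-6s%5d %s%-1s%3s", "ATOM", 61, atom_name_field("OD12"), "", "ASN")

# With the two-space/three-column fix, a four-character name pushes the
# residue name one column to the right.
with_fix = sprintf("%-6s%5d  %-3s%-1s%3s", "ATOM", 61, "OD12", "", "ASN")

puts with_helper.length   # 20
puts with_fix.length      # 21
```

Raising on names longer than four characters is a design choice for the
sketch; silently truncating would hide bad input.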
>
> An important general point: PDB files (particularly older ones) are
> *very* messy. Efforts have been made within the PDB and at the EBI
> MSD to clean these files up, but there are still issues. This means
> that it is very hard to write a parser that can read in any PDB file
> and then output it in exactly the same format (including spacing,
> etc.). The BioRuby parser should be able to parse any valid PDB
> file and output the data back out as a valid PDB format string, but
> the input and output are *not* guaranteed to be identical.
>
> I have not had time to actively maintain the PDB parsing in BioRuby,
> so if you are interested in Ruby and PDB files feel free to submit
> more bug reports and patches.
>
> Thanks again.

Thanks.
I understand that the PDB format is messy,
and PDB is not the only messy format in this field. :)
I noticed this bug because the output from BioRuby cannot be read correctly
by some programs I am using, such as RasMol.
Anyway, I only started using BioRuby recently and am still learning.
If I find more bugs, I will try to send reports and patches.

By the way, I work more on the structural side.
Currently BioRuby focuses more on sequences and databases.
If people are interested,
I may submit some scripts for common structural tasks in the future,
for example, calculating symmetry-related positions in the unit cell based
on the space group,
converting positions from orthogonal to fractional coordinates,
and converting the format of heavy metal positions for various
crystallography packages.

Yen-Ju

>
> Alex Gutteridge
>
> Bioinformatics Center
> Kyoto University