[BioPython] Bio.PDB : loading Big PDB with segments

Tue Aug 1 21:09:22 UTC 2006

Arturas Ziemys wrote:
> Hi,
> 
> Those PDB files are generated by NAMD or VMD. NAMD is a molecular
> dynamics program and VMD is for structure manipulation and
> visualization. My modeled systems - and I believe the systems of
> others in MD - are big in the sense that these PDB files exceed the
> limits on resid or serial numbers. For example, as far as I
> understand, atoms in VMD are uniquely identified using the segment
> information, and it has no problems with that.
> 
> In my opinion those files follow the PDB format. At least I found no
> differences in the column structure or column content of the PDB. It
> seems that Bio.PDB just takes the segment identifiers as an extra
> field of the ATOM entry, but does not use them to keep records unique
> when records with the same serial are met in the PDB. After I tried
> to load those files, I got plenty of errors and the "duplicated"
> entries were just skipped.

It sounds like there is just too much data for the original column 
widths to hold, and that Bio.PDB simply doesn't understand the 
conventions being used.
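
One way to check this is to count how many ATOM/HETATM records share the
same chain ID, residue number and atom name, first ignoring and then
including the segment ID. The following is only a rough standard-library
sketch: count_collisions is a name made up for illustration, the column
positions are the usual fixed PDB ones (with the segment ID assumed to
sit in columns 73-76), and it does not reproduce what Bio.PDB does
internally.

```python
from collections import Counter


def count_collisions(lines, use_segid=False):
    """Count ATOM/HETATM records whose identifying fields repeat.

    Hypothetical helper, not part of Bio.PDB.
    """
    seen = Counter()
    for line in lines:
        if not line.startswith(("ATOM", "HETATM")):
            continue
        # Pad short lines so the fixed-column slices below are safe.
        line = line.rstrip("\r\n").ljust(80)
        key = (
            line[21],     # chain ID, column 22
            line[22:27],  # residue number + insertion code, columns 23-27
            line[12:16],  # atom name, columns 13-16
            line[16],     # alternate location, column 17
        )
        if use_segid:
            # Assumption: segment ID is stored in columns 73-76.
            key += (line[72:76],)
        seen[key] += 1
    # Every occurrence beyond the first counts as a collision.
    return sum(n - 1 for n in seen.values() if n > 1)
```

If the count is large without the segment ID and drops to zero with it,
the segment ID really is the only thing keeping the atoms apart.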

Hopefully the file format will be extended officially, but I suspect 
(without having looked at the data) that these NAMD/VMD files are not 
following the strict PDB format.

That's not to say Bio.PDB shouldn't try and support them in permissive 
mode.  I think this might be a job for the module's author, Thomas 
Hamelryck (who is subscribed to this mailing list).

> I could do some "preprocessing" on the PDB, supplying a chain
> identifier for each segment each time I load PDB files and removing
> the supplied chain labels each time on exit. But I am interested: is
> there any other way?
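
For reference, that preprocessing could look roughly like the sketch
below. It is only an illustration under some assumptions: the segment ID
is taken from columns 73-76, any existing chain ID in column 22 is
overwritten, and segid_to_chain and CHAIN_IDS are hypothetical names,
not part of Bio.PDB. It only helps while there are no more segments
than single-character chain IDs.

```python
import string

# Single characters that can be used in the one-column chain ID field.
CHAIN_IDS = string.ascii_uppercase + string.ascii_lowercase + string.digits


def segid_to_chain(lines):
    """Yield PDB lines with column 22 (chain ID) derived from the segment ID."""
    mapping = {}
    for line in lines:
        if line.startswith(("ATOM", "HETATM")):
            body = line.rstrip("\r\n")
            newline = line[len(body):]   # keep the original line ending
            padded = body.ljust(76)      # make sure columns 73-76 exist
            segid = padded[72:76]
            if segid not in mapping:
                if len(mapping) >= len(CHAIN_IDS):
                    raise ValueError("more segments than available chain IDs")
                # Assign the next unused chain ID to this new segment.
                mapping[segid] = CHAIN_IDS[len(mapping)]
            line = padded[:21] + mapping[segid] + padded[22:] + newline
        yield line
```

Writing the result to a temporary file and loading that instead would
leave the original file untouched; blanking column 22 again on output
would undo the labelling.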

Can you output the data in a different file format? Does mmCIF suffer 
from the same limits when dealing with large molecules?

You might also try Konrad Hinsen's Molecular Modelling Toolkit (MMTK). 
In my experience it's fussier than Bio.PDB for non-standard PDB files, 
but on the other hand many of its users may also use NAMD/VMD.

http://www.python.net/crew/hinsen/MMTK/

There is also the Python Macromolecular Library (mmLib) but I have never 
tried it myself:

http://pymmlib.sourceforge.net/

> I could attach one as an example, but the compressed file is ~1 MB,
> uncompressed > 5 MB. If the size is OK, I can send a PDB file.

Please don't send the file to the mailing list - it would be a bit big.

I suggest you file a bug (include version numbers for Python, BioPython, 
NAMD and VMD too), and then choose "create an attachment" and upload the 
file - a standard compression like .zip or .tar.gz should be fine.

http://bugzilla.open-bio.org/

Thank you

Peter