[Bioperl-l] writing PDB format from BioPerl
Ed Green
ed@compbio.berkeley.edu
Thu, 10 Jan 2002 18:02:28 -0800 (PST)
On Fri, 11 Jan 2002, Kris Boulez wrote:
> I've just checked in a new version of Bio::Structure::IO::pdb.pm which
> has a working write_structure, meaning that you can now write out PDB
> records from BioPerl.
>
> This is not the final version, but it is what I have now ready for 0.9.3
> (Ewan, I hope I'm on time). I think it is fairly complete.
>
> Things it doesn't do (right) at the moment:
> - no ANISOU, SIGUIJ, SIGATM records yet
> - placement of the TER record is sometimes different than in the original
> (does someone have the exact algorithm?)
> - MASTER record (contains checksums) is not calculated, but used from
> original
> - minor glitches as PDB records are mostly created by humans and
> not computers
>
> In the near future I hope to add the missing records, add documentation
> and examples, and see if the DSSP and STRIDE modules can be integrated
> better.
Kris-
Very nice. I think I can help integrate the SecStr modules. Since the
Structure objects can now output pdb files, interaction between
STRIDE/DSSP and Structure objects can be completely abstracted away,
bringing the DSSP and STRIDE modules into bioperl API conformance.
What I have in mind is a method (perhaps called getSecStr) for Structure
objects which will take 'DSSP' or 'STRIDE' as a parameter. The indicated
executable will be invoked with proper pdb input. The results will be
parsed. These results, rather than being encapsulated in an object as
they are now, will just add information to the existing Structure object
at the level of Residue. This would require additional Residue fields for
secondary structure, exposed surface area, and any other information you
want to hang on to from DSSP/STRIDE. I guess what I'm describing is
letting go of STRIDE and DSSP as separate objects and just folding their
functionality into Structure objects.
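
A minimal sketch of this design, written in Python rather than Perl purely
for illustration. Every name in it (Residue, Structure, write_pdb,
get_sec_str, the runners table, fake_dssp) is hypothetical and not part of
the BioPerl API. The external program is replaced by a stand-in function so
that the sketch runs without DSSP or STRIDE installed; a real runner would
invoke the executable on the PDB text and parse its output.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional, Tuple

# A runner takes PDB text and returns {residue number: (sec_str, area)}.
Runner = Callable[[str], Dict[int, Tuple[str, float]]]


@dataclass
class Residue:
    name: str
    number: int
    # Hypothetical extra fields for the DSSP/STRIDE results.
    sec_str: Optional[str] = None
    surface_area: Optional[float] = None


@dataclass
class Structure:
    residues: List[Residue] = field(default_factory=list)

    def write_pdb(self) -> str:
        # Stand-in for a real PDB writer: one simplified line per residue.
        return "\n".join(f"RES {r.name} {r.number}" for r in self.residues)

    def get_sec_str(self, program: str, runners: Dict[str, Runner]) -> None:
        """Run the named program and store its results on each Residue."""
        if program not in runners:
            raise ValueError(f"unknown program: {program}")
        results = runners[program](self.write_pdb())
        for residue in self.residues:
            if residue.number in results:
                residue.sec_str, residue.surface_area = results[residue.number]


def fake_dssp(pdb_text: str) -> Dict[int, Tuple[str, float]]:
    # Assumption for the demo only: label every residue as helix ("H").
    return {int(line.split()[2]): ("H", 100.0) for line in pdb_text.splitlines()}


structure = Structure([Residue("ALA", 1), Residue("GLY", 2)])
structure.get_sec_str("DSSP", {"DSSP": fake_dssp})
```

The caller only names the program; the results end up on the Residue
objects and no separate DSSP or STRIDE object is exposed.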
The only problem with this design is that often I'm not interested in any
structural feature except secondary structure information. In this case
the only purpose of having a Structure object would be to call the
getSecStr method on it. That's fine, except that creating a Structure
object parses the pdb file, then getSecStr communicates with STRIDE/DSSP
through a pdb file which it writes back out. So the only purpose of
parsing the pdb file is to write it back out. I understand that
this very thing happens with Seq objects, but parsing and writing sequence
files doesn't involve nearly as much overhead as parsing and writing
structure files.
It would be faster (and less error-prone) if there were a switch
when creating a new Structure, which just associates a pdb file with the
object and delays the parsing until some method is invoked which requires
parsing or until the user requests the full structure object. Then
getSecStr or any other future method which does an analysis on a structure and
requires a pdb file as input could just be passed the pdb file directly.
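
The deferred-parsing idea can be sketched as follows, again in Python for
illustration only. LazyStructure, atoms, pdb_file_for_external_tool and
parse_count are hypothetical names, and the "parser" here merely collects
ATOM/HETATM lines as a placeholder for real structure parsing.

```python
from typing import List, Optional


class LazyStructure:
    """Remembers the PDB file and parses it only on first real use."""

    def __init__(self, pdb_path: str) -> None:
        self.pdb_path = pdb_path
        self._atoms: Optional[List[str]] = None
        self.parse_count = 0  # only here to make the laziness observable

    @property
    def atoms(self) -> List[str]:
        # The first access triggers the parse; later accesses reuse the result.
        if self._atoms is None:
            with open(self.pdb_path) as handle:
                self._atoms = [
                    line.rstrip("\n")
                    for line in handle
                    if line.startswith(("ATOM", "HETATM"))
                ]
            self.parse_count += 1
        return self._atoms

    def pdb_file_for_external_tool(self) -> str:
        # An analysis that only needs a PDB file as input gets the original
        # path, so nothing is parsed or written back out.
        return self.pdb_path
```

With this arrangement a getSecStr-style method would call
pdb_file_for_external_tool and never touch atoms, so the parse cost is paid
only by callers that actually need the full structure.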
Comments/suggestions are welcome.
Ed Green
***********************
Brenner Research Group
UC Berkeley
ed@compbio.berkeley.edu
510-642-9614
***********************