[Bioperl-l] writing PDB format from BioPerl
Kris Boulez
Kris.Boulez@algonomics.com
Fri, 11 Jan 2002 11:07:23 +0100
Quoting Ed Green (ed@compbio.berkeley.edu):
>
> What I have in mind is a method (perhaps called getSecStr) for Structure
> objects which will take 'DSSP' or 'STRIDE' as a parameter. The indicated
> executable will be invoked with proper pdb input. The results will be
> parsed. These results, rather than being encapsulated in an object as
> they are now, will just add information to the existing Structure object
> at the level of Residue. This would require additional Residue fields for
> secondary structure, exposed surface area, and any other information you
> want to hang on to from DSSP/STRIDE. I guess what I'm describing is
> letting go of STRIDE and DSSP as separate objects and just folding their
> functionality into Structure objects.
>
This was also my idea.
> The only problem with this design is that often I'm not interested in any
> structural feature except secondary structure information. In this case
> the only purpose of having a Structure object would be to call the
> getSecStr method on it. That's fine, except that creating a Structure
> object parses the pdb file, then getSecStr communicates with STRIDE/DSSP
> through a pdb which it writes back out. Then, the only purpose of
> parsing the pdb file is to write the pdb file back out. I understand that
> this very thing happens with Seq objects, but parsing and writing sequence
> files doesn't involve nearly as much overhead as parsing and writing
> structure files.
>
Do I understand correctly that these are the steps you want to take?
a) run STRIDE/DSSP on pdb_file. Produces output file (stride.out)
b) read in pdb_file and stride.out, giving you Structure object with
additional residue fields
c) do something on Structure object
I think there is a separate design pattern for running an external
application in BioPerl (look under Bio::Tools::Run). This would cover
step a). Adding the STRIDE output to the Structure object (step b) can
then be done from a new Bio::Structure::IO::stride object (or from
SearchIO?).
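To make steps b) and c) concrete, here is a rough sketch of folding per-residue annotations into an existing structure. It is written in Python purely for illustration and is not BioPerl code: the Residue and Structure classes, the add_sec_str function, and the simplified four-column input (chain, residue number, secondary-structure code, surface area) are all assumptions made up for this example, not the real STRIDE or DSSP output format.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterable, Optional, Tuple


@dataclass
class Residue:
    # Hypothetical residue record; the two optional fields are the
    # "additional Residue fields" discussed above.
    chain: str
    number: int
    sec_str: Optional[str] = None
    surface_area: Optional[float] = None


@dataclass
class Structure:
    # Residues keyed by (chain, residue number) for direct lookup.
    residues: Dict[Tuple[str, int], Residue] = field(default_factory=dict)


def add_sec_str(structure: Structure, lines: Iterable[str]) -> Structure:
    """Attach annotations from a simplified, hypothetical tool output.

    Each data line is assumed to be: chain resnum ss_code area
    Blank lines and lines starting with '#' are skipped.
    """
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue
        chain, number, code, area = line.split()
        residue = structure.residues.get((chain, int(number)))
        if residue is None:
            continue  # annotated residue is not present in the structure
        residue.sec_str = code
        residue.surface_area = float(area)
    return structure
```

The point of the sketch is only that the annotation step needs a residue lookup and two extra fields; it does not need a separate STRIDE or DSSP result object.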
> It would be faster (and less error prone) if there could be a switch
> when invoking new Structures, which just associates a pdb file with the
> object and delays the parsing until some method is invoked which requires
> parsing or until the user requests the full structure object. Then
> getSecStr or any other future method which does an analysis on a structure and
> requires a pdb file as input could just be passed the pdb file directly.
>
The BioPerl IO system is stream-based and it is all or nothing. Once you
call next_structure(), it has to read through to the end.
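For comparison, the deferred parsing Ed describes could look roughly like the sketch below. This is Python for illustration only and says nothing about how Bio::Structure::IO works; the LazyStructure class and its attributes are hypothetical, and "parsing" is reduced to reading the lines of the file.

```python
class LazyStructure:
    """Hypothetical structure object that remembers its pdb file
    and only reads it when the parsed content is first requested."""

    def __init__(self, pdb_file):
        self.pdb_file = pdb_file   # can be handed straight to an external tool
        self._records = None       # nothing has been parsed yet

    @property
    def is_parsed(self):
        return self._records is not None

    @property
    def records(self):
        # First access triggers the (stand-in) parse; later accesses reuse it.
        if self._records is None:
            with open(self.pdb_file) as handle:
                self._records = [line.rstrip("\n") for line in handle]
        return self._records
```

A method such as getSecStr could then use the pdb_file attribute directly and never trigger the parse.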
I do agree that parsing every header line can get slow, and a way of
specifying which lines to parse and which to skip might be something to
look into.
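One possible shape for such a filter, sketched in Python for illustration: in the PDB format the record name occupies the first six columns of each line, so lines can be selected or skipped before any further parsing. The read_records function and its wanted parameter are hypothetical names, not an existing BioPerl interface.

```python
def read_records(lines, wanted=("ATOM", "HETATM")):
    """Yield (record_name, line) for the requested record types only.

    The record name is taken from columns 1-6 of each PDB line;
    every other line is skipped without further processing.
    """
    wanted = set(wanted)
    for line in lines:
        record = line[:6].strip()
        if record in wanted:
            yield record, line.rstrip("\n")
```

With the default setting, header and remark lines are never looked at beyond their first six characters.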
Kris,