[Bioperl-l] Parsing PDB entries in BioPerl
Kris Boulez
Kris.Boulez@algonomics.com
Tue, 13 Nov 2001 19:10:56 +0100
As I found myself writing ad-hoc scripts to get certain data out of a
PDB entry, I've decided to write a PDB parser for BioPerl.
The idea is to parse every line in the entry and to have access to all
the data via some Bio:: object. The work on the SeqIO parser (Bio::SeqIO::pdb)
is progressing nicely.
For the moment I'm working on parsing all the different 'records' (PDB-speak
for the different line types), and not so much on how to store the info in a
Bio:: object (references are already stored in Bio::Annotation::Reference
objects). The moment has now arrived to start thinking about 'how' to store
'what' inside 'which' Bio::* object.
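Every PDB record is identified by its name in columns 1-6 (HEADER, REMARK, SEQRES, ATOM, ...). A first parsing pass can therefore group lines by record type before any decision about storage is made. The sketch below is in Python and serves only as a language-neutral illustration. The function name `group_records` is hypothetical, and this is not the Bio::SeqIO::pdb code.

```python
from collections import defaultdict


def group_records(lines):
    """Group PDB lines by record name (columns 1-6), keeping file order."""
    records = defaultdict(list)
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue  # skip blank lines
        # The record name occupies the first six columns, padded with spaces
        # (e.g. 'ATOM  '), so strip the padding to get the key.
        name = line[:6].strip()
        records[name].append(line)
    return dict(records)
```

Each record type can then be handed to its own small parser. For example, `records["REMARK"]` could go to a remark parser, while ATOM/HETATM lines go to the coordinate code.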
My first thought was to inherit from a Bio::Seq object, but this does
not seem to be the right approach:
- which sequence to store (the one from Swiss-Prot)
- not every residue has coordinates (e.g. at the C and N termini)
- PDB entries can consist of multiple 'chains' (e.g. a complex of two
proteins)
- how to handle post-translational modifications
- there is no easy access to the data that makes PDB special (x,y,z
coordinates, ...)
- how to handle 'models' (a structure determined by NMR does not consist
of one, but of multiple coordinate sets).
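Several points above (multiple chains, multiple NMR models, residues without coordinates) fit a model → chain → residue → atom hierarchy better than a flat sequence. The sketch below illustrates that idea in Python as a language-neutral example. It assumes standard fixed-column ATOM/HETATM records and ignores alternate locations, occupancies and B-factors. The class and function names are hypothetical, and the code does not describe an existing BioPerl API.

```python
from dataclasses import dataclass, field


@dataclass
class Atom:
    name: str
    x: float
    y: float
    z: float


@dataclass
class Residue:
    name: str    # three-letter residue code, e.g. 'GLY'
    number: int  # residue sequence number (resSeq, columns 23-26)
    icode: str   # insertion code (column 27), '' if absent
    atoms: list = field(default_factory=list)


def parse_coordinates(lines):
    """Return {model_serial: {chain_id: [Residue, ...]}} from a PDB entry."""
    models = {}
    current_model = 1  # entries without MODEL records are treated as one model
    for line in lines:
        record = line[:6].strip()
        if record == "MODEL":
            # The model serial number follows the record name.
            current_model = int(line[6:].split()[0])
        elif record in ("ATOM", "HETATM"):
            chain_id = line[21]  # column 22
            res_key = (line[17:20].strip(), int(line[22:26]), line[26].strip())
            residues = models.setdefault(current_model, {}).setdefault(chain_id, [])
            # Start a new residue whenever name, number or insertion code changes.
            last = residues[-1] if residues else None
            if last is None or (last.name, last.number, last.icode) != res_key:
                residues.append(Residue(*res_key))
            # Coordinates are in columns 31-38, 39-46 and 47-54.
            residues[-1].atoms.append(Atom(
                line[12:16].strip(),
                float(line[30:38]), float(line[38:46]), float(line[46:54]),
            ))
    return models
```

With this structure, `[r.name for r in models[1]["A"]]` lists only the residues of chain A that actually have coordinates. That list could be compared with the SEQRES or Swiss-Prot sequence to find missing terminal residues.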
This suggests that a new type of object might be needed. As a starting
point, I think it would help to consider how users would use such an
object (i.e. 'which questions would you ask?'). So I would like to ask
you which data in a PDB entry you are typically interested in, and which
questions you would want to ask of such an object.
Kris
--
Kris Boulez Tel: +32-9-241.11.00
AlgoNomics NV Fax: +32-9-241.11.02
Technologiepark 4 email: kris.boulez@algonomics.com
B 9052 Zwijnaarde http://www.algonomics.com/