[BioRuby] bio.pdb doubt

Thu Feb 21 13:47:37 UTC 2008

On 21 Feb 2008, at 12:31, K. Shameer wrote:

>> 2. The HETATM/ATOM distinction is tricky as well. Some HETATM records
>> (including the solvent in some PDB files) are given distinct chain ids
>> and in some cases do represent linear chain like molecules. Bound DNA
>> for instance: ATOM? HETATM? Chain? Not a chain? There is no consistent
>> representation of these things in (legacy) PDB files so any choice you
>> make will be a compromise.
>
> I agree with you in the case of NA and on the data inconsistency issues
> with PDB files. But in the case of protein complexes, HETATM tags are very
> helpful. For example, I used to extract ligands (small molecules) from a
> set of related PDB (ligand + receptor complex) files to use as a library
> of substrates for docking studies. I don't know how easily I can code this
> with BioRuby/Bio.PDB. I haven't tried this with BioRuby yet, but I will
> try it out very soon.

It should be easily doable. If not, the library needs more work, as
this is just the kind of thing it is designed to help with.
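
As a rough illustration of the ligand-extraction idea, here is a minimal sketch in plain Ruby. It deliberately does not use Bio::PDB, so nothing about the library's API is assumed; it reads HETATM lines by the fixed column positions of the PDB format. The method name `extract_ligands` and the `SOLVENT` list are made up for this example, and the choice of which residue names count as solvent is an assumption.

```ruby
# Standalone sketch: group HETATM records into candidate ligands using the
# fixed column layout of the PDB format. Bio::PDB is not used here.

# Assumption: residue names treated as solvent and skipped.
SOLVENT = %w[HOH WAT DOD].freeze

# Returns a hash mapping [residue name, chain id, residue number] to the
# HETATM lines that belong to that hetero group.
def extract_ligands(pdb_text)
  ligands = Hash.new { |hash, key| hash[key] = [] }
  pdb_text.each_line do |line|
    next unless line.start_with?('HETATM')

    res_name = line[17, 3].to_s.strip # columns 18-20
    next if SOLVENT.include?(res_name)

    chain_id = line[21, 1].to_s       # column 22
    res_seq  = line[22, 4].to_i       # columns 23-26
    ligands[[res_name, chain_id, res_seq]] << line.chomp
  end
  ligands
end
```

Calling `extract_ligands(File.read(path))` on each file of a set would give one entry per hetero group. Modified residues such as MSE are also stored as HETATM records, so a real library of substrates would need a further filter on top of this.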

>> That said, if you want to have a poke through the PDB parser and make
>> some changes then be my guest. It's been a while since I did any PDB
>> stuff (and god-willing it will be a while until I do some more!) so
>> it's an area that could probably do with a fresh pair of eyes.
>
> That would be interesting, provided you could offer some offline
> help/directions to start with. In fact, I am specifically interested in
> something that has not been taken up by any of the Bio-toolkit libraries :).
> Can we (Alex, Jan, all other BioRuby pros and Shameer) work on developing
> a module that integrates both Bio.Pdb and the Bio-Graphics module to
> generate modular topology diagrams? Please let me know your comments; we
> can discuss and work on this :) !!!

The parser reads the SHEET / HELIX / TURN records from the PDB file.  
I've never really played with them, but they look easily readable.

irb(main):011:0> pdb = Bio::PDB.new(IO.read('/homes/alexg/Desktop/1TCA.pdb'))
=> #<Bio::PDB entry_id="1TCA">
irb(main):012:0> pdb.record['HELIX'].each do |helix|
irb(main):013:1* puts "Start: #{helix.initSeqNum} - End: #{helix.endSeqNum}"
irb(main):014:1> end
Start: 13 - End: 18
Start: 44 - End: 57
Start: 76 - End: 93
Start: 106 - End: 117
Start: 142 - End: 146
Start: 152 - End: 156
Start: 162 - End: 169
Start: 212 - End: 216
Start: 226 - End: 242
Start: 268 - End: 287
Start: 119 - End: 121
Start: 139 - End: 141
Start: 250 - End: 252
Start: 255 - End: 257
Start: 302 - End: 304
Start: 68 - End: 70
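
The helices come out in file order, which is not residue order. For a topology diagram you would probably want them sorted along the chain first. A minimal sketch, using the start/end pairs printed above and assuming they all belong to a single chain (with several chains you would sort by chain id as well). The `Helix` struct and `helices_in_chain_order` are names invented for this example, not part of Bio::PDB.

```ruby
# Helix ranges copied from the irb output above, as [start, end] pairs.
helices = [[13, 18], [44, 57], [76, 93], [106, 117], [142, 146], [152, 156],
           [162, 169], [212, 216], [226, 242], [268, 287], [119, 121],
           [139, 141], [250, 252], [255, 257], [302, 304], [68, 70]]

# Hypothetical value object for one helix.
Helix = Struct.new(:start_res, :end_res) do
  # Number of residues covered, counting both ends.
  def residue_count
    end_res - start_res + 1
  end
end

# Builds Helix objects and sorts them by their first residue number.
def helices_in_chain_order(pairs)
  pairs.map { |first, last| Helix.new(first, last) }.sort_by(&:start_res)
end

helices_in_chain_order(helices).each do |h|
  puts format('%3d-%3d (%2d residues)', h.start_res, h.end_res, h.residue_count)
end
```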

Going from there to an overall topology diagram (are you thinking of
something like this:
http://www.ebi.ac.uk/thornton-srv/databases/pdbsum/1tca/domA01.gif)
is more complicated. I guess the first step would be analysing the
structure to see how the secondary structure elements interact and
then applying some kind of layout algorithm (putting sheets side by
side, etc.). Once you have the layout, using Bio-graphics, Cairo or
something else to do the drawing is pretty straightforward (I assume!).
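
To make the "how do the elements interact" step a bit more concrete, here is one crude way it could be approached: call two secondary structure elements neighbours if any pair of their C-alpha atoms is closer than some cutoff. This is only a sketch under assumptions. The 6.0 angstrom default is an arbitrary choice, the method names are invented for the example, the coordinates are expected as a plain hash of residue number to [x, y, z], and it says nothing about how PDBsum builds its diagrams.

```ruby
# Hypothetical helper: true if any pair of C-alpha atoms, one from each
# element, lies within `cutoff` angstroms. `ca_coords` maps residue number
# to an [x, y, z] array; the default cutoff is an arbitrary assumption.
def elements_in_contact?(range_a, range_b, ca_coords, cutoff = 6.0)
  range_a.any? do |i|
    range_b.any? do |j|
      a = ca_coords[i]
      b = ca_coords[j]
      next false unless a && b # skip residues with no coordinates

      Math.sqrt(a.zip(b).sum { |p, q| (p - q)**2 }) <= cutoff
    end
  end
end

# Returns every pair of element names whose residues come into contact.
# `elements` maps a name to a range of residue numbers.
def contact_pairs(elements, ca_coords, cutoff = 6.0)
  elements.keys.combination(2).select do |x, y|
    elements_in_contact?(elements[x], elements[y], ca_coords, cutoff)
  end
end
```

The resulting list of pairs is effectively a graph of elements, which is the sort of input a layout step could start from.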

I know Roman Laskowski, who runs PDBsum, so I could ask him how his
topology generation script works, if that's the kind of thing you are
thinking of.

AlexG