[Bioperl-l] New modules for STRIDE and DSSP output

Kris Boulez Kris.Boulez@algonomics.com
Mon, 3 Dec 2001 10:46:51 +0100


Quoting Ed Green (ed@compbio.berkeley.edu):

> Since the Bio::Structure objects are in flux, I have not tried to
> integrate with them.  When they are more settled, there are obvious ways
> that these secondary structural objects could be integrated with the
> more general structure objects.
> 
I don't expect much more flux in the Bio::Structure objects. At the
moment I'm in bug-fixing/speed-improvement mode.
I don't think it's wise to have two different solutions to the same
problem one directory away from each other. I wholeheartedly agree
that the two are highly complementary and that they need to be integrated.

> And in regard to the new Bio::Structure object, I think it does a good
> job capturing and organizing structural info, but may be a bit too heavy
> for high-throughput work because of the sheer number of objects created
> (1/residue & 1/atom).  I'd like to cast my vote for having an
> additional, leaner Bio::Structure object that is (gasp!) less
> objectified, but more suitable for high-throughput work.  The idea
> of having dual structure objects was first suggested here:
> 
The PDB parser was written with the intent of parsing everything in a
PDB entry, so that the entry can be written back out (relatively) faithfully.

I did some profiling on the parser and found some interesting things.
All timings are for reading one PDB entry (I think it was 1TIM: 3978
lines, 3740 ATOM records).

- reading the whole structure: 10.35 s
- nearly all of that is spent reading the coordinate section: 10.27 s
- one third of the time is spent on parent/child bookkeeping (we need to
  avoid reference cycles, and we do it the hard way)
- two thirds of the total time is spent creating/adding Atom objects:
  6.81 s (65%), most of it bookkeeping work
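The reference-cycle point above is easy to trip over in Perl, which frees data by reference counting: if a child holds a strong reference to its parent and the parent holds the child, neither is ever freed. A minimal sketch with a hypothetical toy class (`Toy::Node`, not Bio::Structure) shows one common way around this, `Scalar::Util::weaken` on the child-to-parent link. This is not how Bio::Structure does its bookkeeping, and whether it would be any cheaper has not been measured here.

```perl
use strict;
use warnings;

# Hypothetical toy class, NOT Bio::Structure: parent owns its children
# (strong refs), each child points back to its parent through a weak ref,
# so no reference cycle is formed.
package Toy::Node;
use Scalar::Util qw(weaken);

sub new {
    my ($class, %args) = @_;
    return bless { name => $args{name}, children => [], parent => undef }, $class;
}

sub add_child {
    my ($self, $child) = @_;
    push @{ $self->{children} }, $child;   # strong: parent keeps child alive
    $child->{parent} = $self;
    weaken($child->{parent});               # weak: does not keep parent alive
    return $child;
}

sub parent   { return $_[0]{parent} }
sub children { return @{ $_[0]{children} } }

package main;

my $chain = Toy::Node->new(name => 'A');
my $res   = $chain->add_child(Toy::Node->new(name => 'ALA1'));
print $res->parent->{name}, "\n";           # prints "A"
```

Once the last strong reference to the parent goes away, the weak back-link simply becomes `undef` instead of leaking the whole tree.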

Therefore I propose adding an option to the Bio::Structure::IO handle
that disables the creation of certain object types (e.g. Atom).

  my $structio = Bio::Structure::IO->new(-file      => $file,
                                         -format    => 'PDB',
                                         -no_atom   => 1,
                                         -no_header => 1);

  # would not parse the header section and wouldn't create
  # Atom objects

We would benefit from this too: for some of the questions we're
asking, we don't need access to the individual atoms.

The only disadvantage I see is that it is impossible to write out a
decent PDB file from objects created this way (the atom coordinates
are not kept).
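To make the no_atom/no_header idea concrete, here is a minimal plain-Perl sketch; it is not Bio::Structure::IO code, and `read_pdb_lean` and its `no_header` option are hypothetical names that only mirror the proposed flags. It reads an entry once, uses the fixed PDB ATOM/HETATM columns to record which residues are present, and optionally skips the header lines. No per-atom object or coordinate is kept, which is also why its result could not be written back out as a PDB file.

```perl
use strict;
use warnings;

# Hypothetical lean reader, NOT part of Bio::Structure::IO. The no_header
# option only mirrors the flag proposed above. ATOM/HETATM records are
# used to list residues; no per-atom objects or coordinates are kept.
sub read_pdb_lean {
    my ($fh, %opt) = @_;
    my (@header, @residues, %seen);
    my $in_coords = 0;

    while (my $line = <$fh>) {
        chomp $line;
        my $rec = substr($line, 0, 6);
        if ($rec eq 'ATOM  ' || $rec eq 'HETATM') {
            $in_coords = 1;
            next if length($line) < 27;          # too short for residue fields
            my $resname = substr($line, 17, 3);  # columns 18-20
            my $chain   = substr($line, 21, 1);  # column 22
            my $resseq  = substr($line, 22, 5);  # columns 23-27 (resSeq + iCode)
            next if $seen{"$chain:$resseq"}++;   # one entry per residue
            (my $id = $resseq) =~ s/\s+//g;
            push @residues, { chain => $chain, resseq => $id, resname => $resname };
        }
        elsif (!$in_coords && !$opt{no_header}) {
            push @header, $line;                 # header = lines before coordinates
        }
    }
    return { header => \@header, residues => \@residues };
}

# Usage: open my $fh, '<', $file or die "Cannot open $file: $!";
#        my $entry = read_pdb_lean($fh, no_header => 1);
#        printf "%d residues\n", scalar @{ $entry->{residues} };
```

Keeping one small hash per residue instead of one object per atom is the trade-off discussed above: far fewer allocations and no parent/child bookkeeping, at the cost of losing everything needed to regenerate the entry.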

Kris,
-- 
Kris Boulez 				Tel: +32-9-241.11.00
AlgoNomics NV 				Fax: +32-9-241.11.02
Technologiepark 4 			email: kris.boulez@algonomics.com
B 9052 Zwijnaarde 			http://www.algonomics.com/