[Bioperl-l] New modules for STRIDE and DSSP output
Kris Boulez
Kris.Boulez@algonomics.com
Mon, 3 Dec 2001 10:46:51 +0100
Quoting Ed Green (ed@compbio.berkeley.edu):
> Since the Bio::Structure objects are in flux, I have not tried to
> integrate with them. When they are more settled, there are obvious ways
> that these secondary structural objects could be integrated with the
> more general structure objects.
>
I don't expect too much flux on the Bio::Structure objects. At the
moment I'm in bug-fixing/speed-improvement mode.
I don't think it's wise to have two different implementations of the
same problem one directory away from each other. I wholeheartedly agree
that there is a big complementarity and a need for integration.
> And in regard to the new Bio::Structure object, I think it does a good
> job capturing and organizing structural info, but may be a bit too heavy
> for high-throughput work because of the sheer number of objects created
> (1/residue & 1/atom). I'd like to cast my vote for having an
> additional, leaner Bio::Structure object that is (gasp!) less
> objectified, but more suitable for high-throughput work. The idea
> of having dual structure objects was first suggested here:
>
This PDB parser was written with the intent of parsing everything in a
PDB entry and thus being able to write it out (relatively) decently.
I did some profiling on the parser and found some interesting things.
All timings are for reading one PDB entry (I think it was 1TIM, 3978
lines, 3740 ATOM records).
- reading the whole structure: 10.35 s
- nearly all of that time is spent reading the coordinate section: 10.27 s
- one third of the time is spent in parent/child bookkeeping (we need to
  avoid reference cycles and do it the hard way)
- two thirds of all time is spent creating/adding atom objects: 6.81 s
  (65%); most of this time is bookkeeping work
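To make the bookkeeping cost concrete, here is a minimal sketch of one way
to track parent/child relations without reference cycles: a single owning
container keeps lookup tables keyed by stringified references, so no object
points back at its parent. Sketch::Entry and its methods are hypothetical
names for illustration only and are not claimed to match the
Bio::Structure code.

```perl
use strict;
use warnings;

# Hypothetical container (illustration only): it owns every object and
# records relations in tables keyed by the stringified reference, so the
# objects themselves never hold a pointer to their parent.
package Sketch::Entry;

sub new {
    my $class = shift;
    return bless({ objects => {}, child_of => {}, parent_of => {} }, $class);
}

# Record that $child belongs to $parent.
sub add_child {
    my ($self, $parent, $child) = @_;
    $self->{objects}{"$parent"} = $parent;
    $self->{objects}{"$child"}  = $child;
    push @{ $self->{child_of}{"$parent"} }, "$child";
    $self->{parent_of}{"$child"} = "$parent";
    return $child;
}

# Return the child objects of $parent (empty list if none).
sub children {
    my ($self, $parent) = @_;
    my $keys = $self->{child_of}{"$parent"} || [];
    return map { $self->{objects}{$_} } @$keys;
}

# Return the parent object of $child, or undef if it has none.
sub parent {
    my ($self, $child) = @_;
    my $key = $self->{parent_of}{"$child"};
    return defined $key ? $self->{objects}{$key} : undef;
}

package main;

my $entry   = Sketch::Entry->new;
my $residue = { name => 'ALA', number => 1 };
my $atom    = { name => 'CA' };
$entry->add_child($residue, $atom);

my @kids = $entry->children($residue);
print "parent of CA: ", $entry->parent($atom)->{name}, "\n";
print "children of ALA: ", scalar(@kids), "\n";
```

Every atom added this way costs several hash stores on top of the object
creation itself, which is the kind of overhead that adds up over a few
thousand ATOM records.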
I therefore propose an attribute on the StructIO handle that disables
the creation of certain subtypes (e.g. Atom):
    my $structio = Bio::Structure::IO->new(-file      => $file,
                                           -format    => 'PDB',
                                           -no_atom   => 1,
                                           -no_header => 1);
    # would not parse the header section and wouldn't create
    # Atom objects
We would also benefit from this, since some of the questions we're
asking don't need access to the individual atoms.
The only disadvantage I see is that it is impossible to write out a
decent PDB file from the objects you get in this way (you have no idea
about atom coordinates).
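As a rough illustration of the proposal (not the Bio::Structure::IO
implementation), the sketch below shows how such switches could
short-circuit a parser. parse_pdb_lines, pdb_atom and count_atoms are
hypothetical helpers, the -no_atom and -no_header keys simply mirror the
proposed attributes, and the input is synthetic. Residues are still created
when -no_atom is set, but they carry no atoms or coordinates.

```perl
use strict;
use warnings;

# Build a simplified ATOM line with the standard PDB fixed columns
# (atom name 13-16, residue name 18-20, chain 22, residue number 23-26,
# x/y/z 31-54). Synthetic test data only.
sub pdb_atom {
    my ($serial, $name, $res, $chain, $resseq, $x, $y, $z) = @_;
    return sprintf('ATOM  %5d %-4s %3s %1s%4d    %8.3f%8.3f%8.3f',
                   $serial, $name, $res, $chain, $resseq, $x, $y, $z);
}

# Hypothetical parser: honours -no_atom and -no_header as in the proposal.
sub parse_pdb_lines {
    my ($lines, %opt) = @_;
    my (@header, @residues, %seen);
    for my $line (@$lines) {
        my $record = substr($line, 0, 6);
        if ($record eq 'ATOM  ' or $record eq 'HETATM') {
            my $chain  = substr($line, 21, 1);
            my $resseq = substr($line, 22, 4) + 0;
            my $res    = $seen{"$chain:$resseq"};
            unless ($res) {
                $res = { name => substr($line, 17, 3), chain => $chain,
                         number => $resseq, atoms => [] };
                $seen{"$chain:$resseq"} = $res;
                push @residues, $res;
            }
            next if $opt{-no_atom};    # skip the expensive per-atom objects
            (my $name = substr($line, 12, 4)) =~ s/\s+//g;
            my @xyz = map { substr($line, $_, 8) + 0 } 30, 38, 46;
            push @{ $res->{atoms} }, { name => $name, xyz => \@xyz };
        }
        elsif (!$opt{-no_header}) {
            push @header, $line;       # keep non-coordinate records verbatim
        }
    }
    return { header => \@header, residues => \@residues };
}

# Total number of atom objects held by a parsed structure.
sub count_atoms {
    my ($struct) = @_;
    my $n = 0;
    $n += scalar @{ $_->{atoms} } for @{ $struct->{residues} };
    return $n;
}

my @lines = (
    'HEADER    SYNTHETIC EXAMPLE',
    pdb_atom(1, 'N',  'ALA', 'A', 1, 1.0, 2.0, 3.0),
    pdb_atom(2, 'CA', 'ALA', 'A', 1, 2.0, 3.0, 4.0),
    pdb_atom(3, 'N',  'GLY', 'A', 2, 5.0, 6.0, 7.0),
);

my $full = parse_pdb_lines(\@lines);
my $lean = parse_pdb_lines(\@lines, -no_atom => 1, -no_header => 1);

printf "full: %d residues, %d atoms, %d header lines\n",
    scalar @{ $full->{residues} }, count_atoms($full), scalar @{ $full->{header} };
printf "lean: %d residues, %d atoms, %d header lines\n",
    scalar @{ $lean->{residues} }, count_atoms($lean), scalar @{ $lean->{header} };
```
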
Kris,
--
Kris Boulez Tel: +32-9-241.11.00
AlgoNomics NV Fax: +32-9-241.11.02
Technologiepark 4 email: kris.boulez@algonomics.com
B 9052 Zwijnaarde http://www.algonomics.com/