Back (for now)
Steve A. Chervitz
sac@genome.stanford.edu
Wed, 2 Jul 1997 14:25:25 -0700 (PDT)
Steven E. Brenner wrote:
> This is interesting. Could you be a little more specific about where
> you think the overlap between 2D and 3D protein structure is, for module
> design? (Let's leave 2D RNA aside for now, as that will be even harder,
> though if it can be included -- that's great!) I have a hard time seeing
> how we could abstract any but the most incidental features. For example,
> domains usually are not known in 2D structure any more than they are in
> sequence. 'Predicted v. experimental' seems like a characteristic of the
> whole item; just a flag and a pretty incidental one at that. Active sites
> would be probably defined in quite different ways for 2D structure (which
> would be more like sequence) and 3D structure.
>
> Indeed, I would probably see more overlap between 2D structure and
> sequence than 2D structure and 3D. But, I'm interested that you see
> otherwise and would like to hear more details.
Here's my thinking: First, it's clear that "secondary structure" is just
a mental construct that helps us think about the structural organization
of biomolecules. Secondary structure can be thought of as a filter that
can be applied to either a sequence or a 3D structure. So, given a
primary sequence, you can list the secondary structural state of each
residue (based on a prediction or analysis of the known structure). Given a
3D structure, you can do the same thing by inspection of the actual 3D
fold.
You can think of a linear sequence or a 3D structure as an assembly of
secondary structural elements. In the case of the 3D structure, you can
also describe the connectivity of the elements (e.g., 5-stranded,
anti-parallel beta sheet, 4-helix bundle, etc.). This is still just a
representation of the 3D structure, not the actual structure, which is a
collection of connected atoms in 3-space. This is analogous to the way in
which a string of secondary structural states is a representation of a
primary sequence.
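A minimal sketch of the "filter" idea, in Python rather than the Perl of the modules under discussion. It assumes a one-letter-per-residue state string (H = helix, E = strand, C = coil); the function name ss_elements, the example sequence, and the states are all made up for illustration. It collapses the per-residue string into a list of secondary structural elements:

```python
from itertools import groupby

def ss_elements(states):
    """Collapse a per-residue state string into (state, start, end) tuples.

    Positions are 1-based and inclusive (an assumed convention).
    """
    elements = []
    pos = 1
    for state, run in groupby(states):
        length = len(list(run))
        elements.append((state, pos, pos + length - 1))
        pos += length
    return elements

# Hypothetical data: a made-up sequence and a made-up state string of equal length.
sequence = "MKTAYIAKQRQISFVKSHFSRQ"
states   = "CCHHHHHHHHCCEEEEECCCCC"

for state, start, end in ss_elements(states):
    # Each element maps back onto a slice of the primary sequence.
    print(state, start, end, sequence[start - 1:end])
```

The same function would work whether the state string came from a prediction on the sequence or from analysis of a 3D structure, which is the point of keeping it separate from both.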
So (contrary to what I initially thought) it would seem best to NOT
intermingle secondary structure with 3D structure in a Bio::Struct module.
Instead, it may be best to keep secondary structural issues in a separate
module which can deal with sequences or structures.
However, there's one case where I can see some overlap between 3D and 2D
structural issues: circular dichroism (CD) experiments. Using CD you can
estimate the overall percentage of helix, sheet, and coil in a protein
without knowing anything about the distribution of these regions in the
molecule. Some NMR experiments can also estimate this. This information
can, of course, also be obtained by analysis of the 3D structure.
Thus there are some proteins which have data about the overall
fraction of secondary structure but have no 3D structure. I'm not aware
of any databases that store this sort of information, so we may not want
to worry too much about it now, but it is something to keep in mind.
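For comparison, the overall composition that CD estimates can be computed from a per-residue state string when one is available. This is a rough sketch in Python, assuming the same hypothetical H/E/C one-letter encoding; ss_composition is an illustrative name, not part of any existing module:

```python
from collections import Counter

def ss_composition(states):
    """Return the fraction of residues in each secondary structural state."""
    counts = Counter(states)
    total = len(states)
    return {state: counts[state] / total for state in sorted(counts)}

# Hypothetical 10-residue state string.
print(ss_composition("HHHHEECCCC"))
```

A CD-only protein would have just the resulting fractions and no state string, so a design that wants to hold both kinds of data should not require the per-residue form.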
One more point: my hypothetical Bio::Struct.pm module doesn't know
anything about 3D structures but delegates this task to Bio::Struct::PDB.pm.
Similarly, there could be another module that handles strictly 2D issues.
> As an aside, what is 'scop_dict.cf' in Bio::Struct::Scop_data ?
>
> I had no idea you had coded up so much which uses scop!
I decided to go ahead and create a scop module since I knew I
would be doing a lot of work with scop data. scop_dict.cf is a little
dictionary I created for converting between class/fold number and class/fold
name. You probably already have such a thing, but it was easy enough to
create. Here's a snippet:
1:All alpha 1:Globin-like
1:All alpha 2:Long alpha-hairpin
1:All alpha 3:Cytochrome c
...
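A sketch of how a file in this shape could be read into lookup tables, written in Python rather than Perl. The exact delimiter between the class field and the fold field is not known from the snippet, so the pattern below assumes "number:name", whitespace, "number:name"; parse_scop_dict and the table layout are hypothetical:

```python
import re

# Assumed layout: <class number>:<class name> <whitespace> <fold number>:<fold name>
LINE_RE = re.compile(r"^(\d+):(.+?)\s+(\d+):(.+)$")

def parse_scop_dict(lines):
    """Build class-number and (class, fold)-number lookup tables."""
    class_names = {}
    fold_names = {}
    for line in lines:
        match = LINE_RE.match(line.strip())
        if not match:
            continue  # skip blank lines and anything not in the assumed layout
        class_num, class_name, fold_num, fold_name = match.groups()
        class_names[int(class_num)] = class_name
        fold_names[(int(class_num), int(fold_num))] = fold_name
    return class_names, fold_names

snippet = [
    "1:All alpha 1:Globin-like",
    "1:All alpha 2:Long alpha-hairpin",
    "1:All alpha 3:Cytochrome c",
]
class_names, fold_names = parse_scop_dict(snippet)
print(class_names[1], "/", fold_names[(1, 2)])
```

If the real file is tab-delimited, splitting on the tab would be more robust than this pattern, which would mis-split a class name that itself contains a space followed by "digits:".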
> > My modified version of PreSeq.pm can be found at:
> > http://genome-www.stanford.edu/~sac/perlOOP/lib/Bio/PreSeq.pm
> > The main change is in the revcom() method to permit slicing. Note that it
> > would be nice to modify reverse() and complement() similarly.
>
> I have no objection to this, but curious to know why you want to
> be able to do slices for revcom, etc.
I needed to process sequences for all genes on a yeast chromosome. It
seemed easiest to create a big PreSeq object for the chromosomal sequence
and then extract sub-sequences for each gene as needed. Since some genes
are on the complementary strand, I needed revcom() to work like str().
See, for example:
http://genome-www.stanford.edu/~sac/perlOOP/bioperl/lib/Bio/Gene/Seq.pm
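To illustrate the use case (this is not the PreSeq.pm code), here is a small Python sketch of a reverse-complement that accepts a slice. It assumes 1-based inclusive coordinates given on the forward strand and handles only unambiguous DNA letters; the function signature is hypothetical:

```python
# Translation table for unambiguous DNA bases, preserving case.
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")

def revcom(seq, start=1, end=None):
    """Reverse-complement seq[start..end] (1-based, inclusive; assumed convention)."""
    if end is None:
        end = len(seq)
    return seq[start - 1:end].translate(_COMPLEMENT)[::-1]

# Hypothetical "chromosome" with a gene on the complementary strand at 4..8.
chromosome = "AAACCCGG"
print(revcom(chromosome, 4, 8))
```

With this shape, a gene on the complementary strand is extracted with the same coordinates one would pass to a forward-strand substring call.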
Cheers,
SteveC