Back (for now)

Steven E. Brenner brenner@akamail.com
Tue, 8 Jul 1997 14:34:07 +0900 (JST)


> >   I like the general outline; I think I agree with you that 2D structure
> > is a module which should somehow be applicable to both 3D and 1D
> > structure.   One consideration is that every (modern) 3D protein structure
> > has a known 1D structure (i.e., sequence).   So, perhaps an easy way to
> > implement all of this would be that a 3D-structure has a 1D-structure,
> > and a 1D-structure has a 2D structure.   
> 
> This seems reasonable to me, but may become awkward in some situations. 
> Let's say we have an object to represent a residue within a 3D 
> structure and we want to know what 2D state it is in. It seems more 
> intuitive for the residue to store this data directly rather than to 
> have it always get this information from a 2D structure associated with 
> the whole 3D structure. My point here is that having a separate 2D 
> structure would be handy for some (but not all) situations.

Good thinking; I agree it would be inconvenient to have to go from a
residue to sequence to secondary structure.  

However, your suggestion would require having two different ways of
storing 2D structure -- either with the sequence or with the structure (or
both).  This could lead to confusion (what if the 2D structures disagree?). 

I think the compromise which probably solves both problems is to build a
method into the structure object which allows easy retrieval of 2D
structure info, but does so via the 1D sequence and then via the 2D
structure.  That is, delegation rather than duplication.
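
A minimal sketch of what I mean by delegation, assuming three made-up
classes (Sketch::SecStruct, Sketch::Seq, Sketch::Struct) and a made-up
method secstruct_at(); none of these names are existing modules, and the
one-letter state string is only an illustration:

```perl
use strict;
use warnings;

# Hypothetical classes, for illustration only; not actual Bioperl modules.
package Sketch::SecStruct;      # the "2D structure": one state per residue
sub new {
    my ($class, $states) = @_;  # e.g. "CHHHHEEC" (H helix, E strand, C coil)
    return bless { states => $states }, $class;
}
sub state_at {                  # 1-based residue position
    my ($self, $pos) = @_;
    return substr($self->{states}, $pos - 1, 1);
}

package Sketch::Seq;            # the "1D structure": has a 2D structure
sub new {
    my ($class, $residues, $secstruct) = @_;
    return bless { residues => $residues, secstruct => $secstruct }, $class;
}
sub secstruct { return $_[0]->{secstruct} }

package Sketch::Struct;         # the "3D structure": has a 1D structure
sub new {
    my ($class, $seq) = @_;
    return bless { seq => $seq }, $class;
}
# Convenience method: the caller asks the structure directly, but the
# answer is fetched via the sequence and then the 2D structure, so the
# 2D data is stored in exactly one place.
sub secstruct_at {
    my ($self, $pos) = @_;
    return $self->{seq}->secstruct->state_at($pos);
}

package main;

my $ss     = Sketch::SecStruct->new("CHHHHEEC");
my $seq    = Sketch::Seq->new("MKTAYIAK", $ss);
my $struct = Sketch::Struct->new($seq);
print $struct->secstruct_at(3), "\n";   # state of residue 3
```

Because only Sketch::SecStruct holds the states, the question of two
copies disagreeing should not arise in this arrangement.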



> >   I use 'has a,' as one sort of relationship, though I haven't figured out
> > if that is best.  Comments appreciated!  One reason for this approach is
> 
> In general, I favor 'has a' relationships since they decrease 
> dependencies and encourage a more loose coupling between modules,
> making it easier to use each module stand-alone, or with other modules. 
> It also promotes extendibility. A 2D module might inherit from and 
> extend the 1D module (Bio::Seq), as Georg hinted at. This may be 
> reasonable.
> 
> >   I like your thoughts about folds (e.g., 4-helix-bundle), as a
> > description of the 3D structure; I had not previously considered this.
> > However, these describe a domain as a whole rather than any particular
> > details of either the secondary or tertiary structure.  Perhaps we should
> > have a DomainDescription module which is sort of like the 2D-structure
> > module. Where 2D-structure contains secondary structure elements,
> > DomainDescriptions have folds. A tricky caveat here is that folds can be
> > discontinuous in sequence.
> 
> Take a look at:
> http://genome-www.stanford.edu/~sac/perlOOP/bioperl/schema/struct.html
> I've tried to do something similar with my Bio::Struct::Domain.pm 
> module.


  My thinking was that the 2D structure object would contain a string (as
in beads on a string) of secondary structure elements (SSEs).  Likewise,
for Domain.  That is, the domain object is effectively an attribute
mapped onto the sequence.

  Your domain object sort of comes from the opposite approach, saying "I'm
a domain and here is where I'm located."  But if you look at residue 36 and
want to know what domain it is in, it would be necessary to query all of
the domain objects to see whether it matches.  Further, it is hard using
just your Domain object to quickly see where domain boundaries are
(without querying all of the objects).  In a sense, "your" domain objects
are the things I see being mapped onto the sequence, but "my" domain
object (or maybe I should call it "domain order") is the object which
contains the mapping. 

  Am I making any sense here?
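
To make the distinction concrete, here is a rough sketch. The domain
names, the coordinates, and the variable names (@domains, @domain_order)
are all invented for illustration; the point is only that a per-residue
mapping can be built once from location-style domain objects and then
answers both questions directly:

```perl
use strict;
use warnings;

# Hypothetical data and names, for illustration only.
# "Your" domain objects: each one knows where it is located. A domain may
# be discontinuous in sequence, so it holds a list of [start, end] segments.
my @domains = (
    { name => 'A', segments => [ [1, 30], [61, 80] ] },
    { name => 'B', segments => [ [31, 60] ] },
);

# "My" domain order: one entry per residue (1-based), naming the domain
# that residue belongs to. Built once from the domain objects.
my @domain_order;
for my $domain (@domains) {
    for my $segment (@{ $domain->{segments} }) {
        my ($start, $end) = @$segment;
        $domain_order[$_] = $domain->{name} for $start .. $end;
    }
}

# Which domain is residue 36 in? A single array lookup.
print "residue 36: domain $domain_order[36]\n";

# Domain boundaries: positions where the domain label changes.
my @boundaries = grep { $domain_order[$_] ne $domain_order[$_ - 1] }
                 2 .. $#domain_order;
print "boundaries at: @boundaries\n";
```

The sketch assumes every residue belongs to exactly one domain; unassigned
or overlapping regions would need extra handling.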

> 
> > > One more point: my hypothetical Bio::Struct.pm module doesn't know 
> > > anything about 3D structures but delegates this task to Bio::Struct::PDB.pm. 
> > > Similarly, there could be another module that handles strictly 2D issues. 
> > 
> > Naming is more of a philosophical and political question than a technical
> > one.  On these grounds, I think that it is important that the object which
> > knows about coordinates be Bio::Struct.  The reason is that the thing most
> > people will want to do most often is parse in a PDB file and do something
> > with it -- this "jumble of coordinates" will be the "currency" for
> > structures just as "Bio::Seq" will be the corresponding one for sequences.
> > 
> > To reduce the learning curve and to make things appear as simple as possible,
> > I think that having a 'Bio::Seq' and a 'Bio::Struct' which are
> > more-or-less capable of appearing to do everything necessary is important.
> 
> Good points. I'm just concerned about creating complex monolithic 
> modules that are difficult to use and extend.

  Agreed.  But for now, I think that the danger is greater if we go
towards the GenericParentIterator than if we go to small but serviceable
modules (even if they are potentially difficult and awkward to extend).


> > > > I have no objection to this, but curious to know why you want to
> > > > be able to do slices for revcom, etc.
> > > 
> > > I needed to process sequences for all genes on a yeast chromosome. It 
> > > seemed easiest to create a big PreSeq object for the chromosomal sequence 
> > > and then extract sub-sequences for each gene as needed. Since some genes 
> > > are on the complementary strand, I needed revcom() to work like str(). 
> > > See, for example:
> > > http://genome-www.stanford.edu/~sac/perlOOP/bioperl/lib/Bio/Gene/Seq.pm
> > 
> > Ok; this makes sense.  I had forgotten about revcom's current
> > implementation.  One idea was that it would modify the existing object;
> > another idea was that it would return a modified object.  Right now it
> > seems to be roughly in-between. :)
> > 
> > My suggested modification (probably can't show up until Bio::Seq) would be
> > for revcom to return an object with the required modification.  Probably
> > my preferred calling sequence would be:
> > 
> > $mybackgene = new Bio::PreSeq ($mychromasome->str($end,$beg));
> > $mygene = $mybackgene->revcom();
> > print $mygene->str(), "\n";
> > 
> > 
> > Or, maybe we should add another method like getseq to return a sequence
> > object of a slice:
> > 
> > $mybackgene = $mychromasome->get_seq_obj($end,$beg);
> >    # ick!  get_seq_obj is a horrible method name!    
> > $mygene  = $mybackgene->revcom();
> 
> I would favor the latter strategy since it is clearer that you are 
> dealing with a new sequence object.

Suggestions for a new method name better than "get_seq_obj($end,$beg)"?
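
For reference, a rough sketch of the behaviour under discussion, using a
made-up stand-in class (Sketch::PreSeq) rather than the real module. The
1-based inclusive coordinates, the acceptance of coordinates in either
order, and the example sequence are all assumptions of the sketch:

```perl
use strict;
use warnings;

# Hypothetical stand-in for a sequence class; not the Bio::PreSeq API.
package Sketch::PreSeq;
sub new {
    my ($class, $str) = @_;
    return bless { str => $str }, $class;
}
sub str { return $_[0]->{str} }

# Return a NEW object holding the slice between two 1-based, inclusive
# coordinates; the coordinates may be given in either order (assumption).
sub get_seq_obj {
    my ($self, $x, $y) = @_;
    my ($beg, $end) = $x <= $y ? ($x, $y) : ($y, $x);
    return ref($self)->new(substr($self->{str}, $beg - 1, $end - $beg + 1));
}

# Return a NEW object with the reverse complement; $self is left untouched.
sub revcom {
    my ($self) = @_;
    my $rc = scalar reverse $self->{str};
    $rc =~ tr/ACGTacgt/TGCAtgca/;
    return ref($self)->new($rc);
}

package main;

my $chromosome = Sketch::PreSeq->new("ATGCCGTAAGC");
my $backgene   = $chromosome->get_seq_obj(6, 2);   # slice "TGCCG"
my $gene       = $backgene->revcom();
print $gene->str(), "\n";          # reverse complement of the slice
print $chromosome->str(), "\n";    # the original object is unchanged
```

Both methods return fresh objects here, which is the "return a modified
object" reading of revcom rather than the modify-in-place one.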



I'm off from tonight, so you likely will hear little if anything from me
before I arrive at Stanford on 1 August.


Cheers,

  Steve