[DAS2] structure DAS

Fri Jun 3 19:31:20 UTC 2005

Hi Andrew,

We already have a DAS1-enabled structure browser (SPICE, a Java 
Web Start application based on Jmol) as part of the efamily project; 
see here:

	http://www.efamily.org.uk/software/dasclients/spice/

This gets structure information from a reference DAS1 server that has 
been extended with a 'structure' command (it returns 3D coordinates as 
well as sequence).  There are various plugins for this server that 
will serve coordinate sets from raw PDB files, a replicated Oracle 
instance of MSD, and structures contained in SRS.

SPICE can display features from DAS servers on either the PDBRESNUM 
or the UNIPROT coordinate system.  There's a DAS alignment service 
(another DAS1 extension that we've implemented) that provides the 
mapping between the two coordinate systems.
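
As a rough illustration of what such a mapping makes possible, here is 
a minimal Python sketch that moves a feature from UniProt positions to 
PDB residue identifiers.  The aligned pairs, the names ALIGNED_PAIRS 
and uniprot_feature_to_pdb, and the in-memory layout are all made up 
for this example; this is not the format the alignment service 
actually returns.

```python
from typing import Dict, List, Tuple

# Hypothetical aligned pairs: (PDB residue id, UniProt position).
# PDB residue ids are kept as strings because they may be negative
# ("-1") or carry an insertion code ("2A").
ALIGNED_PAIRS: List[Tuple[str, int]] = [
    ("-1", 10), ("1", 11), ("2A", 12), ("2B", 13), ("3", 14),
]

PDB_TO_UNIPROT: Dict[str, int] = dict(ALIGNED_PAIRS)
UNIPROT_TO_PDB: Dict[int, str] = {u: p for p, u in ALIGNED_PAIRS}


def uniprot_feature_to_pdb(start: int, end: int) -> List[str]:
    """Return the PDB residue ids covered by a UniProt feature [start, end].

    UniProt positions with no aligned structure residue are skipped.
    """
    return [UNIPROT_TO_PDB[pos] for pos in range(start, end + 1)
            if pos in UNIPROT_TO_PDB]
```

A feature on UniProt positions 11 to 13 would then be drawn on 
residues 1, 2A and 2B of the structure, and positions that fall 
outside the solved structure simply drop out.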

By default, SPICE gets information about available DAS servers from a 
DAS server registry that we have set up and are starting to use 
seriously.  The following is the interface for browsing lists, but 
the registry itself is a web service that can be called and queried 
from clients:

	http://das.sanger.ac.uk/registry/listServices.jsp

(Select name 'UniProt' or 'PDBresnum' to get lists of the services 
that have been registered with it.  Both the efamily and Biosapiens 
projects are using it.)

This doesn't, of course, address true 3D DAS (DAS servers that provide 
3D points or surfaces with respect to 3D structures), but it does 
allow us to map sequence features onto structure easily, and we'll be 
building on it.

SPICE and the associated DAS extensions have been implemented by 
Andreas Prlic <ap3 at sanger.ac.uk>, with some help from Thomas Down 
(the developer of Dazzle) and others at Sanger.  I'm sure they will 
chime in with details in answer to questions about this.

Tim

At 12:50 pm -0600 3/6/05, Andrew Dalke wrote:
>I've been thinking more about the general idea of a structure DAS.
>
>I think it would be good to have someone with more recent
>(and better) structure knowledge than I do.  This may be
>the woman from RCSB mentioned yesterday.  Another idea is
>Steven Brenner.
>
>There are two main ways to think about proteins: sequence
>and conformation.
>
>The sequence model is similar to that used for DNA.
>Sequences have residues arranged in a line, with positions
>numbered consecutively.
>
>The biggest database for this is SWISS-PROT.  Here's
>an example of features:
>
>FT   DOMAIN      583    920       HECT.
>FT   REGION      515    571       PABP-like.
>FT   COMPBIAS    108    119       Asp/Glu-rich (acidic).
>FT   COMPBIAS    158    181       Pro-rich.
>FT   COMPBIAS    451    470       Arg/Glu-rich (mixed charge).
>FT   COMPBIAS    479    488       Arg/Asp-rich (mixed charge).
>FT   COMPBIAS    610    621       Asp/Glu-rich (acidic).
>FT   COMPBIAS    858    878       Pro-rich.
>FT   ACT_SITE    889    889       Glycyl thioester intermediate (By
>FT                                similarity).
>
>These are feature types, start/end position, and a description.
>I imagine there is an ontology for these but I haven't been
>following that work.
>
>
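
To make the feature-line layout quoted above concrete, here is a 
minimal Python sketch that turns FT lines into (type, start, end, 
description) records and folds continuation lines into the previous 
description.  It assumes that start and end are plain integers, as in 
the sample, and that a continuation line is blank where the feature 
key would be; the names Feature and parse_ft_lines are invented for 
this example.

```python
from typing import List, NamedTuple


class Feature(NamedTuple):
    key: str
    start: int
    end: int
    description: str


def parse_ft_lines(lines: List[str]) -> List[Feature]:
    """Parse SWISS-PROT style FT lines into Feature records."""
    features: List[Feature] = []
    for line in lines:
        if not line.startswith("FT"):
            continue
        if len(line) > 5 and line[5] != " ":
            # New feature: key, start, end, then free-text description.
            key, start, end, *rest = line[2:].strip().split(None, 3)
            description = rest[0] if rest else ""
            features.append(Feature(key, int(start), int(end), description))
        elif features:
            # Continuation line: append its text to the last description.
            last = features[-1]
            extra = line[2:].strip()
            joined = f"{last.description} {extra}".strip()
            features[-1] = last._replace(description=joined)
    return features


SAMPLE = [
    "FT   DOMAIN      583    920       HECT.",
    "FT   ACT_SITE    889    889       Glycyl thioester intermediate (By",
    "FT                                similarity).",
]
```

Calling parse_ft_lines(SAMPLE) gives two records, with the ACT_SITE 
description rejoined across its two lines.
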
>Structure is more complicated.  The biggest data source
>for this is the PDB.  Things to worry about:
>
>  * a PDB record may contain aggregates of protein, DNA, lipids,
>waters, ions, ligands, post-translational modifications and
>other bits and pieces.
>
>  * the sequence listed for a chain may differ from what is
>found by crystallography.
>
>  * residue numbers in the structure may not be consecutive.  Eg,
>in a chain the residue ids may be -2, -1, 1, 2A, 2B, 2C.  The
>numbering is often done to preserve residue identifiers across
>homologous structures.
>
>  * some features are at the atomic level and not residue level.
>For that matter, some people like things like "center of ring"
>but I think we can ignore those.  Others like "binding pocket"
>but there's no good way to specify that.
>
>  * some residues have "alternate" conformations, eg, a side
>chain that's believed to have two common orientations.  I
>don't think we need to worry about this.
>
>  * NMR structures (and others) may have multiple models.
>I think we don't need to worry about this.  All programs I
>know of handle these as alternate conformations and have
>no way to say a given feature is on only one of those
>conformations.
>
>  * some features may be over several regions of a protein,
>or across several different chains.  Eg, a disulphide bond
>between two different proteins or an indicator of a beta
>barrel composed of multiple proteins.
>
>  * strange things, like a protein covalently bonded to a
>piece of DNA.  Those chemists are so whacky!  Here's a
>picture of one done in my old group
>   http://www.ks.uiuc.edu/Research/pro_DNA/hmgd/SDNA_t.gif
>from
>   http://www.ks.uiuc.edu/Research/pro_DNA/hmgd/
>I think it's okay to linearize these.
>
>  * crystal structures and symmetries.  One example that
>comes to mind is the virus structure I worked on where
>a beta sheet went from one protein chain on the given
>protomer to another protein chain on the next protomer
>around the 5-fold symmetry axis.  But the structure
>record only contains a single protomer.  I don't think
>we need to worry about this because to the best of
>my knowledge that information is not available in
>any database; it's extracted by humans reading the
>comments and associated papers.
>
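
On the residue numbering point in the list above: since ids such as 
-2, -1, 1, 2A, 2B, 2C are neither consecutive nor purely numeric, a 
client cannot use them directly as sequence offsets.  The Python 
sketch below splits each id into a number and an optional 
single-letter insertion code, and records each residue's position 
along the chain in the order listed.  The function names are invented 
for this example, and the id syntax is an assumption based on the 
example ids.

```python
import re
from typing import Dict, List, Tuple

# An optional minus sign, digits, then at most one insertion-code letter.
_RESID = re.compile(r"^(-?\d+)([A-Za-z]?)$")


def parse_residue_id(resid: str) -> Tuple[int, str]:
    """Split a residue id such as "2A" into (2, "A"); "-2" gives (-2, "")."""
    match = _RESID.match(resid)
    if match is None:
        raise ValueError(f"not a residue id: {resid!r}")
    return int(match.group(1)), match.group(2)


def index_residues(resids: List[str]) -> Dict[Tuple[int, str], int]:
    """Map each parsed residue id to its 0-based position in the chain.

    The position comes from the order in which the ids are listed,
    not from sorting or arithmetic on the ids themselves.
    """
    return {parse_residue_id(r): i for i, r in enumerate(resids)}
```

For the chain above, residue 2B ends up at position 4 even though its 
number is 2, which is why features on structures need an explicit 
mapping rather than simple offsets.
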
>
>Beyond the technical details,
>
>  Who are the test users?
>
>  What's the reference platform?  Should there even be one?
>There's a boatload of 3d structure viewers.  A decade ago
>Steven Brenner proposed a generic format for selection +
>annotation information.  Perhaps that's a better path?
>
>   Is writeback needed?
>
>					Andrew
>					dalke at dalkescientific.com
>
>_______________________________________________
>DAS2 mailing list
>DAS2 at portal.open-bio.org
>http://portal.open-bio.org/mailman/listinfo/das2

-- 
----------------------------------------------------------------------------
Dr Tim Hubbard                         email: th at sanger.ac.uk
Head of Human Genome Analysis          Tel (direct): +44 1223 496886
Wellcome Trust Sanger Institute        Tel (switch): +44 1223 834244
Wellcome Trust Genome Campus, Hinxton  Fax: +44 1223 494919
Cambridgeshire. CB10 1SA. UK.          URL: http://www.sanger.ac.uk/Users/th
----------------------------------------------------------------------------