[DAS2] structure DAS
Tim Hubbard
th at sanger.ac.uk
Fri Jun 3 19:31:20 UTC 2005
Hi Andrew,
We already have a DAS1-enabled structure browser (SPICE, a Java Web
Start application based on Jmol) as part of the efamily project - see here:
http://www.efamily.org.uk/software/dasclients/spice/
This gets structure information from a reference DAS1 server that has
been extended with a 'structure' command (spits out 3D coordinates as
well as sequence). There are various plugins for this server that
will serve coordinate sets from raw PDB files, a replicated Oracle
instance of MSD, and structures contained in SRS.
SPICE can display features from DAS servers on either the PDBRESNUM
or the UNIPROT coordinate system. There's a DAS alignment
service (another DAS1 extension that we've implemented) that provides
the mapping between the two coordinate systems.
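
To make the idea of moving a feature between the two coordinate systems
concrete, here is a minimal sketch. It is not the format returned by the
alignment service: the `alignment` dictionary, its values and the
`map_feature` helper are all made up for illustration, assuming only that
an alignment can be reduced to a per-residue lookup.

```python
# Hypothetical alignment: UniProt position (int) -> PDB residue id (str).
# Residues with no coordinates in the structure are simply absent.
alignment = {
    10: "-2", 11: "-1", 12: "1", 13: "2A", 14: "2B", 15: "2C",
    # UniProt positions 16 and 17 are not resolved in this made-up structure
    18: "3", 19: "4",
}


def map_feature(start, end, alignment):
    """Return the PDB residue ids covered by a UniProt feature [start, end].

    Positions without an aligned residue are skipped, so the result may be
    shorter than the feature, or empty.
    """
    return [alignment[pos] for pos in range(start, end + 1) if pos in alignment]


mapped = map_feature(12, 18, alignment)
print(mapped)
```

A per-residue lookup is used rather than a start/end offset because PDB
residue ids need not be consecutive integers.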
By default, SPICE gets information about available DAS servers from a
DAS server registry that we have set up and are starting to use
seriously. The following is the interface for browsing lists, but
the registry itself is a web service that can be called and queried
from clients:
http://das.sanger.ac.uk/registry/listServices.jsp
(select the name 'UniProt' or 'PDBresnum' to list the services that
have been registered for that coordinate system. Both the efamily and
Biosapiens projects are using this registry.)
This doesn't, of course, address true 3D DAS (DAS servers that provide
3D points or surfaces wrt 3D structures), but it does allow us to
map sequence features onto structure easily, and we'll be building on
it.
SPICE and the associated DAS extensions have been implemented by
Andreas Prlic <ap3 at sanger.ac.uk> - with some help from Thomas Down,
the developer of Dazzle, and others at Sanger. I'm sure they will
chime in with answers to questions about this.
Tim
At 12:50 pm -0600 3/6/05, Andrew Dalke wrote:
>I've been thinking more about the general idea of a structure DAS.
>
>I think it would be good to have someone with more recent
>(and better) structure knowledge than I do. This may be
>the woman from RCSB mentioned yesterday. Another idea is
>Steven Brenner.
>
>There are two main ways to think about proteins: sequence
>and conformation.
>
>The sequence model is similar to that used for DNA.
>Sequences have residues arranged in a line, with positions
>numbered by position.
>
>The biggest database for this is SWISS-PROT. Here's
>an example of features
>
>FT   DOMAIN      583    920       HECT.
>FT   REGION      515    571       PABP-like.
>FT   COMPBIAS    108    119       Asp/Glu-rich (acidic).
>FT   COMPBIAS    158    181       Pro-rich.
>FT   COMPBIAS    451    470       Arg/Glu-rich (mixed charge).
>FT   COMPBIAS    479    488       Arg/Asp-rich (mixed charge).
>FT   COMPBIAS    610    621       Asp/Glu-rich (acidic).
>FT   COMPBIAS    858    878       Pro-rich.
>FT   ACT_SITE    889    889       Glycyl thioester intermediate (By
>FT                                similarity).
>
>These are feature types, start/end position, and a description.
>I imagine there is an ontology for these but I haven't been
>following that work.
>
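
As an illustration of the "feature type, start/end position, description"
layout described above, here is a rough Python sketch that reads FT lines
like the ones quoted. It assumes whitespace-separated fields and plain
integer positions, and it treats any FT line without two integer positions
as a continuation of the previous description. The function name and the
dictionary keys are hypothetical, and uncertain positions such as "?" or
"<1" are not handled.

```python
def parse_features(lines):
    """Parse simplified SWISS-PROT FT lines into a list of dicts."""
    features = []
    for line in lines:
        parts = line.split(None, 4)
        is_feature = (
            len(parts) >= 4
            and parts[0] == "FT"
            and parts[2].isdigit()
            and parts[3].isdigit()
        )
        if is_feature:
            features.append({
                "type": parts[1],
                "start": int(parts[2]),
                "end": int(parts[3]),
                "description": parts[4].strip() if len(parts) == 5 else "",
            })
        elif parts and parts[0] == "FT" and features:
            # Continuation line: append its text to the previous description.
            cont = line.split(None, 1)
            if len(cont) == 2:
                features[-1]["description"] += " " + cont[1].strip()
    return features


example = [
    "FT DOMAIN 583 920 HECT.",
    "FT ACT_SITE 889 889 Glycyl thioester intermediate (By",
    "FT similarity).",
]
for feature in parse_features(example):
    print(feature)
```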
>
>Structure is more complicated. The biggest data source
>for this is the PDB. Things to worry about:
>
> * a PDB record may contain aggregates of protein, DNA, lipids,
>waters, ions, ligands, post-translational modifications and
>other bits and pieces.
>
> * the sequence listed for a chain may be different than
>found from crystallography.
>
> * residue numbers in the structure may not be consecutive. Eg,
>in a chain the residue ids may be -2, -1, 1, 2A, 2B, 2C. The
>numbering is often done to preserve residue identifiers across
>homologous structures.
>
> * some features are at the atomic level and not residue level.
>For that matter, some people like things like "center of ring"
>but I think we can ignore those. Others like "binding pocket"
>but there's no good way to specify that.
>
> * some residues have "alternate" conformations, eg, a side
>chain that's believed to have two common orientations. I
>don't think we need to worry about this.
>
> * NMR structures (and others) may have multiple models.
>I think we don't need to worry about this. All programs I
>know of handle these as alternate conformations and have
>no way to say a given feature is on only one of those
>conformations.
>
> * some features may be over several regions of a protein,
>or across several different chains. Eg, a disulphide bond
>between two different proteins or an indicator of a beta
>barrel composed of multiple proteins
>
> * strange things, like a protein covalently bonded to a
>piece of DNA. Those chemists are so whacky! Here's a
>picture of one done in my old group
> http://www.ks.uiuc.edu/Research/pro_DNA/hmgd/SDNA_t.gif
>from
> http://www.ks.uiuc.edu/Research/pro_DNA/hmgd/
>I think it's okay to linearize these.
>
> * crystal structures and symmetries. One example that
>comes to mind is the virus structure I worked on where
>a beta sheet went from one protein chain on the given
>protomer to another protein chain on the next protomer
>around the 5-fold symmetry axis. But the structure
>record only contains a single protomer. I don't think
>we need to worry about this because to the best of
>my knowledge that information is not available in
>any database; it's extracted by humans reading the
>comments and associated papers.
>
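
The point in the list above about non-consecutive residue numbers can be
sketched in code. This assumes a residue id is an optionally negative
integer followed by at most one insertion-code letter, as in the
-2, -1, 1, 2A, 2B, 2C example; `parse_resid` is a hypothetical helper,
not part of any DAS or PDB library.

```python
import re

# Optional minus sign, digits, then at most one insertion-code letter.
_RESID = re.compile(r"^(-?\d+)([A-Za-z]?)$")


def parse_resid(text):
    """Split a residue id such as '2A' into (number, insertion_code)."""
    match = _RESID.match(text.strip())
    if match is None:
        raise ValueError("not a recognised residue id: %r" % text)
    return int(match.group(1)), match.group(2)


ids = ["2B", "-1", "2A", "1", "-2", "2C"]
ordered = sorted(ids, key=parse_resid)
print(ordered)
```

Sorting by this key is only a convenience for display. The order of
residues in the structure file itself is the safer reference, since
insertion codes do not always follow alphabetical order along the chain.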
>
>Beyond the technical details,
>
> Who are the test users?
>
> What's the reference platform? Should there even be one?
>There's a boatload of 3d structure viewers. A decade ago
>Steven Brenner proposed a generic format for selection +
>annotation information. Perhaps that's a better path?
>
> Is writeback needed?
>
> Andrew
> dalke at dalkescientific.com
>
>_______________________________________________
>DAS2 mailing list
>DAS2 at portal.open-bio.org
>http://portal.open-bio.org/mailman/listinfo/das2
--
----------------------------------------------------------------------------
Dr Tim Hubbard email: th at sanger.ac.uk
Head of Human Genome Analysis Tel (direct): +44 1223 496886
Wellcome Trust Sanger Institute Tel (switch): +44 1223 834244
Wellcome Trust Genome Campus, Hinxton Fax: +44 1223 494919
Cambridgeshire. CB10 1SA. UK. URL: http://www.sanger.ac.uk/Users/th
----------------------------------------------------------------------------