[DAS2] Brief summary of DAS/BioSapiens workshops from a DAS/2 perspective

Mon Mar 5 17:30:10 UTC 2007

Summary of DAS & Feature Classification workshops, February 26-28 2007,
Hinxton

DAS Developers Workshop:
http://www.sanger.ac.uk/Users/ap3/dasworkshop.html

BioSapiens Feature Type Classification Workshop:
http://www.ebi.ac.uk/~hhe/tmp/BioSapiensFeatureMeeting.htm

DAS1 clients discussed:
          Dasty2, JalView, VectorBase, IGB, Pepper, Spice, ProView,
          Ensembl ContigView, ...
DAS1 servers discussed:
          Pfam, Ensembl, ProServer, Sisyphus, ...

DAS1 extensions:
          Gene DAS
          Protein DAS
          Alignment DAS
          Structure DAS
          3D-EM DAS
          Interaction DAS
          MaDAS (writeback?)
          "simple" DAS

DAS/2

BioSapiens Overview:  http://www.biosapiens.info/
  Large-scale genome/protein annotation, 25 institutions from 14
    countries across Europe participating
  Currently 23 DAS servers within BioSapiens project serving 69 DAS
    sources.
  4 servers appear to be down (21 sources fail features query)
  See http://www.biosapiens.info/page.php?page=biosapiensdir for more
    DAS server stats

Major concerns for Ensembl / Sanger / BioSapiens that I think we've
addressed well in DAS/2:
          Gene DAS
          Protein DAS
          Alignment DAS
          "simple" DAS

Major concerns for Ensembl / Sanger / BioSapiens that surprised me:

    A) In general, use of a smaller subset of DAS1 than expected
        Many BioSapiens DAS servers don't support the "entry_points"
          query (64 fail|NA)
        Many BioSapiens DAS servers don't support the "types" query
          (49 fail|NA)
             in DAS1, features themselves can carry most of the type
             info
        Some BioSapiens DAS servers don't support "features" query
          parameters (only the features query with no params)
        Many BioSapiens clients don't use the "entry_points" query,
          the "types" query, or any feature filters (they always get
          all features for a given segment)
        BioSapiens protein annotation almost exclusively uses flat
          (one-level) features
             very little or no use of the "group" attribute to make
             two-level features
             example: disulfide bond annotation relies on rendering or
             prior knowledge to differentiate
        Ensembl DAS servers are in general serving one type per source
        These simplifications of clients and servers are reinforcing
          each other
        If only a subset of DAS1 is being used, does this mean that
          DAS/2 might be too complex?
        But with these simplifications, the complexity is getting
          pushed into other places
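
Since many servers answer only the bare features query, and features
themselves carry most of the type info, a client can rebuild a type
listing and do its own filtering after fetching everything for a
segment. Below is a minimal Python sketch of that idea, assuming a
DAS1-style DASGFF features response. The sample document, its IDs and
the helper names are made up for illustration and do not come from any
BioSapiens server.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Hypothetical DAS1-style features response (illustrative data only).
SAMPLE_RESPONSE = """<DASGFF>
  <GFF version="1.0" href="http://example.org/das/demo/features">
    <SEGMENT id="DEMO1" start="1" stop="200">
      <FEATURE id="f1" label="domain 1">
        <TYPE id="domain">domain</TYPE>
        <START>10</START><END>90</END>
      </FEATURE>
      <FEATURE id="f2" label="cys 25">
        <TYPE id="disulfide">disulfide</TYPE>
        <START>25</START><END>25</END>
      </FEATURE>
      <FEATURE id="f3" label="cys 70">
        <TYPE id="disulfide">disulfide</TYPE>
        <START>70</START><END>70</END>
      </FEATURE>
      <FEATURE id="f4" label="domain 2">
        <TYPE id="domain">domain</TYPE>
        <START>120</START><END>180</END>
      </FEATURE>
    </SEGMENT>
  </GFF>
</DASGFF>"""


def parse_features(xml_text):
    """Flatten every FEATURE element into a plain dict."""
    root = ET.fromstring(xml_text)
    features = []
    for segment in root.iter("SEGMENT"):
        for feature in segment.findall("FEATURE"):
            type_el = feature.find("TYPE")
            features.append({
                "segment": segment.get("id"),
                "id": feature.get("id"),
                "type": type_el.get("id") if type_el is not None else None,
                "start": int(feature.findtext("START")),
                "end": int(feature.findtext("END")),
            })
    return features


def type_counts(features):
    """Stand-in for a "types" query: tally the types seen on features."""
    return Counter(f["type"] for f in features)


def filter_by_type(features, wanted_types):
    """Client-side replacement for a server-side type filter."""
    return [f for f in features if f["type"] in wanted_types]


features = parse_features(SAMPLE_RESPONSE)
print(type_counts(features))
print([f["id"] for f in filter_by_type(features, {"disulfide"})])
```

Note that the two disulfide features come back as unrelated flat
features; nothing in the parsed data says they form one bond, which is
the kind of complexity that ends up pushed onto the client.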

    B) Data overload
        Number of servers, sources, types
             Ensembl: will have 1000s of sources soon
        Redundancy concerns
             example: Pfam domain
                Many sources with same / similar annotation type -
                "Pfam domain"
                Slight differences in feature ranges
                Which is the authority?
                Is there a way to help clients decide which can be
                combined?
        Mirrors
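
One way a client could flag candidates for combining is to compare
features of the same type from different sources and check how much
their ranges overlap. The Python sketch below is only an illustration
of that idea: the overlap measure, the 0.9 threshold, and the field
names are assumptions, not something agreed at the workshop, and it
does not answer the question of which source is the authority.

```python
from itertools import combinations


def overlap_fraction(a, b):
    """Shared length divided by the longer range (1-based, inclusive)."""
    shared = min(a["end"], b["end"]) - max(a["start"], b["start"]) + 1
    if shared <= 0:
        return 0.0
    longer = max(a["end"] - a["start"], b["end"] - b["start"]) + 1
    return shared / longer


def redundant_pairs(features, min_fraction=0.9):
    """Pairs of same-type features from different sources that nearly coincide.

    min_fraction is a hypothetical cutoff chosen for the example.
    """
    pairs = []
    for a, b in combinations(features, 2):
        if a["source"] == b["source"] or a["type"] != b["type"]:
            continue
        if overlap_fraction(a, b) >= min_fraction:
            pairs.append((a["id"], b["id"]))
    return pairs


# Illustrative data only: three sources annotating the same protein.
features = [
    {"id": "a", "source": "s1", "type": "Pfam domain", "start": 10, "end": 100},
    {"id": "b", "source": "s2", "type": "Pfam domain", "start": 12, "end": 101},
    {"id": "c", "source": "s3", "type": "Pfam domain", "start": 200, "end": 300},
    {"id": "d", "source": "s2", "type": "signal peptide", "start": 10, "end": 100},
]
print(redundant_pairs(features))
```

Matching on type only works if sources use the same type identifier
for the same kind of annotation, which ties this back to the
classification issues in C).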

    C) Feature Classification / Ontology issues
        SO (Sequence Ontology) currently inadequate for describing
          protein annotation
             developing PAO (Protein Annotation Ontology)
        Type proliferation
             example: one feature type for each Pfam domain?
                ~9K Pfam-A domains
                If we also look at Pfam-B (ProDom families that don't
                overlap Pfam-A), then ~70K / 450K more (restricted to
                families with >2 proteins / unrestricted)
             if not in a unique type, where does that information go?
        Need multiple ontology terms to describe a single type?

------------------------------------------------------------------------

DAS WishList (last session of DAS workshop, people listed desired
improvements on whiteboard)

Multi-level features (Gregg)
Multi-level stylesheets (Ed)
Caching (last-modified, if-modified-since, TTL)
Provenance of features from other sources (features from different
sources with same IDs? types?)
Large analysis / Scalability
       1000s of seqs + 1000s sources + types ?
More queries: feature types / date
Entry point support
Encryption support
Stats-query interface -- count # of features of type for a source
ID ref external (URI / URN)
Proper error / exception handling
Asynchronous requests
       process
       batches
Better Stylesheets
Mapping servers
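
For the caching item, the client side could look roughly like the
Python sketch below, which uses standard HTTP conditional requests
plus a simple TTL check. It assumes a server that honours
If-Modified-Since, which is exactly what was being wished for rather
than something existing servers are known to do. The URL and function
names are hypothetical.

```python
import urllib.request
from email.utils import formatdate


def conditional_request(url, last_fetch_epoch):
    """Build a GET request asking the server to skip unchanged data."""
    request = urllib.request.Request(url)
    # HTTP dates are always expressed in GMT.
    request.add_header(
        "If-Modified-Since", formatdate(last_fetch_epoch, usegmt=True)
    )
    return request


def is_fresh(fetched_at, ttl_seconds, now):
    """True while a cached response is younger than its TTL (epoch seconds)."""
    return (now - fetched_at) < ttl_seconds


# Hypothetical source URL; nothing is sent over the network here.
request = conditional_request(
    "http://example.org/das/demo/features?segment=DEMO1", 0
)
print(request.header_items())
```

A client would consult is_fresh first and only send the conditional
request once the TTL has expired; a 304 Not Modified reply (which
urllib reports as an HTTPError) means the cached copy can be kept.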

We've discussed most of these wishlist issues before while developing
DAS/2, though we certainly haven't completely solved all of them...