[DAS] das woes

Matthew Pocock matthew_pocock@yahoo.co.uk
Wed, 23 Oct 2002 16:07:43 +0100 (BST)


Hi,

I've found myself in (yet another) place with
<understatement>limited</understatement> computing
facilities, trying to provide informatics support to
lab biologists. They aren't interested in (or capable
of) writing scripts or applications. They can, however,
browse web sites. One experiment may involve a thousand
pipette strokes, so they don't think twice about a
thousand mouse clicks. I currently have access to a
handful of PCs with limited disk space. In short,
this is the typical wet-lab setup, but with a resident
bioinformatician.

To answer some of the questions they have, I tried to
use the Ensembl DAS server. Some DAS client scripts
interrogate the server and display the results via
servlets that the biologists can click through. I
reasoned that this would let me data-mine the genes,
transcripts and sequence data without needing a local
Ensembl install. How wrong I was.

Some of the glitches were due to bugs in the Dazzle
server and the BioJava client libraries. Thomas and I
have been fixing these as they've surfaced. I've now hit
the hard limits of the data model. The bottom line is
that DAS gives you /just/ enough to make a reasonably
consistent guess about how to render a colored box,
and nothing more. Dazzle and the BioJava DAS library
negotiate to talk XFF instead of DASGFF, so at least
we're getting feature hierarchies across, with
non-contiguous locations where necessary. However, I
get questions from the scientists like "I want to
display genes with multiple transcripts, at least one
of which is expressed in testes", and I have to ask,
"How do you know it's expressed in testes?" They
say, "UniGene tells me so", and I have to explain that
UniGene doesn't seem to be linked to the transcripts
served by the Ensembl DAS server. And even if it
were, the query to fetch this data would be painful,
because the DAS spec provides no way to query features
by arbitrary properties.
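To make that cost concrete, here is a minimal sketch of the filtering a client ends up doing itself when the server cannot evaluate the predicate. It is written in Python against a hypothetical, simplified feature model: the `Feature` class, its `tissues` set, and the `genes_expressed_in` helper are illustrative assumptions, not part of DAS 1.5, Dazzle or BioJava, and DAS features carry no standard expression annotation to fill `tissues` from.

```python
from dataclasses import dataclass, field

# Hypothetical, simplified feature model. DAS 1.5 has no standard slot for
# expression data, so the `tissues` set is an assumption for illustration.
@dataclass
class Feature:
    id: str
    type: str
    children: list = field(default_factory=list)
    tissues: set = field(default_factory=set)

def genes_expressed_in(genes, tissue, min_transcripts=2):
    """Client-side filter: keep genes with at least `min_transcripts`
    transcripts, at least one of which is annotated as expressed in
    `tissue`. Every gene must already have been downloaded, because the
    server cannot evaluate this predicate on the client's behalf."""
    hits = []
    for gene in genes:
        transcripts = [c for c in gene.children if c.type == "transcript"]
        if len(transcripts) >= min_transcripts and any(
            tissue in t.tissues for t in transcripts
        ):
            hits.append(gene)
    return hits

# Toy data standing in for features already fetched from a server.
genes = [
    Feature("g1", "gene", [Feature("t1", "transcript", tissues={"testis"}),
                           Feature("t2", "transcript")]),
    Feature("g2", "gene", [Feature("t3", "transcript", tissues={"testis"})]),
]
print([g.id for g in genes_expressed_in(genes, "testis")])  # ['g1']
```

The point of the sketch is where the work happens: the whole gene set crosses the network before a single line of the filter runs, which is exactly what a server-side property query would avoid.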

So the bottom line is that I'm installing a local
Ensembl. At least that way I have a fighting chance of
accessing the info they want. I thought the point of
DAS was that all this info could be distributed, and
that big institutions could provide the big data sets
via web services, avoiding the need for local replicas.
The lack of queryability in the DAS protocol kills
this dream dead. Colored boxes are all well and good,
but everyone here wants to ask questions about these
boxes that DAS is fundamentally unable to answer.

Are we stuck in the land of development inertia, or is
there a will to do better? Is everyone (except me)
wholly satisfied with the status quo of DAS 1.5, or do
some of us want to build a realistic solution for
sharing, integrating, mining and authoring
distributed annotations?

Matthew
