[DAS] "DAS/0.5" -- a different way of doing it
Lincoln Stein
lstein@cshl.org
Fri, 21 Dec 2001 13:34:34 -0500
Hi All,
Over the past couple of days I've been experimenting with a variant of DAS,
and have implemented it on the GMOD Generic Genome Browser. I'm calling it
the DAS/0.5 protocol, since it represents a devolutionary step from DAS/1
and DAS/2.
You can see what I've been working on at
http://www.wormbase.org/db/seq/gbrowse
The user experience is as follows:
1) She browses her favorite region of the genome.
2) She decides to annotate GenBank entry AB12345.
3) She creates a space- or tab-delimited text file that looks like this:
    reference = AB12345.1
    Gene "Predicted gene 1" 518-616,661-735,3187-3365 "Zinc-finger domain"
    Gene "Predicted gene 2" 5513-6497,7968-8136,8278-8383 "Unknown"
    Gene "Predicted gene 3" 16626-17396,17451-17597 "7-transmembrane"
4) She hits the "upload" button, and the annotations appear on the display.
5) she decides to annotate another region of the genome. She hits the "edit"
button, and is brought to a text area that displays the uploaded file.
She adds another "reference=" line and more features.
6) When she saves, the new annotations are displayed.
7) She customizes the display by configuring it to use her favorite glyphs,
colors, etc. She adds URLs to link to when the features are clicked.
This all gets saved into the uploaded file.
8) She decides to publish her annotations to her colleagues so she:
a) presses the download button to get the current copy of the annotation
file
b) puts the annotation file on her departmental web server
c) sends the URL to her friends -or-
d) uses the "bookmark" function to create a ready-made bookmark
of the entire view, including the annotations
9) Her friends then:
a) type her URL into the "Enter remote annotation" box
b) add other URLs to display in the same view
10) Voilà! All the entered URLs are rendered and displayed, along with
backlinks to their annotators.
11) Settings are stored in a cookie, so the next time the user comes back,
all her uploaded data and external URLs are restored.
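For concreteness, here is a minimal Python sketch of a reader for the upload format in step 3. It is not the GBrowse code; it assumes, from the example file alone, that a "reference = NAME" line (with or without spaces around "=") sets the reference point for the lines that follow, and that each feature line holds a type, a quoted name, a comma-separated list of start-end ranges and an optional quoted description. Feature, parse_ranges and parse_annotation_file are hypothetical names, and the display settings added in step 7 are not handled.

```python
import re
import shlex
from dataclasses import dataclass

# Matches "reference = AB12345.1" as well as "reference=AB12345.1".
REFERENCE_LINE = re.compile(r"^\s*reference\s*=\s*(\S+)\s*$", re.IGNORECASE)


@dataclass
class Feature:
    reference: str       # reference point the coordinates are relative to
    kind: str            # feature type, e.g. "Gene"
    name: str            # e.g. "Predicted gene 1"
    segments: list       # [(start, end), ...] relative to the reference point
    description: str = ""


def parse_ranges(text):
    """Turn "518-616,661-735" into [(518, 616), (661, 735)]."""
    segments = []
    for part in text.split(","):
        start, end = part.split("-")
        segments.append((int(start), int(end)))
    return segments


def parse_annotation_file(text):
    """Parse the (assumed) layout: type, "name", ranges, optional "description"."""
    features = []
    reference = None
    for lineno, line in enumerate(text.splitlines(), start=1):
        if not line.strip():
            continue  # skip blank lines
        match = REFERENCE_LINE.match(line)
        if match:
            reference = match.group(1)
            continue
        if reference is None:
            raise ValueError(f"line {lineno}: feature before any reference= line")
        fields = shlex.split(line)  # splits on spaces or tabs, honours double quotes
        if len(fields) < 3:
            raise ValueError(f"line {lineno}: expected type, name and ranges")
        kind, name, ranges = fields[:3]
        description = " ".join(fields[3:])
        features.append(Feature(reference, kind, name, parse_ranges(ranges), description))
    return features
```

shlex.split is used because it already copes with both space and tab separators and with double-quoted fields that contain spaces.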
(If you want to test this out, select the C. elegans data source and type in
the URL "http://stein.cshl.org/~lstein/test2.txt". After the URL is loaded, a
list of the annotated regions will appear next to its name; click on the
appropriate link to be taken to an annotated region.)
Internally, the way this works is:
1) at the server side, the annotation data, whether it is a URL or an
uploaded file, is mapped from relative coordinates into
absolute coordinates, using the current assembly.
2) this gets stored into a DAS database (using LDAS & Bio::DB::GFF)
3) every time the URL is requested, the server issues an If-Modified-Since
request, and updates its database if the source file has changed.
4) coordinates are recalculated when the underlying map changes.
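Steps 1 and 4 amount to keeping each feature relative to its reference point and recomputing absolute positions from a table of where each reference point currently sits in the assembly. The Python sketch below assumes 1-based inclusive coordinates and that a reference point on the reverse strand counts from its right-hand end. ReferencePoint, to_absolute and remap are illustrative names; the server described here stores its results with LDAS and Bio::DB::GFF, and this is not that code.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ReferencePoint:
    """Where a named reference point (clone, contig, gene, ...) sits in the current assembly."""
    name: str
    chrom: str
    start: int   # absolute, 1-based, inclusive
    end: int     # absolute, 1-based, inclusive
    strand: int  # +1 or -1


def to_absolute(ref, start, end):
    """Map a 1-based relative interval on `ref` to (chrom, start, end) in absolute coordinates."""
    length = ref.end - ref.start + 1
    if not (1 <= start <= length and 1 <= end <= length):
        raise ValueError(f"{start}-{end} falls outside {ref.name} (length {length})")
    if ref.strand >= 0:
        a, b = ref.start + start - 1, ref.start + end - 1
    else:
        # On a reverse-strand reference point, relative position 1 is its right-hand end.
        a, b = ref.end - start + 1, ref.end - end + 1
    return ref.chrom, min(a, b), max(a, b)


def remap(features, assembly):
    """Recompute absolute coordinates for (reference_name, [(start, end), ...]) pairs.

    `assembly` maps reference-point names to ReferencePoint objects. Calling this
    again with a new table is all that an assembly change requires.
    """
    placed = []
    for reference, segments in features:
        ref = assembly[reference]
        placed.append([to_absolute(ref, s, e) for s, e in segments])
    return placed
```

Because only the relative coordinates are kept, the annotator's file never has to change when the map does; the chromosome name and positions used to exercise such code would be made-up values, not real WormBase data.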
The advantage of this is that the complex coordinate mapping occurs at the
server side, which makes hard stuff like remapping annotations across
assembly changes possible. There's no longer a requirement for the annotator
to set up a DAS server; she just has to have a web server.
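The "just a web server" property rests on step 3 of the internal workflow using ordinary HTTP conditional requests. A minimal Python sketch of that step, assuming the client echoes the file's Last-Modified header back verbatim as If-Modified-Since and treats a 304 Not Modified reply as "nothing to update". AnnotationFetcher is a hypothetical name, and the opener argument exists only so the logic can be exercised without a network.

```python
import urllib.error
import urllib.request


class AnnotationFetcher:
    """Re-download remote annotation files only when they have changed (illustrative sketch)."""

    def __init__(self, opener=urllib.request.urlopen):
        self._open = opener   # injectable so the logic can be tested offline
        self._cache = {}      # url -> (Last-Modified value or None, text)

    def fetch(self, url):
        """Return (text, changed); changed is False when the cached copy is still current."""
        request = urllib.request.Request(url)
        cached = self._cache.get(url)
        if cached is not None and cached[0] is not None:
            # Send back the server's own timestamp rather than our local clock.
            request.add_header("If-Modified-Since", cached[0])
        try:
            with self._open(request) as response:
                text = response.read().decode("utf-8")
                last_modified = response.headers.get("Last-Modified")
        except urllib.error.HTTPError as err:
            # urlopen reports 304 Not Modified as an HTTPError.
            if err.code == 304 and cached is not None:
                return cached[1], False
            raise
        # Servers that send no Last-Modified get a full download; compare contents instead.
        changed = cached is None or cached[1] != text
        self._cache[url] = (last_modified, text)
        return text, changed
```

Only when changed is True would the server need to re-parse the file and remap it into its database.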
It also allows a very flexible choice of reference points. For example, on
the WormBase server, the coordinate reference points can include:
1) chromosomes
2) contigs
3) clones
4) predicted genes
5) oligos
6) mapped genetic loci
7) single nucleotide polymorphisms
One big disadvantage is that the protocol forces the entire annotation
database to be swallowed and remapped at one gulp. Obviously this doesn't
scale well, so it only works for small-scale annotations. One way to work
around this would be to have the annotator store the annotations for each
reference point in a separate file, named after that reference point and
containing all the features annotated on it, for example:
http://favorite_site/annotations/AB12345.1.txt
When viewing a region, the server could probe the annotation web site for
URLs matching reference points that overlap the region of interest:
http://favorite_site/annotations/Chr3.txt
http://favorite_site/annotations/Contig_a.txt
http://favorite_site/annotations/AB12345.1.txt
Of course, this is totally clunky.
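The probing idea can be sketched as follows: given a table of where each reference point sits in absolute coordinates, the server picks the ones overlapping the viewed region and builds one candidate URL per point. The Placement table, the ".txt" suffix and the function names are illustrative assumptions, and actually fetching each URL (and ignoring 404 replies) is left out.

```python
from typing import NamedTuple
from urllib.parse import quote, urljoin


class Placement(NamedTuple):
    name: str   # reference point name, e.g. "AB12345.1" or "Chr3"
    chrom: str
    start: int  # absolute, 1-based, inclusive
    end: int


def overlapping(placements, chrom, start, end):
    """Reference points on `chrom` that overlap the closed interval [start, end]."""
    return [p for p in placements
            if p.chrom == chrom and p.start <= end and p.end >= start]


def candidate_urls(base_url, placements, chrom, start, end):
    """One URL per overlapping reference point, under an assumed NAME.txt naming convention."""
    base = base_url if base_url.endswith("/") else base_url + "/"
    # quote() escapes characters such as ":" so a name cannot be mistaken for a URL scheme.
    return [urljoin(base, quote(p.name) + ".txt")
            for p in overlapping(placements, chrom, start, end)]
```

Every view then costs one HTTP request per overlapping reference point, presumably most of them answered with 404, which is much of what makes the scheme clunky.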
Another big disadvantage is that there's no formal way of versioning
reference points or annotations.
So I'm not proposing this as a replacement for DAS/1->DAS/2 development, but
rather as an alternative architectural style that addresses the same use
cases that DAS/1 does, and does it in a much more lightweight fashion.
Thoughts?
Lincoln
--
========================================================================
Lincoln D. Stein Cold Spring Harbor Laboratory
lstein@cshl.org Cold Spring Harbor, NY
========================================================================