[DAS] "DAS/0.5" -- a different way of doing it

Lincoln Stein lstein@cshl.org
Fri, 21 Dec 2001 13:34:34 -0500


Hi All,

Over the past couple of days I've been experimenting with a variant of DAS, 
and have implemented it on the GMOD Generic Genome Browser.  I'm calling it 
the DAS/0.5 protocol, since it represents a de-evolutionary step from DAS/1 
and DAS/2.

You can see what I've been working on at 
http://www.wormbase.org/db/seq/gbrowse

The user experience is as follows:

	1) she browses her favorite region of the genome

	2) she decides to annotate genbank entry AB12345

	3) she creates a space or tab-delimited text file that looks like this:

	reference = AB12345.1
	Gene  "Predicted gene 1"      518-616,661-735,3187-3365  "Zinc-finger domain"
	Gene  "Predicted gene 2"      5513-6497,7968-8136,8278-8383   "Unknown"
	Gene  "Predicted gene 3"      16626-17396,17451-17597 "7-transmembrane"

	4) she hits the "upload" button.  The annotations appear on the display.

	5) she decides to annotate another region of the genome.  She hits the "edit"
		button, and is brought to a text area that displays the uploaded file.
		She adds another "reference=" line and more features.

	6) When she saves, the new annotations are displayed.

	7) She customizes the display by configuring it to use her favorite glyphs,
		colors, etc.  She adds URLs to link to when the features are clicked.
		This all gets saved into the uploaded file.

	8) She decides to publish her annotations to her colleagues so she:
		a) presses the download button to get the current copy of the annotation
			file
		b) puts the annotation file on her departmental web server
		c) sends the URL to her friends -or-
		d) uses the "bookmark" function to create a ready-made bookmark
			of the entire view, including the annotations

	9) Her friends then:
		a) type her URL into the "Enter remote annotation" box
		b) add other URLs to display in the same view

	10) voila!  all the entered URLs are rendered and displayed, along with
		backlinks to their annotators.

	11) settings are stored in a cookie, so the next time the user comes back,
		all her uploaded data and external URLs are restored.

(if you want to test this out, select the C. elegans data source, and type in 
the URL "http://stein.cshl.org/~lstein/test2.txt".  After the URL is loaded, a 
list of the annotated regions will appear next to its name; click on the 
appropriate link to be taken to an annotated region).
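
As a rough illustration of the file format in step 3, here is a minimal
Python sketch of a parser.  The function name, the dictionary keys and the
use of shlex to handle the quoted fields are all assumptions made for this
example; this is not the code that runs in the browser.

```python
import shlex


def parse_annotations(text):
    """Parse the upload format into a list of feature dictionaries."""
    features = []
    reference = None
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line:
            continue
        # A "reference = NAME" line sets the landmark for the features below it.
        key, sep, value = line.partition("=")
        if sep and key.strip().lower() == "reference":
            reference = value.strip()
            continue
        if reference is None:
            raise ValueError("feature line before any reference line: " + line)
        # shlex.split honours the double quotes around names and descriptions
        # and treats both spaces and tabs as field separators.
        fields = shlex.split(line)
        if len(fields) < 3:
            raise ValueError("expected type, name and ranges: " + line)
        feature_type, name, ranges = fields[:3]
        description = fields[3] if len(fields) > 3 else ""
        # Ranges look like "518-616,661-735": comma-separated start-end pairs.
        segments = []
        for part in ranges.split(","):
            start, end = part.split("-")
            segments.append((int(start), int(end)))
        features.append({
            "reference": reference,
            "type": feature_type,
            "name": name,
            "segments": segments,
            "description": description,
        })
    return features
```

Each returned dictionary keeps the reference it was declared under, so a
single file can carry features for several "reference=" sections.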

Internally, the way this works is:

	1) at the server side, the annotation data, whether it is a URL or an
		uploaded file, is mapped from relative coordinates into 
		absolute coordinates, using the current assembly.

	2) this gets stored into a DAS database (using LDAS & Bio::DB::GFF)

	3) every time the URL is requested, the server issues an If-Modified-Since
		request, and updates its database if the source file has changed.

	4) coordinates are recalculated when the underlying map changes.
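
The If-Modified-Since check in step 3 can be sketched in Python as below.
This is only an illustration under stated assumptions: the function name,
the injectable "opener" argument (there so the logic can be exercised
without a network) and the choice to cache the Last-Modified string are
hypothetical, and the real server uses LDAS and Bio::DB::GFF rather than
this code.

```python
import urllib.error
import urllib.request


def fetch_if_modified(url, last_modified=None, opener=urllib.request.urlopen):
    """Return (body, last_modified); body is None when nothing has changed."""
    request = urllib.request.Request(url)
    if last_modified:
        # Ask the remote web server to send the file only if it is newer.
        request.add_header("If-Modified-Since", last_modified)
    try:
        with opener(request) as response:
            body = response.read().decode("utf-8", errors="replace")
            # Remember the timestamp so the next request can be conditional.
            return body, response.headers.get("Last-Modified", last_modified)
    except urllib.error.HTTPError as err:
        if err.code == 304:
            # 304 Not Modified: keep what is already in the database.
            return None, last_modified
        raise
```

A caller would reload and remap the annotations only when the returned
body is not None.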

The advantage of this is that the complex coordinate mapping occurs at the 
server side, and allows for hard stuff like remapping between map assembly 
changes.  There's no longer a requirement for the annotator to set up a DAS 
server; she just has to have a web server.

It also allows very arbitrary selection of reference points.  For example, on 
the WormBase server, the coordinate reference points can include:

	1) chromosomes
	2) contigs
	3) clones
	4) predicted genes
	5) oligos
	6) mapped genetic loci
	7) single nucleotide polymorphisms
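
To make the server-side mapping concrete, here is a minimal Python sketch
of turning coordinates relative to any such reference point into absolute
ones.  The placement tuple (chromosome, absolute start, absolute end,
strand) and the numbers used in the usage lines are invented for
illustration; the real server looks these up in the current assembly.

```python
def to_absolute(segments, placement):
    """Map 1-based (start, end) pairs relative to a landmark onto a chromosome.

    placement is a hypothetical (chrom, abs_start, abs_end, strand) tuple
    describing where the landmark sits in the current assembly.
    """
    chrom, abs_start, abs_end, strand = placement
    mapped = []
    for start, end in segments:
        if strand > 0:
            # Forward landmark: relative position 1 is abs_start.
            low, high = abs_start + start - 1, abs_start + end - 1
        else:
            # Reverse landmark: relative position 1 is abs_end, so the
            # interval flips and is re-ordered as low..high.
            low, high = abs_end - end + 1, abs_end - start + 1
        mapped.append((chrom, low, high))
    return mapped


# Made-up placement of AB12345.1 on chromosome III, forward strand.
placement = ("III", 1000001, 1040000, 1)
absolute = to_absolute([(518, 616), (661, 735)], placement)
```

When the assembly changes, only the placement changes; rerunning the same
function over the stored relative coordinates gives the remapped features.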

One big disadvantage is that the protocol forces the entire annotation 
database to be swallowed and remapped at one gulp.  Obviously this doesn't 
scale well, so it only works for small-scale annotations.  One way to work 
around this would be to have the annotator split the annotations into 
individual files, each named after a reference point that contains all 
the features annotated in that file, for example:  
http://favorite_site/annotations/AB12345.1.txt

When viewing a region, the server could probe the annotation web site for 
URLs matching reference points that overlap the region of interest:

	http://favorite_site/annotations/Chr3.txt
	http://favorite_site/annotations/Contig_a.txt
	http://favorite_site/annotations/AB12345.1.txt

Of course, this is totally clunky.
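
For what it's worth, the probing idea could look roughly like the Python
sketch below.  The landmark list, its coordinates and the ".txt" naming
rule are assumptions taken from the example URLs above, not an
implemented feature.

```python
def candidate_urls(base_url, landmarks, region_start, region_end):
    """List per-landmark annotation URLs worth probing for a region.

    landmarks is a hypothetical list of (name, abs_start, abs_end) tuples
    for reference points on the chromosome being viewed.
    """
    urls = []
    for name, start, end in landmarks:
        # Keep only landmarks whose span overlaps the region of interest.
        if start <= region_end and end >= region_start:
            urls.append(base_url.rstrip("/") + "/" + name + ".txt")
    return urls


# Invented coordinates, for illustration only.
landmarks = [
    ("Chr3", 1, 13000000),
    ("Contig_a", 900000, 1200000),
    ("AB12345.1", 1000001, 1040000),
    ("Contig_b", 5000000, 5300000),
]
urls = candidate_urls("http://favorite_site/annotations/", landmarks,
                      1010000, 1020000)
```

Each candidate would still cost one HTTP request per view, most of which
would come back as "not found", which is part of why the scheme is clunky.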

Another big disadvantage is that there's no formal way of versioning 
reference points or annotations.

So I'm not proposing this as a replacement for DAS/1->DAS/2 development, but 
rather as an alternative architectural style that addresses the same use 
cases that DAS/1 does, and does it in a much more lightweight fashion.

Thoughts?

Lincoln
	
-- 
========================================================================
Lincoln D. Stein                           Cold Spring Harbor Laboratory
lstein@cshl.org			                  Cold Spring Harbor, NY
========================================================================