[DAS2] Sequence retrieval proposal

Steve Chervitz Steve_Chervitz at affymetrix.com
Mon Dec 12 21:33:00 UTC 2005

On Sun, 11 Dec 2005 Andrew Dalke wrote:
> Steve:
>> I am also somewhat loath to add yet another sequence file format to the
>> world. Seems reasonable to state that a DAS/2 server can supply sequence
>> in an alternative format via requests such as:
>>   http://www.wormbase.org/das/genome/volvox/1/sequence?format=GAME
> That makes good sense to me.
>> Here's a brief tour of some possibly extensible candidates:
> Do you want to say this as:
>    "The server must implement these sequence formats"
> or
>    "If the server implements one or more of these sequence formats then
>      it must use the corresponding id and content-type."
> ?
> Or say nothing and wait until several different servers implement
> this then standardize on what they do?
> I don't think anyone here seriously wants the first. :)
> The last is my favorite, then the middle one.

The last is fine with me. This is the approach we use for type-specific
alternative feature formats:

> My stronger preference is to get a complete 2.0 spec out.  Do
> you or other users need checksum validation of the sequence and/or
> alternate sequence formats in 2.0?  What prevents you from extending
> existing HTTP headers or experimenting with extensions then
> submitting your experience for inclusion in future versions of
> the spec?
> My sense is that this can wait.

Yep. Especially in light of this morning's teleconf (notes for which are on
the way). This seems like a good place to invoke YAGNI (
http://keithdevens.com/quotes/YAGNI ).

>> We might consider prescribing some conventions for what DAS considers proper
>> fasta format. I put in a little bit of description of a DAS-acceptable fasta
>> format here in the retrieval spec:
>> http://biodas.org/documents/das2/das2_get.html#sequence
> Do current DAS clients even use the header?
> Will future ones use it?  If so, why?  Shouldn't all the information
> in a header be available as an annotation?

Don't know. Seems like it should be left to the client implementation to
decide what to do with the header. The aim of the sequence request (soon to
be 'residues') is to get sequence data, not annotations.

If we're not saying what DAS/2 clients are supposed to do with the header
info, and there are so many variations out there, we might consider stating
that clients are free to ignore the header. But if we do that, why use
fasta format instead of raw sequence?
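
To make the "ignore the header" option concrete, here is a minimal Python
sketch of a client that drops any fasta header lines and keeps only the
residues. The function name and the sample record are hypothetical and not
taken from the DAS/2 spec; it assumes a single-record response.

```python
def residues_from_fasta(text):
    """Return the residues from a single-record fasta string, ignoring headers."""
    # Header lines start with '>'; a client that ignores the header skips them.
    body = [line.strip() for line in text.splitlines()
            if line and not line.startswith(">")]
    return "".join(body)


# Hypothetical record: the header content here is purely illustrative.
fasta = ">chrI 1..20\nGATTACAGAT\nTACAGATTAC\n"
raw = "GATTACAGATTACAGATTAC"

# With the header ignored, fasta and raw input yield the same residues.
assert residues_from_fasta(fasta) == raw
assert residues_from_fasta(raw) == raw
```

Under that assumption the header adds nothing for such a client, which is the
point of the question above.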

Btw, DAS/1 used an XML formatted response for sequence data. The DAS/1
sequence element has these attributes: id, start, stop, moltype, version.
Does anyone know how DAS/1 clients make use of these from the seq response?
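
As one possible answer, here is a rough Python sketch of what a client could
do with those attributes: read them and sanity-check the residue count
against start/stop. The sample document is made up for illustration; the
element names (DASSEQUENCE, SEQUENCE), the attribute values, and the 1-based
inclusive coordinates are assumptions on my part, so check them against the
DAS/1 spec before relying on this.

```python
import xml.etree.ElementTree as ET

# Illustrative response only. The attribute names come from the list above;
# the wrapper element and all values are assumptions.
SAMPLE = """<DASSEQUENCE>
  <SEQUENCE id="chrI" start="1" stop="12" moltype="DNA" version="1.0">
    gattacagatta
  </SEQUENCE>
</DASSEQUENCE>"""

ATTRIBUTES = ("id", "start", "stop", "moltype", "version")


def read_sequence(xml_text):
    """Return (attribute dict, residues) for the first SEQUENCE element."""
    root = ET.fromstring(xml_text)
    seq = root.find("SEQUENCE")
    meta = {name: seq.get(name) for name in ATTRIBUTES}
    # Remove whitespace and line breaks inside the element text.
    residues = "".join((seq.text or "").split())
    return meta, residues


meta, residues = read_sequence(SAMPLE)

# Assuming 1-based inclusive coordinates, start/stop give the expected length.
expected_length = int(meta["stop"]) - int(meta["start"]) + 1
assert len(residues) == expected_length
```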

> The Wikipedia entry for FASTA is pretty good.
>    http://en.wikipedia.org/wiki/Fasta_format

Interesting. That more-than-one-header business seems evil. They give a good
link for alternative sequence formats.

