[DAS2] Sequence retrieval proposal

Thomas Down td2 at sanger.ac.uk
Thu Dec 8 10:33:16 UTC 2005


On 7 Dec 2005, at 23:22, Andrew Dalke wrote:

>
>> 2. What do folks think about specifying a DAS2XML format for sequence
>>    requests (text/x-das-sequence+xml)? In addition to permitting an
>>    optional checksum attribute to address the above use case, it would
>>    add some consistency and flexibility to the spec, since at present,
>>    the default sequence response format is the only one that is not under
>>    our control (currently it's text/x-fasta).
>
> As a consumer of this sort of data, I don't want to write another
> parser.  It isn't just the parsing part - it's the effort of mapping
> to my program's data model.
>
> There's already a huge number of existing sequence file formats.
> What would another provide?  Are some of them already extensible?
>
> Several of those formats are designed and developed by people involved
> with DAS.  If it's important, extend GAME or GFF.

Do GAME or GFF have a sequence representation?  I thought they were  
both primarily feature-table formats (right now I'm having trouble  
finding the GAME documentation though...).

The problem I have with FASTA format (other than the tendency of many
data providers to overload the header line) is that there's no
explicit marker for the alphabet and encoding of sequence data.  This
is pretty nasty for codebases like BioJava which want to present a
richer view of sequence data than just a String.  I'd certainly be in
favour of a nice XML format that made alphabet information explicit.
The DAS 1.5 DASSEQUENCE document has a moltype attribute which
supports this (at least for the three most important cases,
DNA/RNA/Protein -- there's not a standards-compliant way to add other
alphabets though).
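
To make the point about an explicit alphabet concrete, here is a rough
Python sketch of a client reading the moltype from a DASSEQUENCE-style
document.  The sample document, its id/start/stop/version values and the
read_sequences helper are made up for illustration; only the DASSEQUENCE
name and the moltype attribute come from the discussion above, so check
the element layout against the DAS 1.5 spec before relying on it.

```python
import xml.etree.ElementTree as ET

# Hypothetical response, loosely modelled on a DAS 1.5 DASSEQUENCE document.
EXAMPLE = """<DASSEQUENCE>
  <SEQUENCE id="chr1" start="1" stop="12" moltype="DNA" version="1.0">
    atttcttggcgt
  </SEQUENCE>
</DASSEQUENCE>"""


def read_sequences(xml_text):
    """Yield (id, moltype, residues) for each SEQUENCE element."""
    root = ET.fromstring(xml_text)
    for seq in root.iter("SEQUENCE"):
        # Drop the whitespace and line breaks used to lay out the residues.
        residues = "".join((seq.text or "").split())
        yield seq.get("id"), seq.get("moltype"), residues


for seq_id, moltype, residues in read_sequences(EXAMPLE):
    # The alphabet arrives with the data, so nothing has to be guessed
    # from the residues themselves.
    print(seq_id, moltype, len(residues))
```

With FASTA the same client would have to infer the alphabet from the
residue characters, which is exactly the guesswork being complained about.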

I guess an alternative, more classically RESTful, way of doing things  
might be with MIME types:

        Content-Type: application/fasta; sequence-alphabet=DNA; sequence-encoding=IUPAC
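
On the client side, reading such parameters would need nothing beyond a
standard header parser.  A minimal Python sketch, assuming the
sequence-alphabet and sequence-encoding parameter names proposed above
(they are not registered anywhere) and a made-up helper name:

```python
from email.message import Message


def parse_sequence_content_type(header_value):
    """Split a Content-Type value into (media type, parameter dict)."""
    msg = Message()
    msg["Content-Type"] = header_value
    # get_params() returns (name, value) pairs; the first pair is the
    # media type itself, so skip it when collecting the parameters.
    params = dict(msg.get_params()[1:])
    return msg.get_content_type(), params


media_type, params = parse_sequence_content_type(
    "application/fasta; sequence-alphabet=DNA; sequence-encoding=IUPAC"
)
print(media_type)                        # application/fasta
print(params.get("sequence-alphabet"))   # DNA
print(params.get("sequence-encoding"))   # IUPAC
```

A client that does not know these parameters can simply ignore them and
treat the body as ordinary FASTA.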

I admit I'd prefer the XML though...


             Thomas.


