[DAS] Restricting the range of an alignment query.

Wed Aug 11 18:37:37 UTC 2010

Hi,

I'm currently looking at the spec for the DAS1.6 alignment command:

           http://www.ebi.ac.uk/~aj/1.6_draft6/documents/spec.html#alignment

Suppose I'm interested in a relatively small interval of a large sequence
(e.g. human NCBI36 chr22:30000000,30200000), and want to find the orthologous
segments of the mouse genome...  I can certainly do:

                .../das/align-oracle/alignments?query=xyzzy;subject=22

But that's potentially going to return a slew of data which could swamp
lightweight clients and all but the best network connections!

One potential solution would be to use the "cols=" filter.  However, my
reading of the spec is that this is working in "alignment" coordinates,
rather than sequence coordinates.  Great if you're writing a full-blown
alignment viewer and want to do some lazy-fetching, but troublesome if you
want to layer a limited amount of alignment data onto a more conventional
sequence display (or, for that matter, jump into a large alignment using a
sequence feature -- for instance, a gene name -- as your reference point).
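
To illustrate the mismatch, here is a minimal Python sketch (mine, not from
the spec) of what a client would have to do to turn a sequence position into
an alignment column before it could use "cols=".  The function name and the
gapped-string representation of a row are assumptions for illustration only.
The point is that the mapping needs the gapped row itself, which is exactly
the data I was hoping not to fetch:

```python
def seq_pos_to_column(gapped_row, seq_start, pos, gap="-"):
    """Return the 1-based alignment column holding sequence position `pos`.

    gapped_row: one row of the alignment, including gap characters
                (assumed representation, not the DAS XML format).
    seq_start:  sequence coordinate of the first non-gap residue in the row.
    """
    current = seq_start - 1
    for column, char in enumerate(gapped_row, start=1):
        if char != gap:
            # Only non-gap characters consume a sequence coordinate.
            current += 1
            if current == pos:
                return column
    raise ValueError("position %d is not covered by this row" % pos)
```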

My preferred solution would be to use a "normal" DAS segment identifier.  So
in my example above, I'd just query for subject=chr22:30000000,30200000 and
get the relevant alignment blocks back straight away.  However, the current
spec seems to attach a *different* meaning to subject=X:y,z (specifically,
return alignments for X plus y sequences before and z sequences after it in
a big multiple sequence alignment).
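
For concreteness, the behaviour I have in mind amounts to a plain interval
overlap test on the subject sequence.  A rough Python sketch follows; the
Block record and both function names are hypothetical (the real alignment XML
carries much more than this), and this is only one way a server might do it:

```python
from collections import namedtuple

# Hypothetical, stripped-down view of one alignment block on the subject.
Block = namedtuple("Block", "subject start end")

def parse_segment(segment):
    """Split a segment like 'chr22:30000000,30200000' into (id, start, end)."""
    seq_id, _, span = segment.partition(":")
    start, end = (int(x) for x in span.split(","))
    return seq_id, start, end

def blocks_in_segment(blocks, segment):
    """Keep only the blocks that overlap the requested sequence interval."""
    seq_id, start, end = parse_segment(segment)
    return [b for b in blocks
            if b.subject == seq_id and b.start <= end and b.end >= start]
```

Under this reading, y and z are sequence coordinates on X, not counts of
neighbouring rows in the alignment.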

Am I missing something here?

          Thomas.

PS. Also, I'd be really interested to hear from anyone else who's used the
alignment command for genome-genome alignments (or, indeed, anything bigger
than a protein), so I can coordinate and make my implementation as close as
possible to whatever's already out there.