[DAS] Restricting the range of an alignment query.
Thomas Down
thomas.a.down at gmail.com
Wed Aug 11 18:37:37 UTC 2010
Hi,
I'm currently looking at the spec for the DAS1.6 alignment command:
http://www.ebi.ac.uk/~aj/1.6_draft6/documents/spec.html#alignment
Suppose I'm interested in a relatively small interval of a large sequence
(ex. human NCBI36 chr22:30000000,30200000), and want to find the orthologous
segments of the mouse genome... I can certainly do:
.../das/align-oracle/alignments?query=xyzzy;subject=22
But that's potentially going to return a slew of data which could swamp
lightweight clients and all but the best network connections!
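For reference, the whole-chromosome request above can be assembled as follows. This is a minimal Python sketch: the helper name and the example base URL are hypothetical. The only convention it relies on is the one visible in the example request, where parameters are separated by `;` rather than `&`.

```python
from urllib.parse import quote


def alignment_query_url(base, query, subject=None):
    """Build a DAS alignment request URL (hypothetical helper).

    Parameters are joined with ';' as in the example request, so
    urlencode() (which joins with '&') is not used.
    """
    params = [("query", query)]
    if subject is not None:
        params.append(("subject", subject))
    # Keep ':' and ',' unescaped so segment strings stay readable.
    qs = ";".join(f"{key}={quote(value, safe=':,')}" for key, value in params)
    return f"{base.rstrip('/')}/alignments?{qs}"


if __name__ == "__main__":
    # Hypothetical base URL; substitute a real DAS server and source.
    print(alignment_query_url("http://example.org/das/align-oracle", "xyzzy", "22"))
```

Nothing in the URL limits the subject to an interval, so the server has no choice but to return every block for chromosome 22.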
One potential solution would be to use the "cols=" filter. However, my
reading of the spec is that this is working in "alignment" coordinates,
rather than sequence coordinates. Great if you're writing a full-blown
alignment viewer and want to do some lazy-fetching, but troublesome if you
want to layer a limited amount of alignment data onto a more conventional
sequence display (or, for that matter, jump into a large alignment using a
sequence feature -- for instance, a gene name -- as your reference point).
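To show why cols= is awkward for a sequence-centric client: translating a sequence interval into alignment columns needs the gapped alignment row itself, because every gap shifts the mapping. The sketch below assumes a single gapped row using `-` for gaps and 1-based inclusive coordinates. The function and its arguments are hypothetical and not part of the DAS spec.

```python
def seq_range_to_columns(row, row_start, start, stop, gap="-"):
    """Map a 1-based inclusive sequence range onto alignment columns.

    row       -- one gapped row of the alignment, e.g. "AC--GT"
    row_start -- sequence coordinate of the first non-gap residue in row
    Returns (first_col, last_col), 1-based inclusive, or None if the
    range does not overlap the residues in this row.
    """
    first = last = None
    pos = row_start - 1  # sequence coordinate of the last residue seen
    for col, ch in enumerate(row, start=1):
        if ch == gap:
            continue  # gap columns do not advance the sequence coordinate
        pos += 1
        if start <= pos <= stop:
            if first is None:
                first = col
            last = col
        elif pos > stop:
            break  # past the requested range; no more columns can match
    return None if first is None else (first, last)
```

For example, with the row `"AC--GT"` starting at position 100, the sequence range 101..102 maps to columns 2..5. The client can only compute this after it has already fetched the row, which is exactly the data the filter was meant to avoid transferring.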
My preferred solution would be to use a "normal" DAS segment identifier. So
in my example above, I'd just query for subject=chr22:30000000,30200000 and
get the relevant alignment blocks back straight away. However, the current
spec seems to attach a *different* meaning to subject=X:y,z (specifically,
return alignments for X plus y sequences before and z sequences after it in
a big multiple sequence alignment).
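A minimal Python sketch of the two readings of subject=X:y,z. The function names, the (ref, start, stop) tuple used for alignment blocks, and the ordered list of row IDs are all hypothetical illustrations, not definitions from the DAS spec.

```python
import re
from typing import List, NamedTuple, Optional, Tuple


class Segment(NamedTuple):
    ref: str
    start: int  # 1-based, inclusive
    stop: int   # 1-based, inclusive


_X_Y_Z = re.compile(r"^([^:]+):(\d+),(\d+)$")


def parse_subject(value: str) -> Tuple[str, Optional[int], Optional[int]]:
    """Split 'X:y,z' into (X, y, z); a bare 'X' gives (X, None, None)."""
    m = _X_Y_Z.match(value)
    if m is None:
        return value, None, None
    return m.group(1), int(m.group(2)), int(m.group(3))


def as_segment(value: str) -> Segment:
    """Reading 1 (proposed): y,z are start/stop sequence coordinates."""
    ref, y, z = parse_subject(value)
    if y is None or z is None:
        raise ValueError(f"no range given in {value!r}")
    return Segment(ref, y, z)


def as_neighbours(value: str, ordered_ids: List[str]) -> List[str]:
    """Reading 2 (draft spec): X plus y sequences before and z after it,
    taken from the row order of a multiple alignment."""
    ref, y, z = parse_subject(value)
    i = ordered_ids.index(ref)  # raises ValueError if X is not a row
    return ordered_ids[max(0, i - (y or 0)): i + (z or 0) + 1]


def overlapping_blocks(blocks, seg: Segment):
    """Under reading 1, keep (ref, start, stop) blocks overlapping seg."""
    return [b for b in blocks
            if b[0] == seg.ref and b[1] <= seg.stop and b[2] >= seg.start]
```

Here `as_segment("chr22:30000000,30200000")` gives `Segment(ref='chr22', start=30000000, stop=30200000)`, while `as_neighbours("B:1,1", ["A", "B", "C", "D"])` gives `['A', 'B', 'C']`. Because the same string parses cleanly under both readings, a server cannot tell from the syntax alone which one the client intends.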
Am I missing something here?
Thomas.
PS. Also, I'd be really interested to hear from anyone else who's used the
alignment command for genome-genome alignments (or, indeed, anything bigger
than a protein), so I can coordinate and make my implementation as close as
possible to whatever's already out there.