[DAS] Restricting the range of an alignment query.
Thomas Down
thomas.a.down at gmail.com
Thu Aug 12 13:57:19 UTC 2010
On Thu, Aug 12, 2010 at 2:15 PM, Javier Herrero <jherrero at ebi.ac.uk> wrote:
> Hi Thomas
>
>
> > Is there still any interest in this on the Ensembl side? It's something
> > I'm going to be needing soon, too (my current chain-file-based server
> > doesn't handle all the cases I'm interested in).
>
> I guess the interest must come from "the other side". I am quite keen on
> providing alignments through DAS if people and/or DAS clients will use them
> and if that is not too heavy for our servers.
Well, I'm very keen to get comparative data into Dalliance (see
http://www.biodalliance.org/human/ncbi36/ if you haven't seen it), and an
ensembl-compara DAS server would be by far the best way to do that.
> You can imagine that things can
> go horribly wrong if one asked all 33-way EPO alignments on a chromosome at
> once. This can probably be controlled in the server.
>
That's an interesting general question. Historically, DAS has gone more for
trusting clients to request "sensible" amounts of data (although personally,
I'd like to see a richer way of hinting to clients what "sensible" might
mean in a given context).
You could just forbid fetching alignments >1Mb or something.
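
A minimal sketch of that kind of server-side cap, in Python. The 1Mb figure is only the example number above, and `check_range` and `RangeTooLarge` are hypothetical names, not part of DAS or of any existing server:

```python
MAX_ALIGNMENT_SPAN = 1_000_000  # assumed cap of 1Mb; purely illustrative


class RangeTooLarge(ValueError):
    """Raised when a client asks for more sequence than the server allows."""


def check_range(start: int, end: int, max_span: int = MAX_ALIGNMENT_SPAN) -> int:
    """Return the span of a 1-based inclusive range, or raise if it is
    malformed or longer than max_span."""
    if start < 1 or end < start:
        raise ValueError(f"invalid range {start}..{end}")
    span = end - start + 1
    if span > max_span:
        raise RangeTooLarge(f"requested {span} bp, limit is {max_span} bp")
    return span
```

A server would call something like this before touching the database and turn the exception into whatever error response it normally uses.
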
Having said that, I actually think there *are* legitimate reasons for
fetching a whole chromosome worth of alignments. Firstly for clients that
want to do something other than pure data visualisation. Secondly, for a
client which wants to show a "karyotype" type view, with syntenic blocks
labeled, rather than a very detailed base-level alignment. The first one
can probably only be addressed by chunky servers and/or responsible usage
patterns, but the second one could be handled quite nicely with a very small
extension to the current DAS protocol. The current format encourages you to
represent the high-level structure of the alignment using BLOCK elements,
then fill in the fine base-level structure with CIGAR strings. Given a
server that follows this pattern, all that's needed is a flag to omit the
CIGARs and you'd have pretty much perfect data for use in a synteny browser.
Given that the CIGAR is already optional, this ought to be pretty painless
to add.
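
To illustrate what the "omit the CIGARs" flag would amount to, here is a rough Python sketch that strips the CIGAR strings from an alignment document and keeps the block structure. The element and attribute names (`block`, `segment`, `cigar`, `intObjectId`) are my assumption of how the response looks, and the sample data is made up; a real server would simply not generate the CIGARs in the first place:

```python
import xml.etree.ElementTree as ET

# Made-up sample response; names and coordinates are illustrative only.
SAMPLE = """<dasalignment>
  <alignment>
    <block blockOrder="1">
      <segment intObjectId="hg18_22" start="100" end="199"><cigar>100M</cigar></segment>
      <segment intObjectId="hg19_22" start="150" end="249"><cigar>100M</cigar></segment>
    </block>
  </alignment>
</dasalignment>"""


def strip_cigars(xml_text: str) -> str:
    """Return the document with every cigar child removed from its segment,
    leaving blocks and segment coordinates untouched."""
    root = ET.fromstring(xml_text)
    # Materialise the list first so removal does not disturb the traversal.
    for segment in list(root.iter("segment")):
        for cigar in segment.findall("cigar"):
            segment.remove(cigar)
    return ET.tostring(root, encoding="unicode")


block_only = strip_cigars(SAMPLE)
```

What is left is one block with two coordinate ranges, which is all a synteny view needs.
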
> Another question is whether it is easy to fit genomic alignments into the
> current dasalignment structure. I am not sure how to interpret things like
> dbAccessionId, objectVersion, dbSource, etc for a genomic alignment. In
> other
> words, should protein and genomic alignments share the same query and
> response?
The response format actually fits pretty well as far as I can tell. To my
mind:
  assembly name/version  == dbSource (although this is kind-of redundant
                            with coordinate system...)
  chromosome name/number == objectVersion
Concrete example (NB. experimental server, may move, change or disappear!):
view-source:http://www.derkholm.net:8080/das/hg18ToHg19/alignment?query=22
Because of the block/segment approach, it would be easy to generalize this
to >2 way alignments.
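
A small Python sketch of why that is: if a block is just a list of segments, nothing ties it to exactly two. The class names, object ids and coordinates here are all hypothetical, chosen only for illustration:

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Segment:
    object_id: str   # which aligned sequence this range belongs to
    start: int
    end: int
    strand: str = "+"


@dataclass
class Block:
    segments: List[Segment] = field(default_factory=list)

    def arity(self) -> int:
        """Number of sequences taking part in this block."""
        return len(self.segments)


# A pairwise block, as in a two-assembly mapping (coordinates invented).
pairwise = Block([Segment("hg18:22", 100, 199), Segment("hg19:22", 150, 249)])

# The same block with a third, invented sequence added: no schema change needed.
three_way = Block(pairwise.segments + [Segment("speciesC:chrN", 5000, 5099)])
```
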
There may be a few minor additions that are worthwhile, but overall I'm
fairly certain this will work okay. Right now, I'm much more concerned
about the query format.
Thomas.