[DAS] Restricting the range of an alignment query.

Thu Aug 12 13:57:19 UTC 2010

On Thu, Aug 12, 2010 at 2:15 PM, Javier Herrero <jherrero at ebi.ac.uk> wrote:

> Hi Thomas
>
>
> > Is there still any interest in this on the Ensembl side?  It's something
> > I'm going to be needing soon, too (my current chain-file-based server
> > doesn't handle all the cases I'm interested in).
>
> I guess the interest must come from "the other side". I am quite keen on
> providing alignments through DAS if people and/or DAS clients will use them
> and if that is not too heavy for our servers.

Well, I'm very keen to get comparative data into Dalliance (see
http://www.biodalliance.org/human/ncbi36/ if you haven't seen it), and an
ensembl-compara DAS server would be by far the best way to do that.

> You can imagine that things can
> go horribly wrong if one asked all 33-way EPO alignments on a chromosome at
> once. This can probably be controlled in the server.
>

That's an interesting general question.  Historically, DAS has gone more for
trusting clients to request "sensible" amounts of data (although personally,
I'd like to see a richer way of hinting to clients what "sensible" might
mean in a given context).

You could just forbid fetching alignments >1Mb or something.
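
As a rough illustration of that kind of server-side cap, here is a minimal
sketch. It assumes a hypothetical "chr:start,end" range syntax (borrowed from
the way DAS features requests name a segment; the alignment query has no
agreed range syntax yet), and the function names and the 1 Mb figure are
illustrative only.

```python
MAX_SPAN = 1_000_000  # illustrative 1 Mb cap, not a value from any real server


def parse_range(query):
    """Parse a hypothetical 'chr:start,end' query string.

    A bare 'chr' (no range) means the whole chromosome and is returned
    as (chrom, None, None).
    """
    chrom, sep, span = query.partition(":")
    if not sep:
        return chrom, None, None
    start_s, end_s = span.split(",")
    return chrom, int(start_s), int(end_s)


def check_range(query, max_span=MAX_SPAN):
    """Return (ok, reason) for a requested alignment range."""
    chrom, start, end = parse_range(query)
    if start is None:
        return False, "whole-chromosome requests are not allowed"
    if end < start:
        return False, "end precedes start"
    if end - start + 1 > max_span:
        return False, "requested range exceeds the limit"
    return True, "ok"
```

A server following this sketch would reject query=22 outright, which is
exactly the case the following paragraph argues is sometimes legitimate.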

Having said that, I actually think there *are* legitimate reasons for
fetching a whole chromosome worth of alignments.  Firstly for clients that
want to do something other than pure data visualisation.  Secondly, for a
client which wants to show a "karyotype" type view, with syntenic blocks
labeled, rather than a very detailed base-level alignment.  The first one
can probably only be addressed by chunky servers and/or responsible usage
patterns, but the second one could be handled quite nicely with a very small
extension to the current DAS protocol.  The current format encourages you to
represent the high-level structure of the alignment using BLOCK elements,
then fill in the fine base-level structure with CIGAR strings.  Given a
server that follows this pattern, all that's needed is a flag to omit the
CIGARs and you'd have pretty much perfect data for use in a synteny browser.
Given that the CIGAR is already optional, this ought to be pretty painless
to add.
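
To make the idea concrete, here is a minimal sketch of what such a flag could
do to a response before it is sent. The sample XML is hand-written for
illustration, and its element and attribute names (block, segment, cigar,
blockOrder, intObjectId) are assumptions about the alignment response layout
rather than output copied from a real server; strip_cigars is a hypothetical
helper.

```python
import xml.etree.ElementTree as ET

# Hand-written sample; names and coordinates are illustrative assumptions.
SAMPLE = """<dasalignment>
  <alignment>
    <block blockOrder="1">
      <segment intObjectId="hg18" start="100" end="199"><cigar>100M</cigar></segment>
      <segment intObjectId="hg19" start="150" end="249"><cigar>100M</cigar></segment>
    </block>
  </alignment>
</dasalignment>"""


def strip_cigars(xml_text):
    """Return the alignment XML with every <cigar> child removed.

    Block and segment coordinates are kept, so the result still describes
    the high-level structure of the alignment.
    """
    root = ET.fromstring(xml_text)
    # Materialise the segment list first so the tree is not modified
    # while it is being traversed.
    for segment in list(root.iter("segment")):
        for cigar in segment.findall("cigar"):
            segment.remove(cigar)
    return ET.tostring(root, encoding="unicode")


print(strip_cigars(SAMPLE))
```

In practice the server would simply skip computing the CIGARs rather than
strip them afterwards; the point is only that the remaining block/segment
skeleton is what a synteny view needs.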

> Another question is whether it is easy to fit genomic alignments into the
> current dasalignment structure. I am not sure how to interpret things like
> dbAccessionId, objectVersion, dbSource, etc for a genomic alignment. In
> other words, should protein and genomic alignments share the same query and
> response?

The response format actually fits pretty well as far as I can tell.  To my
mind:

           assembly name/version == dbSource (although this is kind of
               redundant with the coordinate system...)
           chromosome name/number == objectVersion

Concrete example (NB. experimental server, may move, change or disappear!):

           view-source:http://www.derkholm.net:8080/das/hg18ToHg19/alignment?query=22

Because of the block/segment approach, it would be easy to generalize this
to >2 way alignments.
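
A minimal sketch of that generalization, assuming a block is just a container
for one segment per aligned sequence. The helper name make_block, the species
labels and the coordinates are all made up for illustration, and the
attribute names are assumptions about the response layout.

```python
import xml.etree.ElementTree as ET


def make_block(order, segments):
    """Build one <block> holding one <segment> per aligned sequence.

    `segments` is a list of (object_id, start, end) tuples; nothing in the
    structure limits it to two entries.
    """
    block = ET.Element("block", blockOrder=str(order))
    for object_id, start, end in segments:
        ET.SubElement(
            block,
            "segment",
            intObjectId=object_id,
            start=str(start),
            end=str(end),
        )
    return block


# A three-way block: the same call shape as a pairwise one, just one more tuple.
three_way = make_block(
    1,
    [("human", 100, 199), ("mouse", 5000, 5099), ("dog", 730, 829)],
)
print(ET.tostring(three_way, encoding="unicode"))
```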

There may be a few minor additions that are worthwhile, but overall I'm
fairly certain this will work okay.  Right now, I'm much more concerned
about the query format.

                       Thomas.