[DAS] Restricting the range of an alignment query.

Andy Jenkinson andy.jenkinson at ebi.ac.uk
Thu Aug 12 08:25:33 UTC 2010


Hi Thomas,

Unfortunately the existing use of the subject parameter was how it was designed as part of the eFamily project, very much focussed on proteins and with the alignments being their own semi-reference data type. Rob or Andreas can say more. The command is used in Pfam's alignment viewer and as mappings for SPICE, both using the full width of the alignment. The principal change between 1.53E and 1.6 is the 'cols' parameter, which at least makes it possible to retrieve sections of an alignment, but as you say this would be in the alignment's own coordinates (there's no other way to do it - which of the many sequences' coordinates would it use?). At the moment genomic alignments are purely theoretical, but 'cols' support is there in ProServer and MyDas.
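Because 'cols' works in alignment coordinates, a sequence-centred client would first have to translate its region of interest into a column range. The sketch below shows that translation for a single row. It assumes the row is available as a gapped string ('-' for gaps) with 1-based, inclusive coordinates. The function name `seq_to_cols` is hypothetical and is not part of ProServer, MyDas or the spec. Note that it needs the row itself, which is exactly what a lightweight client wants to avoid fetching.

```python
from typing import Optional, Tuple


def seq_to_cols(gapped_row: str, seq_start: int, seq_end: int,
                row_offset: int = 1) -> Optional[Tuple[int, int]]:
    """Map a 1-based, inclusive sequence interval onto alignment columns.

    Hypothetical helper, not from any DAS implementation.
    gapped_row: the aligned sequence, with '-' marking gap columns.
    row_offset: sequence coordinate of the first residue in the row.
    Returns (first_col, last_col), 1-based and inclusive, or None if no
    residue of the interval appears in this row.
    """
    first = last = None
    pos = row_offset - 1  # sequence coordinate of the last residue seen
    for col, ch in enumerate(gapped_row, start=1):
        if ch == '-':
            continue  # gap columns do not advance the sequence coordinate
        pos += 1
        if seq_start <= pos <= seq_end:
            if first is None:
                first = col
            last = col
    if first is None:
        return None
    return first, last
```

For example, `seq_to_cols("AC--GT-A", 2, 4)` gives `(2, 6)`. Residues 2 to 4 sit in columns 2, 5 and 6, so the gap columns between them fall inside the returned span.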

As regards what would be necessary to do what you want, I don't think we can change the subject parameter unless the existing alignment servers and clients that use it can be changed, i.e. Pfam (not sure if there are others). I can't really think of another way to do it off the top of my head. The 'query' parameter has space for start/end positions and can be a sequence identifier, but that is closer to 'get all alignments containing this sequence', which is not quite the same thing. It would also be a bit clunky to describe: what would "?query=alignment42:30,40" do?
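To make the ambiguity concrete, consider the same `id:start,stop` string. It can be read as a range on a sequence, which is the usual DAS segment meaning. It can also be read, following Thomas's description of the current subject semantics, as a number of rows before and after a named sequence.

The sketch below implements both readings. It uses hypothetical names (`parse_segment`, `rows_around`) and a plain list of row IDs standing in for the alignment. It is not taken from ProServer, MyDas or the spec.

```python
import re
from typing import List, NamedTuple, Optional

# DAS-style segment identifier: "id" or "id:start,stop"
_SEGMENT = re.compile(r"^(?P<id>[^:]+)(?::(?P<start>\d+),(?P<stop>\d+))?$")


class Segment(NamedTuple):
    ref: str
    start: Optional[int]
    stop: Optional[int]


def parse_segment(text: str) -> Segment:
    """Split 'id:start,stop' into its parts; start/stop are None if absent."""
    m = _SEGMENT.match(text)
    if m is None:
        raise ValueError(f"not a segment identifier: {text!r}")
    start, stop = m.group("start"), m.group("stop")
    return Segment(m.group("id"),
                   int(start) if start is not None else None,
                   int(stop) if stop is not None else None)


def rows_around(row_ids: List[str], seg: Segment) -> List[str]:
    """Row-window reading: seg.start rows before and seg.stop rows after
    seg.ref in the alignment's row order (raises ValueError if absent)."""
    i = row_ids.index(seg.ref)
    before = seg.start or 0
    after = seg.stop or 0
    return row_ids[max(0, i - before): i + after + 1]
```

Under the sequence-range reading, `parse_segment("chr22:30000000,30200000")` describes residues 30000000 to 30200000 of chr22. Under the row-window reading, `rows_around(["a", "b", "c", "d", "e"], parse_segment("c:1,1"))` returns `["b", "c", "d"]`. `parse_segment("alignment42:30,40")` parses cleanly either way, and nothing in the string says which reading a server should apply.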

Any suggestions?

Cheers,
Andy

On 11 Aug 2010, at 19:37, Thomas Down wrote:

> Hi,
> 
> I'm currently looking at the spec for the DAS1.6 alignment command:
> 
>           http://www.ebi.ac.uk/~aj/1.6_draft6/documents/spec.html#alignment
> 
> Suppose I'm interested in a relatively small interval of a large sequence
> (e.g. human NCBI36 chr22:30000000,30200000), and want to find the orthologous
> segments of the mouse genome...  I can certainly do:
> 
>                .../das/align-oracle/alignments?query=xyzzy;subject=22
> 
> But that's potentially going to return a slew of data which could swamp
> lightweight clients and all but the best network connections!
> 
> One potential solution would be to use the "cols=" filter.  However, my
> reading of the spec is that this is working in "alignment" coordinates,
> rather than sequence coordinates.  Great if you're writing a full-blown
> alignment viewer and want to do some lazy-fetching, but troublesome if you
> want to layer a limited amount of alignment data onto a more conventional
> sequence display (or, for that matter, jump into a large alignment using a
> sequence feature -- for instance, a gene name -- as your reference point).
> 
> My preferred solution would be to use a "normal" DAS segment identifier.  So
> in my example above, I'd just query for subject=chr22:30000000,30200000 and
> get the relevant alignment blocks back straight away.  However, the current
> spec seems to attach a *different* meaning for subject=X:y,z (specifically,
> return alignments for X plus y sequences before and z sequences after it in
> a big multiple sequence alignment).
> 
> Am I missing something here?
> 
>          Thomas.
> 
> 
> PS. Also, I'd be really interested to hear from anyone else who's used the
> alignment command for genome-genome alignments (or, indeed, anything bigger
> than a protein), so I can coordinate and make my implementation as close as
> possible to whatever's already out there.
> _______________________________________________
> DAS mailing list
> DAS at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/das




