[DAS] Restricting the range of an alignment query.

Andy Jenkinson andy.jenkinson at ebi.ac.uk
Thu Aug 12 09:41:54 UTC 2010


(ccing list, comments inline)
On 12 Aug 2010, at 09:49, Thomas Down wrote:
> 
> On Thu, Aug 12, 2010 at 9:25 AM, Andy Jenkinson <andy.jenkinson at ebi.ac.uk> wrote:
> Hi Thomas,
> 
> Unfortunately the existing use of the subject parameter was how it was designed as part of the eFamily project, very much focussed on proteins and with the alignments being their own semi-reference data type. Rob or Andreas can say more. The command is used in Pfam's alignment viewer and as mappings for SPICE, both using the full width of the alignment. The principal change between 1.53E and 1.6 is the 'cols' parameter, which at least makes it possible to retrieve sections of an alignment, but as you say this would be in the alignment's own coordinates (there's no other way to do it - which of the many sequences' coordinates would it use?). At the moment genomic alignments are purely theoretical, but 'cols' support is there in ProServer and MyDas.
> 
> Okay, I see your point (and I'm certainly not proposing we break existing stuff), but it's not entirely true that genomic alignments are purely theoretical -- I'm running genome-genome alignment servers on my dev machine and will be pushing at least a couple of them public once this issue is resolved. The next version of Dalliance will support alignment DAS (initially just for coordinate system mapping, but I want to do a proper comparative genomics view in the future, which will hopefully use much the same DAS code). Again, that's 90% working now and at this point I'm mostly just waiting until I'm sure I'm using the alignment system properly, or at least abusing it in a somewhat-sensible manner.

OK, so you're a lot further along; sounds good. A while back we were aiming to get Compara alignments as DAS too (which necessitated the cols parameter).

> As regards what would be necessary to do what you want, I don't think we can change the subject parameter unless the existing alignment servers and clients using it can be changed, i.e. Pfam (not sure if there are others). I can't really think of another way to do it off the top of my head - the 'query' parameter has space for start/end positions and can be a sequence identifier, but this is more like 'get all alignments containing this sequence', which is not quite the same thing. It would also be a bit clunky to describe - what would "?query=alignment42:30,40" do?
> 
> Any suggestions?
> 
> Well, the obvious thing is to couple the coordinate restrictions to the sequence to which they apply.
> 
> Simplest solution I can think of would be to add:
> 
>            ?segment=seqName[:start,end]
> 
> ...where:
> 
>            ?segment=P12345
> 
> ...is synonymous with:
> 
>            ?subject=P12345        (which would still be supported),
> 
> ...but...
> 
>            ?segment=22:30000000,30200000
> 
> ...does what I want. Maybe not the cleanest solution, but I don't think it's going to horribly break anything (unless there are subtleties I'm missing here?)
> 
>                      Thomas.

I think you're right in that it is going to need another parameter to make it work. Any objections from anyone? What would happen if you specified multiple segments which did not correspond to the same section? Return multiple blocks representing multiple horizontal sections?
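
To make the proposal concrete, here is a minimal sketch of how a server might parse the suggested segment=seqName[:start,end] parameter, including the case where it is given more than once. It is illustrative only: the names Segment, parse_segment and parse_segments are hypothetical, it is not taken from ProServer or MyDas, and it assumes parameters are separated by "&". It deliberately returns one entry per segment and leaves open how a server would combine them (for example, as multiple blocks).

```python
import re
from typing import List, NamedTuple, Optional
from urllib.parse import parse_qs


class Segment(NamedTuple):
    """Hypothetical container for one 'segment' value."""
    name: str
    start: Optional[int] = None
    end: Optional[int] = None


# seqName, optionally followed by ":start,end" (both integers).
_SEGMENT_RE = re.compile(r"^(?P<name>[^:]+)(?::(?P<start>\d+),(?P<end>\d+))?$")


def parse_segment(value: str) -> Segment:
    """Parse a single 'seqName[:start,end]' value."""
    match = _SEGMENT_RE.match(value)
    if match is None:
        raise ValueError("malformed segment: %r" % value)
    if match.group("start") is None:
        # No range given: behaves like the existing subject=seqName form.
        return Segment(match.group("name"))
    start, end = int(match.group("start")), int(match.group("end"))
    if start > end:
        raise ValueError("segment start is after end: %r" % value)
    return Segment(match.group("name"), start, end)


def parse_segments(query_string: str) -> List[Segment]:
    """Return every 'segment' parameter in the query string, in order."""
    params = parse_qs(query_string)
    return [parse_segment(v) for v in params.get("segment", [])]


if __name__ == "__main__":
    for seg in parse_segments("segment=22:30000000,30200000&segment=P12345"):
        print(seg)
```

A segment without a range carries start and end of None, so a server could treat it exactly as it treats subject today, while a ranged segment identifies which sequence's coordinates the restriction applies to.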


