[Biojava-dev] Location Unions

Schreiber, Mark mark.schreiber@agresearch.co.nz
Wed, 9 Oct 2002 09:18:31 +1300


> > I.e. this list of locations:
> > 
> > [1,20] [20,50] [45,90] forms the location {[1,20], [20,90]}.
> 
> Is that true?  I'd expect [1,90] (remember, right now we're numbering
> list elements rather than `gaps', so [1,20] and [20,50] definitely
> overlap.)
> 

Actually, that's a bad example (you're right, I'm wrong). However, it still
demonstrates the point that the component locations are merged.
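
To make the merging concrete, here is a minimal, self-contained sketch of a
union over closed integer spans. The names SpanUnionSketch, Span and merge are
made up for illustration and are not the BioJava Location API; the sketch only
merges spans that share at least one position, and makes no claim about how
BioJava treats merely adjacent spans.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Illustration only: hypothetical names, not the BioJava Location API.
final class SpanUnionSketch {

    // A closed span of integer positions, [min, max] inclusive.
    static final class Span {
        final int min;
        final int max;

        Span(int min, int max) {
            if (min > max) {
                throw new IllegalArgumentException("min > max");
            }
            this.min = min;
            this.max = max;
        }

        @Override
        public String toString() {
            return "[" + min + "," + max + "]";
        }
    }

    // Merge spans that share at least one position into single spans.
    static List<Span> merge(List<Span> spans) {
        List<Span> sorted = new ArrayList<>(spans);
        sorted.sort((a, b) -> Integer.compare(a.min, b.min));
        List<Span> merged = new ArrayList<>();
        for (Span s : sorted) {
            int lastIndex = merged.size() - 1;
            if (lastIndex >= 0 && s.min <= merged.get(lastIndex).max) {
                // Overlaps the previous block: extend it instead of adding a new one.
                Span last = merged.remove(lastIndex);
                merged.add(new Span(last.min, Math.max(last.max, s.max)));
            } else {
                merged.add(s);
            }
        }
        return merged;
    }

    public static void main(String[] args) {
        List<Span> input = Arrays.asList(
                new Span(1, 20), new Span(20, 50), new Span(45, 90));
        System.out.println(merge(input)); // prints [[1,90]]
    }
}
```

Because [1,20] and [20,50] both contain position 20, all three spans collapse
into one block, and the original components cannot be recovered from the result.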


> > One problem with this is that it doesn't allow the recovery of the
> > individual component locations through the block iterator. Would it be
> > better if the union gave the following
> >
> > {[1,20],[20,50],[45,90]} I believe that such a beast would still
> > behave the same as the location above (although slightly heavier) and
> > give the benefit of locating its sub components. Any thoughts? Would
> > this be horribly inefficient?
> 
> In general, it shouldn't be too bad, although we'd have to 
> explicitly change the contract of the current blockIterator 
> (which I believe is documented to return non-overlapping spans).
> 
> Before doing this, I'd like to know how you're planning to 
> use the `locations with internal structure'.  Currently, the 
> semantics of a plain Location are purely those of a set of 
> integers (which is actually a really useful thing to have 
> implemented efficiently anyway).  If you want to represent 
> collections of `interesting' objects (possibly overlapping), 
> you should be using nested features. Is there a case where 
> this breaks?

The problem with features is that I don't have the sequence, only a set
of locations. I could make a dummy sequence, but that would be a very
heavyweight way of cracking this nut. I like the idea that you can
work with locations independently of sequences. My current understanding
is that Features can only be created on FeatureHolders such as Sequences,
and Locations are not FeatureHolders. Features are also fairly heavyweight:
they bring Feature.Templates and other machinery that isn't really needed
to represent this problem (finding overlapping genes).
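
As a rough illustration of why no sequence is needed here, the sketch below
finds overlapping pairs using nothing but start and end coordinates. The names
OverlapSketch, Span, overlaps and overlappingPairs are hypothetical, not BioJava
classes, and the coordinates are invented; it uses a simple pairwise comparison,
which is fine for small sets but quadratic in the number of spans.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Illustration only: hypothetical names, not the BioJava API.
final class OverlapSketch {

    // A closed span of integer positions, [min, max] inclusive.
    static final class Span {
        final int min;
        final int max;

        Span(int min, int max) {
            this.min = min;
            this.max = max;
        }

        @Override
        public String toString() {
            return "[" + min + "," + max + "]";
        }
    }

    // Two closed spans overlap when they share at least one position.
    static boolean overlaps(Span a, Span b) {
        return a.min <= b.max && b.min <= a.max;
    }

    // Compare every pair once and report the overlapping ones.
    static List<String> overlappingPairs(List<Span> spans) {
        List<String> pairs = new ArrayList<>();
        for (int i = 0; i < spans.size(); i++) {
            for (int j = i + 1; j < spans.size(); j++) {
                if (overlaps(spans.get(i), spans.get(j))) {
                    pairs.add(spans.get(i) + " overlaps " + spans.get(j));
                }
            }
        }
        return pairs;
    }

    public static void main(String[] args) {
        // Made-up gene coordinates for the example.
        List<Span> genes = Arrays.asList(
                new Span(1, 20), new Span(20, 50), new Span(45, 90), new Span(100, 120));
        for (String pair : overlappingPairs(genes)) {
            System.out.println(pair);
        }
    }
}
```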

I think my second proposal of subclassing RangeLocation might be the
nicer solution. It would encapsulate the 'inner' locations, provide
accessor methods (if you want to see them) but not change how the block
iterator behaves.
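
Roughly what I have in mind, as a self-contained sketch. It uses composition
with made-up classes (CompositeSpanSketch, Span, CompositeSpan, getParts) rather
than actually extending BioJava's RangeLocation, so treat the names and
signatures as assumptions. The point is only that the block iterator still
yields merged, non-overlapping spans while a separate accessor returns the
original components.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.Iterator;
import java.util.List;

// Illustration only: hypothetical names, not a real RangeLocation subclass.
final class CompositeSpanSketch {

    // A closed span of integer positions, [min, max] inclusive.
    static final class Span {
        final int min;
        final int max;

        Span(int min, int max) {
            this.min = min;
            this.max = max;
        }

        @Override
        public String toString() {
            return "[" + min + "," + max + "]";
        }
    }

    static final class CompositeSpan {
        private final List<Span> parts;  // inner locations, exactly as supplied
        private final List<Span> blocks; // merged, non-overlapping view

        CompositeSpan(List<Span> parts) {
            if (parts.isEmpty()) {
                throw new IllegalArgumentException("need at least one part");
            }
            this.parts = Collections.unmodifiableList(new ArrayList<>(parts));
            this.blocks = Collections.unmodifiableList(merge(parts));
        }

        int getMin() { return blocks.get(0).min; }

        int getMax() { return blocks.get(blocks.size() - 1).max; }

        // Same behaviour as before: only non-overlapping spans come out.
        Iterator<Span> blockIterator() { return blocks.iterator(); }

        // Extra accessor for anyone who wants the original components.
        List<Span> getParts() { return parts; }
    }

    // Merge spans that share at least one position.
    static List<Span> merge(List<Span> spans) {
        List<Span> sorted = new ArrayList<>(spans);
        sorted.sort((a, b) -> Integer.compare(a.min, b.min));
        List<Span> merged = new ArrayList<>();
        for (Span s : sorted) {
            int lastIndex = merged.size() - 1;
            if (lastIndex >= 0 && s.min <= merged.get(lastIndex).max) {
                Span last = merged.remove(lastIndex);
                merged.add(new Span(last.min, Math.max(last.max, s.max)));
            } else {
                merged.add(s);
            }
        }
        return merged;
    }

    public static void main(String[] args) {
        CompositeSpan c = new CompositeSpan(Arrays.asList(
                new Span(1, 20), new Span(20, 50), new Span(45, 90)));
        System.out.println(c.getParts()); // prints [[1,20], [20,50], [45,90]]
        for (Iterator<Span> it = c.blockIterator(); it.hasNext();) {
            System.out.println(it.next()); // prints [1,90]
        }
    }
}
```

The cost is one extra list of spans per location, which is the "slightly
heavier" part of the original proposal.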

- Mark