[Biojava-dev] Location Unions

Thomas Down td2@sanger.ac.uk
Tue, 8 Oct 2002 07:44:44 +0100


On Tue, Oct 08, 2002 at 11:27:19AM +1300, Schreiber, Mark wrote:
> 
> Currently location unions are performed by merging contiguous blocks and
> making compound locations where necessary.
> 
> I.e., this list of locations:
> 
> [1,20] [20,50] [45,90] forms the location {[1,20], [20,90]}.

Is that true?  I'd expect [1,90] (remember, right now we're numbering
list elements rather than `gaps', so [1,20] and [20,50] definitely
overlap.)
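
To make the expected behaviour concrete, here is a minimal sketch in plain
Java. It does not use the BioJava Location API; the class name
IntervalUnion and the int[] {min, max} representation are assumptions made
up purely for illustration. Both ends of each span are inclusive, and
spans that overlap or touch are merged into one block.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

// Hypothetical helper, not part of BioJava.
class IntervalUnion {
    // Each span is {min, max} with both ends inclusive.
    static List<int[]> union(List<int[]> spans) {
        List<int[]> sorted = new ArrayList<>(spans);
        sorted.sort(Comparator.comparingInt(s -> s[0]));

        List<int[]> merged = new ArrayList<>();
        for (int[] s : sorted) {
            int[] last = merged.isEmpty() ? null : merged.get(merged.size() - 1);
            // Merge when the span overlaps the last block or is directly
            // adjacent to it; long arithmetic avoids overflow at MAX_VALUE.
            if (last != null && s[0] <= (long) last[1] + 1) {
                last[1] = Math.max(last[1], s[1]);
            } else {
                merged.add(new int[] {s[0], s[1]}); // copy, leave input untouched
            }
        }
        return merged;
    }

    public static void main(String[] args) {
        List<int[]> result = union(Arrays.asList(
                new int[] {1, 20}, new int[] {20, 50}, new int[] {45, 90}));
        for (int[] block : result) {
            System.out.println("[" + block[0] + "," + block[1] + "]"); // prints [1,90]
        }
    }
}
```

With inclusive coordinates, [1,20] and [20,50] share position 20, so the
three inputs collapse to the single block [1,90].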

> One problem with this is that it doesn't allow the recovery of the
> individual component locations through the block iterator. Would it be
> better if the union gave the following?
> 
> {[1,20],[20,50],[45,90]}
> 
> I believe that such a beast would still behave the same as the location
> above (although slightly heavier) and give the benefit of locating its
> subcomponents. Any thoughts? Would this be horribly inefficient?

In general, it shouldn't be too bad, although we'd have to explicitly
change the contract of the current blockIterator (which I believe
is documented to return non-overlapping spans).
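
As a rough illustration of what a "non-overlapping spans" contract means
for callers, the following sketch checks a list of blocks against it. This
is plain Java rather than the BioJava API; the class BlockContract and the
int[] {min, max} inclusive representation are hypothetical.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, not part of BioJava.
class BlockContract {
    // Spans are {min, max}, inclusive. Returns true if no two spans
    // share a position.
    static boolean nonOverlapping(List<int[]> blocks) {
        int[][] sorted = blocks.toArray(new int[0][]);
        Arrays.sort(sorted, (a, b) -> Integer.compare(a[0], b[0]));
        for (int i = 1; i < sorted.length; i++) {
            // After sorting by min, checking neighbours is sufficient.
            if (sorted[i][0] <= sorted[i - 1][1]) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        List<int[]> proposed = Arrays.asList(
                new int[] {1, 20}, new int[] {20, 50}, new int[] {45, 90});
        List<int[]> merged = Arrays.asList(new int[] {1, 90});
        System.out.println(nonOverlapping(proposed)); // false
        System.out.println(nonOverlapping(merged));   // true
    }
}
```

The proposed {[1,20],[20,50],[45,90]} fails such a check, which is why any
code relying on the current guarantee would need to be revisited.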

Before doing this, I'd like to know how you're planning to use
the `locations with internal structure'.  Currently, the semantics
of a plain Location are purely those of a set of integers (which
is actually a really useful thing to have implemented efficiently
anyway).  If you want to represent collections of `interesting'
objects (possibly overlapping), you should be using nested features.
Is there a case where this breaks?
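
A small sketch of the "set of integers" view, using java.util.BitSet as a
stand-in for a Location (the class LocationAsSet and the int[][] input
format are assumptions for illustration only, not BioJava code). Under
these semantics the merged block and the list of overlapping pieces are
indistinguishable:

```java
import java.util.BitSet;

// Hypothetical helper, not part of BioJava.
class LocationAsSet {
    // Each span is {min, max}, inclusive; the result holds every covered position.
    static BitSet toSet(int[][] spans) {
        BitSet bits = new BitSet();
        for (int[] s : spans) {
            bits.set(s[0], s[1] + 1); // BitSet.set takes an exclusive upper bound
        }
        return bits;
    }

    public static void main(String[] args) {
        BitSet merged = toSet(new int[][] {{1, 90}});
        BitSet pieces = toSet(new int[][] {{1, 20}, {20, 50}, {45, 90}});
        System.out.println(merged.equals(pieces)); // true
        System.out.println(pieces.cardinality());  // 90
    }
}
```

Any internal structure (which pieces the set was built from) is lost in
this representation, so it has to be carried by something else.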

        Thomas.