[Biojava-dev] Location Unions

Thomas Down td2@sanger.ac.uk
Tue, 8 Oct 2002 23:20:18 +0100


On Wed, Oct 09, 2002 at 09:18:31AM +1300, Schreiber, Mark wrote:
>
> > Before doing this, I'd like to know how you're planning to 
> > use the `locations with internal structure'.  Currently, the 
> > semantics of a plain Location are purely those of a set of 
> > integers (which is actually a really useful thing to have 
> > implemented efficiently anyway).  If you want to represent 
> > collections of `interesting' objects (possibly overlapping), 
> > you should be using nested features. Is there a case where 
> > this breaks?
> 
> The problem with features is that I don't have the sequence, only a set
> of locations. I could make a dummy sequence but it would be a very heavy
> weight way of cracking this nut. I kind of like the idea that you can
> play with locations independently of sequences. My current understanding
> is that Features can only be created on FeatureHolders like Sequences.
> Locations are not FeatureHolders. Features are also kind of heavyweight.
> Lots of feature.templates and stuff that aren't really needed to
> represent this problem (finding overlapping genes).

Yes, DummySequence is probably how I would deal with this.
Features aren't really /that/ heavyweight, unless you add
really large numbers of annotation properties (note: you
normally want to use SmallAnnotation rather than SimpleAnnotation,
since SimpleAnnotation has quite a bit of overhead from an
internal HashMap).  You should be able to have many thousands
of Features in memory without any trouble at all.
Surely if you're looking for overlapping genes, you want to
keep track of the gene IDs as well, and for that you need
to use Features.

But if you're really keen to do this just using Locations,
how about:

    - Put the Locations for all your genes into a List
    - Sort the List using the Location.naturalOrder comparator
    - Traverse the sorted list, checking for overlaps between
      neighbouring entries.
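
A rough sketch of that sort-and-scan idea, in plain Java so it runs
without BioJava on the classpath. Everything named here (OverlapSketch,
Span, findOverlaps) is made up for illustration and is not BioJava API.
It assumes the sort orders by start coordinate, which is what I'd expect
Location.naturalOrder to do. One deliberate variation: rather than
comparing only immediate neighbours, it remembers the earlier span that
reaches furthest right, so a long gene that contains several short ones
is still caught.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

class OverlapSketch {

    /** Hypothetical stand-in for a gene's Location: an ID plus a closed interval [min, max]. */
    static final class Span {
        final String id;
        final int min;
        final int max;

        Span(String id, int min, int max) {
            this.id = id;
            this.min = min;
            this.max = max;
        }
    }

    /**
     * Sorts a copy of the input by start (then end) and scans it once.
     * Each span is compared with the earlier span that extends furthest
     * to the right; an overlap is reported as "earlierId/laterId".
     * This flags every span that overlaps something before it, but it
     * does not enumerate all overlapping pairs.
     */
    static List<String> findOverlaps(List<Span> spans) {
        List<Span> sorted = new ArrayList<>(spans); // leave the caller's list untouched
        sorted.sort(Comparator.comparingInt((Span s) -> s.min)
                              .thenComparingInt(s -> s.max));

        List<String> hits = new ArrayList<>();
        Span furthest = null; // earlier span with the largest max seen so far
        for (Span s : sorted) {
            if (furthest != null && s.min <= furthest.max) {
                hits.add(furthest.id + "/" + s.id);
            }
            if (furthest == null || s.max > furthest.max) {
                furthest = s;
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        List<Span> genes = new ArrayList<>();
        genes.add(new Span("geneC", 90, 120));
        genes.add(new Span("geneA", 1, 100));
        genes.add(new Span("geneB", 50, 60));
        for (String hit : findOverlaps(genes)) {
            System.out.println(hit);
        }
    }
}
```

Note that the sketch has to carry an ID alongside each interval to say
anything useful about which genes overlap, which is more or less the
argument for using Features in the first place.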

> I think my second proposal of subclassing RangeLocation might be the
> nicer solution. It would encapsulate the 'inner' locations, provide
> accessor methods (if you want to see them) but not change how the block
> iterator behaves.

Yes, contract-wise that's much better.  But it still seems
overcomplicated to me, and it adds a meaning that Locations
were never meant to have.

Anyone else have views on this?

     Thomas.