[Biojava-dev] Location Unions

Schreiber, Mark mark.schreiber@agresearch.co.nz
Wed, 9 Oct 2002 13:18:58 +1300


OK, I admit it might be overkill, but I have added the MergeLocations
functionality. I just couldn't bear to think of all those assimilated
locations losing their identity (Oh, the Humanity!).

It doesn't seem to add much overhead and passes all the tests too!!

- Mark

> -----Original Message-----
> From: Thomas Down [mailto:td2@sanger.ac.uk] 
> Sent: Wednesday, 9 October 2002 11:20 a.m.
> To: Schreiber, Mark
> Cc: Thomas Down; biojava-dev@biojava.org
> Subject: Re: [Biojava-dev] Location Unions
> 
> 
> On Wed, Oct 09, 2002 at 09:18:31AM +1300, Schreiber, Mark wrote:
> >
> > > Before doing this, I'd like to know how you're planning to
> > > use the `locations with internal structure'.  Currently, the 
> > > semantics of a plain Location are purely those of a set of 
> > > integers (which is actually a really useful thing to have 
> > > implemented efficiently anyway).  If you want to represent 
> > > collections of `interesting' objects (possibly overlapping), 
> > > you should be using nested features. Is there a case where 
> > > this breaks?
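To make the `set of integers' semantics concrete, here is a minimal sketch in plain Java. It does not use BioJava, and the class and method names are hypothetical. A location is modelled as the sorted set of positions it covers. Unioning two overlapping gene locations then gives a result equal to the single range 1..20, and nothing records which genes contributed.

```java
import java.util.SortedSet;
import java.util.TreeSet;

public class LocationAsIntegerSet {
    // Hypothetical helper: the set of positions covered by the closed range [min, max].
    static SortedSet<Integer> range(int min, int max) {
        SortedSet<Integer> s = new TreeSet<>();
        for (int i = min; i <= max; i++) {
            s.add(i);
        }
        return s;
    }

    // Union under pure set semantics: only the covered positions survive.
    static SortedSet<Integer> union(SortedSet<Integer> a, SortedSet<Integer> b) {
        SortedSet<Integer> result = new TreeSet<>(a);
        result.addAll(b);
        return result;
    }

    public static void main(String[] args) {
        SortedSet<Integer> geneA = range(1, 10);
        SortedSet<Integer> geneB = range(5, 20);
        SortedSet<Integer> merged = union(geneA, geneB);

        // The union is indistinguishable from a single range 1..20.
        System.out.println(merged.equals(range(1, 20)));           // true
        System.out.println(merged.first() + ".." + merged.last()); // 1..20
    }
}
```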
> > 
> > The problem with features is that I don't have the sequence, only a
> > set of locations. I could make a dummy sequence, but it would be a
> > very heavyweight way of cracking this nut. I kind of like the idea
> > that you can play with locations independently of sequences. My
> > current understanding is that Features can only be created on
> > FeatureHolders like Sequences. Locations are not FeatureHolders.
> > Features are also kind of heavyweight: lots of feature templates and
> > stuff that aren't really needed to represent this problem (finding
> > overlapping genes).
> 
> Yes, DummySequence is probably how I would deal with this. 
> Features aren't really /that/ heavyweight, unless you add 
> really large numbers of annotation properties (note: you 
> normally want to use SmallAnnotation rather than 
> SimpleAnnotation, since SimpleAnnotation has quite a bit of 
> overhead from an internal HashMap).  You should be able to 
> have many thousands of Features in memory without any trouble 
> at all. Surely if you're looking for overlapping genes, you 
> really want to keep track of the IDs, and for this you really 
> need to use Features.
> 
> But if you're really keen to do this just using Locations,
> how about:
> 
>     - Put the Locations for all your genes into a List
>     - Sort it on Location.naturalOrder
>     - Traverse the list checking for overlaps between
>       neighbouring entries.
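The sort-and-traverse approach can be sketched in plain Java as follows. This is a hypothetical stand-in rather than BioJava code: Span and overlappingPairs are invented names, the ID labels exist only to make the output readable, and sorting by start and then by end is assumed to mirror Location.naturalOrder. The sketch makes one refinement over checking strictly neighbouring entries. A long location can overlap entries several places further down the list, so the inner loop keeps scanning forward until the next start lies beyond the current end.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

public class OverlapScan {
    // Hypothetical stand-in for a RangeLocation: a closed interval [min, max] with a label.
    record Span(String id, int min, int max) {}

    // Returns every pair of spans that share at least one position.
    static List<String> overlappingPairs(List<Span> input) {
        List<Span> spans = new ArrayList<>(input);
        // Sort by start, then by end.
        spans.sort(Comparator.comparingInt(Span::min).thenComparingInt(Span::max));

        List<String> pairs = new ArrayList<>();
        for (int i = 0; i < spans.size(); i++) {
            Span a = spans.get(i);
            // Later spans start at or after a.min, so they overlap a
            // exactly when they start at or before a.max.
            for (int j = i + 1; j < spans.size() && spans.get(j).min() <= a.max(); j++) {
                pairs.add(a.id() + "/" + spans.get(j).id());
            }
        }
        return pairs;
    }

    public static void main(String[] args) {
        List<Span> genes = List.of(
                new Span("geneA", 1, 100),
                new Span("geneB", 2, 3),
                new Span("geneC", 50, 60),
                new Span("geneD", 200, 250));
        // geneA/geneC would be missed by comparing immediate neighbours only.
        System.out.println(overlappingPairs(genes)); // [geneA/geneB, geneA/geneC]
    }
}
```

Because the list is sorted, each inner loop stops at the first start beyond the current end, so the cost is the sort plus the number of overlapping pairs actually reported.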
> 
> > I think my second proposal of subclassing RangeLocation might be the
> > nicer solution. It would encapsulate the 'inner' locations, provide
> > accessor methods (if you want to see them) but not change how the
> > block iterator behaves.
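A minimal sketch of the idea of a range that remembers its inner locations, written against hypothetical plain-Java classes so that it compiles on its own. Range, MergedRange and merge are invented names, not BioJava API, and Range merely stands in for RangeLocation. The merged object still reports a single min..max span, and the constituent ranges are reachable only through an extra accessor, which is roughly the contract described above.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;

public class MergeSketch {
    // Hypothetical plain closed range, standing in for RangeLocation.
    static class Range {
        final int min, max;
        Range(int min, int max) { this.min = min; this.max = max; }
        @Override public String toString() { return min + ".." + max; }
    }

    // A range that still remembers the ranges merged into it.
    static final class MergedRange extends Range {
        private final List<Range> members;
        MergedRange(int min, int max, List<Range> members) {
            super(min, max);
            this.members = Collections.unmodifiableList(new ArrayList<>(members));
        }
        // Accessor for the 'inner' locations; the outer min..max is unaffected.
        List<Range> getMembers() { return members; }
    }

    // Sort by start, then grow a cluster while the next range starts within it.
    static List<MergedRange> merge(List<Range> input) {
        List<Range> sorted = new ArrayList<>(input);
        sorted.sort(Comparator.comparingInt((Range r) -> r.min).thenComparingInt(r -> r.max));
        List<MergedRange> result = new ArrayList<>();
        List<Range> cluster = new ArrayList<>();
        int lo = 0, hi = 0;
        for (Range r : sorted) {
            if (!cluster.isEmpty() && r.min > hi) {
                // r starts past the current cluster: close it off.
                result.add(new MergedRange(lo, hi, cluster));
                cluster.clear();
            }
            if (cluster.isEmpty()) { lo = r.min; hi = r.max; }
            else { hi = Math.max(hi, r.max); }
            cluster.add(r);
        }
        if (!cluster.isEmpty()) result.add(new MergedRange(lo, hi, cluster));
        return result;
    }

    public static void main(String[] args) {
        List<Range> genes = List.of(new Range(5, 20), new Range(1, 10), new Range(30, 40));
        for (MergedRange m : merge(genes)) {
            // Prints "1..20 from [1..10, 5..20]" then "30..40 from [30..40]".
            System.out.println(m + " from " + m.getMembers());
        }
    }
}
```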
> 
> Yes, contract-wise that's much better.  But it still seems
> over-complicated to me, and it adds a meaning that Locations
> were never meant to have.
> 
> Anyone else have views on this?
> 
>      Thomas.
> 