[Biojava-l] Remote Locations

Matthew Pocock mrp@sanger.ac.uk
Mon, 05 Feb 2001 19:57:39 +0000


Thanks Greg for your sterling work on the IO package.

Fecking embl.

The best way to handle this with the BioJava object model is to build 
the troublesome feature on an assembly of the two EMBL entries (all the 
other features would be auto-magically projected into assembly 
coordinates as well if required). However, this requires both entries 
to be available. Can we punt this for 1.1 (throw an exception), or is it 
a necessary feature that 99% of people parsing EMBL files need?
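
To make the "punt" option concrete, here is a minimal sketch of a parser that accepts local spans and throws as soon as it sees a span that names another entry. It is not the BioJava IO code: JoinLocationSketch, parseJoin and RemoteReferenceException are hypothetical names, and it only assumes that a remote span is the one carrying an "accession:" prefix, as in Greg's example below.

```java
import java.util.ArrayList;
import java.util.List;

public class JoinLocationSketch {
    /** Hypothetical exception for spans that point at another entry. */
    public static class RemoteReferenceException extends Exception {
        public RemoteReferenceException(String message) {
            super(message);
        }
    }

    /**
     * Parses "join(a..b,c..d)" into {start, end} pairs.
     * Any span with an "accession:" prefix is rejected.
     */
    public static List<int[]> parseJoin(String text) throws RemoteReferenceException {
        String body = text.trim();
        if (body.startsWith("join(") && body.endsWith(")")) {
            // Strip the "join(" wrapper and the closing bracket.
            body = body.substring(5, body.length() - 1);
        }
        List<int[]> spans = new ArrayList<>();
        for (String part : body.split(",")) {
            part = part.trim();
            if (part.indexOf(':') >= 0) {
                // Assumption: a colon marks a reference to another entry.
                throw new RemoteReferenceException("Remote span not supported: " + part);
            }
            String[] ends = part.split("\\.\\.");
            int start = Integer.parseInt(ends[0]);
            // A single position such as "42" is treated as start == end.
            int end = ends.length > 1 ? Integer.parseInt(ends[1]) : start;
            spans.add(new int[] {start, end});
        }
        return spans;
    }
}
```

With this in place, entries that only use local spans parse as before, and the caller decides whether to skip or report the entries that throw.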

(feedback from list - please)

It is easy enough to come up with schemes whereby the full join feature 
implements a RemoteFeature interface, internally stores that one of the 
ranges is remote, and also stores a reference to the remote ID and a 
database. When the user actually attempts to access the feature, they 
can check for the remote case and call some sort of getRemoteFeature() 
method that returns the full feature in the assembly - the other 
sequence can be fetched silently from the db by id.
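
One such scheme might look roughly like the sketch below. Everything in it is hypothetical: SequenceDb stands in for whatever database interface would really be used, JoinFeature is an illustrative class, and getRemoteFeature() returns the joined residues as a plain String only to keep the example self-contained. A real version would return a feature on the assembly.

```java
public class RemoteFeatureSketch {
    /** Hypothetical stand-in for a sequence database keyed by ID. */
    public interface SequenceDb {
        String fetch(String id);
    }

    /** Hypothetical interface for a feature with a range on another entry. */
    public interface RemoteFeature {
        boolean isRemote();
        String getRemoteId();
        String getRemoteFeature();
    }

    /** A two-part join: one remote range followed by one local range. */
    public static class JoinFeature implements RemoteFeature {
        private final String localSeq;
        private final int localStart, localEnd;   // 1-based, inclusive
        private final String remoteId;
        private final int remoteStart, remoteEnd; // 1-based, inclusive
        private final SequenceDb db;

        public JoinFeature(String localSeq, int localStart, int localEnd,
                           String remoteId, int remoteStart, int remoteEnd,
                           SequenceDb db) {
            this.localSeq = localSeq;
            this.localStart = localStart;
            this.localEnd = localEnd;
            this.remoteId = remoteId;
            this.remoteStart = remoteStart;
            this.remoteEnd = remoteEnd;
            this.db = db;
        }

        public boolean isRemote() {
            return remoteId != null;
        }

        public String getRemoteId() {
            return remoteId;
        }

        /** Fetches the other entry by ID only when the caller asks for it. */
        public String getRemoteFeature() {
            String remoteSeq = db.fetch(remoteId);
            if (remoteSeq == null) {
                throw new IllegalStateException("Entry not available: " + remoteId);
            }
            // Remote part first, then local, as in join(remote:a..b,c..d).
            return remoteSeq.substring(remoteStart - 1, remoteEnd)
                 + localSeq.substring(localStart - 1, localEnd);
        }
    }
}
```

The point of the design is that parsing never needs the second entry. Only a caller who actually wants the full feature pays for the lookup, and they find out at that moment if the entry is missing.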

My gut tells me that Location is the wrong class to represent this mess 
- the problem is too semantically rich (as the issues with equals show) 
for Location, so it is more suited to Feature.
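
For anyone who has not hit it before, the equals issue Greg describes in his message below can be reproduced in a few lines. This is a sketch, not his attached code: RangeLocation is a hypothetical stand-in for an existing local location class, and this RemoteLocation extends it for brevity, whereas his implements Location and contains one.

```java
public class EqualsAsymmetrySketch {
    /** Hypothetical local location, written before remote ones existed. */
    public static class RangeLocation {
        protected final int min;
        protected final int max;

        public RangeLocation(int min, int max) {
            this.min = min;
            this.max = max;
        }

        @Override
        public boolean equals(Object o) {
            if (!(o instanceof RangeLocation)) {
                return false;
            }
            RangeLocation that = (RangeLocation) o;
            // Knows nothing about which sequence the other range is on.
            return min == that.min && max == that.max;
        }

        @Override
        public int hashCode() {
            return 31 * min + max;
        }
    }

    /** Hypothetical remote location that also compares the sequence ID. */
    public static class RemoteLocation extends RangeLocation {
        private final String seqId;

        public RemoteLocation(String seqId, int min, int max) {
            super(min, max);
            this.seqId = seqId;
        }

        @Override
        public boolean equals(Object o) {
            if (!(o instanceof RemoteLocation)) {
                return false; // a plain local range is never equal
            }
            return super.equals(o) && seqId.equals(((RemoteLocation) o).seqId);
        }
    }

    public static void main(String[] args) {
        RangeLocation local = new RangeLocation(1, 100);
        RemoteLocation remote = new RemoteLocation("L41624", 1, 100);
        System.out.println(local.equals(remote));
        System.out.println(remote.equals(local));
    }
}
```

The two calls disagree, which breaks the symmetry that Object.equals requires, and an instanceof check in the older class is the only fix that does not change the model.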

All thoughts gratefully received. As Greg said, you are the people who 
will have to maintain and use this code.

Matthew

Cox, Greg wrote:

> I plugged some new data into the genbank and embl parsers, and there's a
> slight problem.  A location like "join(L41624.1:2858..5660,1..419)" is valid
> and refers to a different sequence, L41624.  I've coded up a new location
> type, RemoteLocation to handle this case, but I want some feedback before
> committing it.  
> 
> 	 I've attached my code, but the big problem I see is that
> RemoteLocation implements Location, and contains a Location.  I've dealt
> with this recursive inheritance before and not enjoyed the experience.  The
> other option, inheriting from a concrete location, begs the question of
> which one.  
> 
> 	The other problem I see is that if there are two locations, both
> from 1..100 on different sequences, calling
> remoteLocation.equals(localLocation) will return false because
> remoteLocation knows to check if the parameter is on the right sequence.
> But localLocation.equals(remoteLocation) will return true because when
> localLocation was coded, remoteLocations didn't exist.  The hack around this
> is to do an instanceof for RemoteLocation, but I hope there's a better way.
> 
> 
> 	Any input is appreciated.  Remember, this is code YOU have to
> maintain!
> 
> Greg Cox
> 
>  <<RemoteLocation.java>> 
> RemoteLocation.java
> 
> Content-Type:
> 
> application/octet-stream
> Content-Encoding:
> 
> quoted-printable
> 