CircularLocations (was Re: [Biojava-l] Re: [Biojava-dev] Feature at position 0)

mark.schreiber at group.novartis.com mark.schreiber at group.novartis.com
Tue May 18 21:55:56 EDT 2004


Hello -

As you have probably seen, CircularLocations are an Ugly Hack. I can say 
that because I wrote them! They have been through many iterations and have 
some fundamental problems. Point 1 you raise below is more of a difference 
from what you might expect than a bug. Point 2 is a bug.

I'm happy to change both, but it might take me a while to get around to it. 
If you want to have a hack yourself, be my guest. Fundamentally, the 
separation of strand and location in the biojava API doesn't suit circular 
locations very well. Hopefully this is fixed in BJ2 (I haven't checked the 
API yet).

- Mark

Mark Schreiber
Principal Scientist (Bioinformatics)

Novartis Institute for Tropical Diseases (NITD)
10 Biopolis Road
#05-01 Chromos
Singapore 138670
www.nitd.novartis.com

phone +65 6722 2973
fax  +65 6722 2910





Bradford Powell <bcpowell at email.unc.edu>
Sent by: biojava-l-bounces at portal.open-bio.org
05/14/2004 09:42 PM

 
        To:     biojava-l at biojava.org
        cc: 
        Subject:        CircularLocations (was Re: [Biojava-l] Re: [Biojava-dev] Feature at position 0)


On Fri, 14 May 2004, Thomas Down wrote:

> 
> On 14 May 2004, at 11:21, Matthew Pocock wrote:
> 
> > I think for the sake of usability, we could either relax the location 
> > constraint to allow point locations at 0, or in the sp parser, 
> > re-write these as <1 - what do people think?
> 
> Ugh.
> 
> I'm not sure "<1" will appeal to people who are into round-tripping.
> 
> I think my (slightly) favoured option would be to remove the location 
> constraints on Features completely.  I know this is pretty horrible, 
> but off-sequence locations do seem to be things people use.
> 

This brings up another issue that I have been thinking about recently. I'm
not really comfortable with how circular sequences and circular
locations are handled in biojava. For those who aren't familiar,
CircularLocations are mapped as CompoundLocations.

I would prefer for features on a circular sequence (a CircularView of a
sequence) to have coordinates that are outside of the sequence
coordinates. Usually I use the convention that the larger coordinate is
within 1..length. This makes it easy to check for features that overlap
the origin (their min values are <= 0).
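The convention described above can be sketched in plain Java. This is only an
illustration: CircularCoords, normalize and overlapsOrigin are hypothetical
names, not part of the biojava API, and the sketch assumes 1-based inclusive
coordinates with min <= max.

```java
/** Hypothetical helper (not biojava API) for the "max within 1..length" convention. */
final class CircularCoords {

    /**
     * Shifts [min, max] by whole turns of the circle so that max lands in
     * 1..length. Returns {min, max}; min may end up <= 0.
     */
    static int[] normalize(int min, int max, int length) {
        if (length <= 0 || min > max) {
            throw new IllegalArgumentException("need length > 0 and min <= max");
        }
        // floorMod keeps the result non-negative even for negative coordinates
        int shiftedMax = Math.floorMod(max - 1, length) + 1;
        int shift = shiftedMax - max;
        return new int[] { min + shift, shiftedMax };
    }

    /** Under this convention a feature crosses the origin iff its min is <= 0. */
    static boolean overlapsOrigin(int min, int max, int length) {
        return normalize(min, max, length)[0] <= 0;
    }

    public static void main(String[] args) {
        int[] r = normalize(9, 12, 10);
        System.out.println(r[0] + ".." + r[1]);        // -1..2
        System.out.println(overlapsOrigin(3, 7, 10));  // false
    }
}
```

With this convention, the origin check is a single comparison once the
coordinates have been normalized.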

While I'm thinking of it, there are a couple of bugs I've seen in
CircularView:

1--
If subList or subString are called with start and end values that
should produce a list longer than the original sequence, things don't
happen as I would expect. Suppose you have a sequence 'seq':
                 CircularView cv = new CircularView(seq);
                 SymbolList subL = cv.subList(1, seq.length() + 3);
subL.length() has the value 3 instead of seq.length()+3 (i.e. it holds
just the first three symbols of seq) because the start and stop coordinates
are immediately translated to be within 1..length() upon entry to
subList().

I can see two ways to resolve this-- one would be to check whether
stop - start + 1 > length() and, if so, add the appropriate number of
copies of the source SymbolList to the sublist. The other way would be to
use a WrappedSymbolListView (this is what I have done for my purposes, code
available if people think this would be a good idea).
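As a rough sketch of the behaviour I would expect, here is a wrap-around
substring on a plain String rather than a SymbolList. CircularSubstring and
its subString method are hypothetical and are not the biojava implementation;
the sketch assumes 1-based inclusive coordinates and maps each position back
onto the sequence individually instead of translating start and end up front.

```java
/** Hypothetical illustration (not biojava API) of a wrap-around substring. */
final class CircularSubstring {

    /**
     * Returns the end - start + 1 symbols from start to end (1-based,
     * inclusive), wrapping around the sequence as often as needed.
     */
    static String subString(String seq, int start, int end) {
        if (seq.isEmpty() || end < start) {
            throw new IllegalArgumentException("need non-empty seq and start <= end");
        }
        int n = seq.length();
        StringBuilder sb = new StringBuilder(end - start + 1);
        for (int pos = start; pos <= end; pos++) {
            // Map each 1-based position onto 0..n-1; works for pos <= 0 too
            sb.append(seq.charAt(Math.floorMod(pos - 1, n)));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String seq = "ACGT";
        System.out.println(subString(seq, 1, seq.length() + 3)); // ACGTACG
        System.out.println(subString(seq, -1, 2));               // GTAC
    }
}
```

Because only individual positions are wrapped, the requested length
(end - start + 1) is preserved even when it exceeds the sequence length.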

2--
I almost hesitate to report this "bug" because I like to use negative
coordinates in locations on circular sequences-- createFeature() in
CircularView throws an exception if getMax() > length() but not if
getMin() <= 0. I guess it is a bug one way or the other because it is
inconsistent. Since the topic of removing location constraints for
features came up, I would say that it would at least make sense to remove
the restrictions for circular sequences.
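To make the consistency point concrete, here is a minimal sketch of a bounds
check that treats both ends the same way. CircularBoundsCheck and isValid are
hypothetical names, not the actual createFeature() logic, and the policy shown
(no coordinate limits at all on circular sequences) is just one possible
choice.

```java
/** Hypothetical bounds check (not biojava API) that treats min and max symmetrically. */
final class CircularBoundsCheck {

    /**
     * Linear sequences: both ends must lie within 1..length.
     * Circular sequences: any coordinates are accepted as long as min <= max.
     */
    static boolean isValid(int min, int max, int length, boolean circular) {
        if (min > max) {
            return false;
        }
        if (circular) {
            return true; // assumed policy: no restriction on either end
        }
        return min >= 1 && max <= length;
    }

    public static void main(String[] args) {
        System.out.println(isValid(-2, 3, 10, true));   // true
        System.out.println(isValid(8, 12, 10, false));  // false
    }
}
```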

Whew, that message was longer than I thought it would be.

-- Bradford Powell


_______________________________________________
Biojava-l mailing list  -  Biojava-l at biojava.org
http://biojava.org/mailman/listinfo/biojava-l




