[Biojava-l] GappedSymbolList behaviour is weird, bug?

Kalle Näslund Kalle.Naslund@genpat.uu.se
Wed, 12 Dec 2001 14:05:22 +0100


Hi!

I am writing a small app that uses GappedSymbolList, and I see some
weird behaviour.

The first "problem" occurs when I have a GappedSymbolList and insert a
gap into the view sequence (the one that shows gaps). As long as I
insert a gap or gaps at a position where there isn't already a gap, all
is fine. If, on the other hand, I insert a gap at a position that
already holds a gap, the new gap is added to the NEXT block of gaps, and
if there is no next block of gaps, it is appended at the end of the
sequence. A simple text example describes this much better. The example
just inserts a gap at position 3 in the view a couple of times and
prints the result after each insertion. The output looks like this:

aattggcc        Initial sequence
aa-ttggcc       1 gap inserted at position 3
aa-ttggcc-      1 additional gap inserted at position 3
aa-ttggcc--     1 additional gap inserted at position 3
aa-ttggcc---    1 additional gap inserted at position 3

To me, this is not how anyone would expect it to work. I think most
people would expect gap insertion to work the same way irrespective of
which symbol is at the position where the gap gets inserted, and that
the end result should look like this:

aa----ttggcc
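
The expected behaviour can be illustrated with a toy model. This is only
a sketch: a plain string stands in for the gapped view, and the class
GapInsertSketch and its insertGaps method are hypothetical names, not
part of BioJava. It simply shows that inserting at view position 3
should give the same kind of result whether or not a gap is already
there.

```java
// Toy model only: a plain string stands in for the gapped view.
// This is NOT the BioJava GappedSymbolList implementation.
final class GapInsertSketch {

    // Insert `count` gap characters at the 1-based view position `pos`,
    // regardless of which symbol currently sits at that position.
    static String insertGaps(String view, int pos, int count) {
        if (pos < 1 || pos > view.length() + 1) {
            throw new IndexOutOfBoundsException("view position " + pos);
        }
        StringBuilder sb = new StringBuilder(view);
        for (int i = 0; i < count; i++) {
            sb.insert(pos - 1, '-');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String view = "aattggcc";
        System.out.println(view);
        // Four single-gap insertions at view position 3.
        for (int i = 0; i < 4; i++) {
            view = insertGaps(view, 3, 1);
            System.out.println(view);
        }
        // The last line printed is aa----ttggcc
    }
}
```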



The second thing I have some thoughts about is the viewToSource
method. If you try to convert from view to source coordinates and the
view coordinate contains a gap, you get a return value of -1. The
JavaDoc doesn't mention what happens in this case, but it returns -1,
and that is OK, I guess. However, this gives me lots of problems, as I
have users graphically specify an interval on the GappedSymbolList, and
I then want the source coordinates. If the user chooses an endpoint
that is a gap, I have to scan symbol by symbol through the view
coordinates and use the first non-gap symbol.

So would it be wrong to change the viewToSource method so that it
doesn't return -1, but instead returns the source position where the gap
is inserted, multiplied by -1? This would most likely not break any code
that checks for a gap, as most people will have written if( x < 0 )
rather than if( x == -1 ). You would then get a meaningful conversion
from view to source, and if you don't care, you can just check whether
the return value is negative.

To clarify what I mean, I will give a short example here as well.

aa---ttggcc

As it is now, viewToSource( 4 ) returns -1. I propose that it
should return -3 instead, because the gaps are inserted at position
three in the source sequence. The value should be negative to
indicate that there is no direct link between the view position and the
source, as the view position is a gap.
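
A minimal sketch of the proposed return value, again on a toy model
where a string stands in for the view. ViewToSourceSketch is a
hypothetical class, not BioJava code, and it assumes 1-based positions
as in the example above. For a gap it returns minus the source position
of the next non-gap symbol, which is where the gap block sits.

```java
// Toy model of the PROPOSED semantics, not the BioJava implementation.
final class ViewToSourceSketch {

    // Non-gap view position: returns its 1-based source position.
    // Gap view position: returns -(source position the gap sits before).
    // For trailing gaps this is -(source length + 1).
    static int viewToSource(String view, int viewPos) {
        if (viewPos < 1 || viewPos > view.length()) {
            throw new IndexOutOfBoundsException("view position " + viewPos);
        }
        // Count the non-gap symbols at view positions 1..viewPos.
        int source = 0;
        for (int i = 0; i < viewPos; i++) {
            if (view.charAt(i) != '-') {
                source++;
            }
        }
        if (view.charAt(viewPos - 1) != '-') {
            return source;
        }
        return -(source + 1);
    }

    public static void main(String[] args) {
        String view = "aa---ttggcc";
        System.out.println(viewToSource(view, 4)); // gap: prints -3
        System.out.println(viewToSource(view, 6)); // first t: prints 3
    }
}
```

Callers that only test for a negative value keep working under this
scheme, while callers that compare against exactly -1 would not.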

I do understand that this little proposal might have unwanted effects on
other parts of the code, so it should be seen only as a small
question / proposal and nothing more. If there is a reason to return
only -1 and nothing else, I will just use the dirty solution of walking
along the view sequence until I find a non-gap symbol.
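
The walking workaround might look roughly like this. It is a sketch
under assumptions: the lookup is passed in as an IntUnaryOperator so the
example runs without BioJava, currentViewToSource is a hypothetical
stand-in that mimics the behaviour described above (-1 for a gap), and
GapScanSketch and nearestSource are made-up names.

```java
import java.util.function.IntUnaryOperator;

// Sketch of the scan workaround; none of this is BioJava code.
final class GapScanSketch {

    // Stand-in for the current behaviour: 1-based source position for
    // a symbol, -1 for a gap.
    static int currentViewToSource(String view, int viewPos) {
        if (view.charAt(viewPos - 1) == '-') {
            return -1;
        }
        int source = 0;
        for (int i = 0; i < viewPos; i++) {
            if (view.charAt(i) != '-') {
                source++;
            }
        }
        return source;
    }

    // Walk from viewPos in direction step (+1 or -1) until the lookup
    // reports a non-gap symbol; return -1 if the view ends first.
    static int nearestSource(IntUnaryOperator viewToSource,
                             int viewLength, int viewPos, int step) {
        if (step != 1 && step != -1) {
            throw new IllegalArgumentException("step must be +1 or -1");
        }
        for (int p = viewPos; p >= 1 && p <= viewLength; p += step) {
            int source = viewToSource.applyAsInt(p);
            if (source > 0) {
                return source;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        String view = "aa---ttggcc";
        IntUnaryOperator lookup = p -> currentViewToSource(view, p);
        // Left endpoint on a gap: scan right. Prints 3.
        System.out.println(nearestSource(lookup, view.length(), 4, 1));
        // Right endpoint on a gap: scan left. Prints 2.
        System.out.println(nearestSource(lookup, view.length(), 4, -1));
    }
}
```

With a real GappedSymbolList, the lambda would presumably wrap its
viewToSource method instead of the stand-in.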

Anyway, I have tested this on Linux (JDK 1.3.1 from Sun) and Windows
(JDK 1.4.0b3), using both the binary biojava-20010920.jar release and
one of the older releases, and the behaviour is the same in all
combinations.


To finish this off, I would just like to say thanks to all who have
contributed to BioJava, as it simplifies many nasty tasks a lot.

Sincerely, Kalle Näslund