[Biojava-l] removeGap problem with SimpleGappedSequence

Matthew Pocock matthew_pocock at yahoo.co.uk
Thu Feb 12 06:02:39 EST 2004


Hi,

Seems like we have a bit of a gap between 'expected behavior' and 
'implemented behavior'. If we decide to modify the GappedSymbolList 
constructor to find all gaps in the original sequence, I think we should 
add it as an option:

  new GappedSymbolList(origSyms, mergeOriginalGaps)

and make the current constructor equivalent to this(syms, false). 
Finding all these gaps, making an ungapped underlying symbol list, and 
building the gap insertion data structures is a potentially expensive 
operation (imagine gapping a genome! You would pull the whole thing into 
memory and do a linear scan), so we should be careful not to force it 
upon the world.
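
To make the cost concrete, here is a rough sketch of what the opt-in 
constructor would have to do. It is not the BioJava implementation: 
GappedSketch, getSource(), gapBlockCount() and toGappedString() are 
hypothetical names, and a String with '-' as the gap character stands in 
for a real symbol list.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-in for illustration; NOT the BioJava GappedSymbolList. */
final class GappedSketch {
    // Plays the role of the underlying (source) symbol list.
    private final String source;
    // Each entry is {index in the ungapped source, length of the gap run}.
    private final List<int[]> gapBlocks = new ArrayList<>();

    /** Current behaviour: keep the symbols exactly as given, gaps and all. */
    GappedSketch(String syms) {
        this(syms, false);
    }

    /** Opt-in behaviour: one linear scan that pulls the gaps out of syms. */
    GappedSketch(String syms, boolean mergeOriginalGaps) {
        if (!mergeOriginalGaps) {
            this.source = syms;
        } else {
            StringBuilder ungapped = new StringBuilder();
            int i = 0;
            while (i < syms.length()) {
                if (syms.charAt(i) == '-') {
                    int start = i;
                    while (i < syms.length() && syms.charAt(i) == '-') {
                        i++;
                    }
                    // Record where the run sits relative to the ungapped symbols.
                    gapBlocks.add(new int[] {ungapped.length(), i - start});
                } else {
                    ungapped.append(syms.charAt(i));
                    i++;
                }
            }
            this.source = ungapped.toString();
        }
    }

    String getSource() {
        return source;
    }

    int gapBlockCount() {
        return gapBlocks.size();
    }

    /** Rebuilds the gapped view by re-inserting the recorded gap runs. */
    String toGappedString() {
        StringBuilder out = new StringBuilder();
        int pos = 0;
        for (int[] block : gapBlocks) {
            out.append(source, pos, block[0]);
            for (int k = 0; k < block[1]; k++) {
                out.append('-');
            }
            pos = block[0];
        }
        out.append(source, pos, source.length());
        return out.toString();
    }
}
```

With mergeOriginalGaps set, the whole input is read and copied once; 
without it, nothing is scanned and the source is kept as it was handed in.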

This would also change the contract of getSourceSymbolList(), and of 
what happens if that source is modified: whether changes to it are tracked.

This could be worked around by implementing an "UnGappedView" class that 
does the opposite mapping of GappedSymbolList - removes all gaps in the 
source - then we could gap this view, putting them all back, which makes 
it editable. I don't want to be the one to write it though - writing 
GappedSymbolList made my brain hurt.
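
For what it's worth, a minimal read-only sketch of the index mapping such 
a view would need. UngappedViewSketch and its methods are hypothetical 
names, not existing BioJava classes; it uses a CharSequence with '-' as 
the gap character and 0-based indices for simplicity. The mapping is built 
once in the constructor, so it does not track later edits to the source, 
which is exactly the open question above.

```java
/** Hypothetical read-only view; NOT an existing BioJava class. */
final class UngappedViewSketch {
    private final CharSequence source;
    // viewToSource[i] is the source index of the i-th non-gap symbol.
    private final int[] viewToSource;

    UngappedViewSketch(CharSequence source) {
        this.source = source;
        // First pass: count the non-gap symbols to size the index.
        int count = 0;
        for (int i = 0; i < source.length(); i++) {
            if (source.charAt(i) != '-') {
                count++;
            }
        }
        // Second pass: record where each non-gap symbol lives in the source.
        viewToSource = new int[count];
        int next = 0;
        for (int i = 0; i < source.length(); i++) {
            if (source.charAt(i) != '-') {
                viewToSource[next++] = i;
            }
        }
    }

    /** Number of symbols visible through the view (gaps excluded). */
    int length() {
        return viewToSource.length;
    }

    /** Symbol at the given view index, read straight from the source. */
    char charAt(int viewIndex) {
        return source.charAt(viewToSource[viewIndex]);
    }

    /** Maps a view index back to the corresponding source index. */
    int sourceIndex(int viewIndex) {
        return viewToSource[viewIndex];
    }
}
```

Note that building the index is still a full linear scan of the source, 
so this moves the cost around rather than removing it.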

Matthew

mark.schreiber at group.novartis.com wrote:

>Sounds like a pretty sensible suggestion. Can anyone think of why this 
>might not be a 'good idea'?
>
>If not, I'll add it to the list of things to fix :)
>
>- Mark
>  
>


