[Biojava-l] Re: Biojava-l digest, Vol 1 #44 - 2 msgs

Matthew Pocock mrp@sanger.ac.uk
Mon, 20 Mar 2000 11:38:34 +0000


Aaron,

For those who are interested, GappedResidueList stores a list of ungapped aligned blocks, giving the start and end coordinates of each block in the underlying sequence and its start and end in the gapped view (this should probably change to two starts and a single length). If a residueAt request falls within an aligned block, I just map the coordinate from the view to the source and return the underlying residue. If it falls between blocks, it is a gap, so I return the gap residue. The appropriate block (or gap) can be found efficiently with a binary search. Inserting and removing gaps usually just causes the view indices to be updated. Occasionally a deletion joins two aligned blocks together, in which case they are merged. Sometimes a gap insertion breaks a block in two, so I create a new block and insert it into the blocks list.
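A minimal Java sketch of the block-list idea described above, assuming the "two starts and a single length" layout suggested in the parenthesis. The class `GappedViewSketch`, its `Block` type and the methods `insertGap`/`removeGap` are hypothetical names for illustration, not the actual GappedResidueList code; positions are 0-based and `-` stands in for the gap residue.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch: a gapped view stored as a sorted list of ungapped blocks. */
public class GappedViewSketch {
    static final char GAP = '-';

    /** Ungapped block: view positions map one-to-one onto source positions. */
    static final class Block {
        int viewStart;         // 0-based start in the gapped view
        final int sourceStart; // 0-based start in the underlying sequence
        int length;            // number of residues in the block
        Block(int v, int s, int n) { viewStart = v; sourceStart = s; length = n; }
    }

    private final String source;
    private final List<Block> blocks = new ArrayList<>();
    private int viewLength;

    public GappedViewSketch(String source) {
        this.source = source;
        this.viewLength = source.length();
        if (!source.isEmpty()) blocks.add(new Block(0, 0, source.length()));
    }

    /** Binary search: index of the last block with viewStart <= pos, or -1. */
    private int findBlock(int pos) {
        int lo = 0, hi = blocks.size() - 1, found = -1;
        while (lo <= hi) {
            int mid = (lo + hi) >>> 1;
            if (blocks.get(mid).viewStart <= pos) { found = mid; lo = mid + 1; }
            else hi = mid - 1;
        }
        return found;
    }

    /** True if view position pos lies between blocks (or before the first). */
    private boolean isGap(int pos) {
        int i = findBlock(pos);
        return i < 0 || pos - blocks.get(i).viewStart >= blocks.get(i).length;
    }

    public char residueAt(int pos) {
        if (pos < 0 || pos >= viewLength) throw new IndexOutOfBoundsException("pos " + pos);
        if (isGap(pos)) return GAP;
        Block b = blocks.get(findBlock(pos));
        return source.charAt(b.sourceStart + (pos - b.viewStart)); // view -> source
    }

    /** Insert count gaps before view position pos, splitting a block if needed. */
    public void insertGap(int pos, int count) {
        if (pos < 0 || pos > viewLength || count <= 0) throw new IllegalArgumentException();
        int i = findBlock(pos);
        int firstToShift = i + 1; // blocks from here on move right
        if (i >= 0) {
            Block b = blocks.get(i);
            int offset = pos - b.viewStart;
            if (offset == 0) {
                firstToShift = i; // block starts at pos, so it moves too
            } else if (offset < b.length) {
                // Gap lands inside the block: split it into head and tail.
                blocks.add(i + 1, new Block(pos, b.sourceStart + offset, b.length - offset));
                b.length = offset;
            }
        }
        for (int j = firstToShift; j < blocks.size(); j++) blocks.get(j).viewStart += count;
        viewLength += count;
    }

    /** Remove count gaps starting at pos, merging blocks that become adjacent. */
    public void removeGap(int pos, int count) {
        if (pos < 0 || count <= 0 || pos + count > viewLength) throw new IllegalArgumentException();
        for (int k = pos; k < pos + count; k++)
            if (!isGap(k)) throw new IllegalArgumentException("not a gap at " + k);
        int i = findBlock(pos); // block before the gap, or -1
        for (int j = i + 1; j < blocks.size(); j++) blocks.get(j).viewStart -= count;
        viewLength -= count;
        if (i >= 0 && i + 1 < blocks.size()) {
            Block a = blocks.get(i), b = blocks.get(i + 1);
            if (a.viewStart + a.length == b.viewStart && a.sourceStart + a.length == b.sourceStart) {
                a.length += b.length; // contiguous in view and source: merge
                blocks.remove(i + 1);
            }
        }
    }

    public int blockCount() { return blocks.size(); }

    @Override public String toString() {
        StringBuilder sb = new StringBuilder();
        for (int p = 0; p < viewLength; p++) sb.append(residueAt(p));
        return sb.toString();
    }
}
```

For example, inserting two gaps at view position 2 of "ACGT" splits the single block and gives "AC--GT"; removing those two gaps again makes the blocks contiguous in both view and source, so they are merged back into one.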

The implementation is not exposed in the API, so if we can agree on the gap-editing operations, there is no reason not to make GappedResidueList an interface. Java is poor at string manipulation (it can be extremely slow on some systems). Maybe we should have a format object that converts between the SeqStore string system and a Java object, in a class called something like org.biojava.bio.programs.GCG.formats?
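A format object of the kind proposed above might look like the following sketch. The real SeqStore gap-vector syntax is not spelled out in this thread, so the whitespace-separated `offset:size` tokens (a run of `size` gaps placed before ungapped residue `offset`) are purely an assumption, and `GapVectorFormat` is a hypothetical class name. It shows a round trip between a one-line gap vector, a list of Java objects, and a gapped string.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical converter between a one-line gap vector and Java objects. */
public class GapVectorFormat {
    /** A run of `size` gap characters placed before ungapped position `offset`. */
    public static final class Gap {
        final int offset;
        final int size;
        Gap(int offset, int size) { this.offset = offset; this.size = size; }
    }

    /** Parses an assumed "offset:size offset:size ..." line (not a documented GCG syntax). */
    public static List<Gap> parse(String line) {
        List<Gap> gaps = new ArrayList<>();
        for (String token : line.trim().split("\\s+")) {
            if (token.isEmpty()) continue; // blank line: no gaps
            String[] parts = token.split(":");
            if (parts.length != 2) throw new IllegalArgumentException("bad token: " + token);
            gaps.add(new Gap(Integer.parseInt(parts[0]), Integer.parseInt(parts[1])));
        }
        return gaps;
    }

    /** Writes gaps back out in the same assumed single-line syntax. */
    public static String format(List<Gap> gaps) {
        StringBuilder sb = new StringBuilder();
        for (Gap g : gaps) {
            if (sb.length() > 0) sb.append(' ');
            sb.append(g.offset).append(':').append(g.size);
        }
        return sb.toString();
    }

    /** Builds the gapped string; gaps must be sorted by offset. */
    public static String apply(String ungapped, List<Gap> gaps) {
        StringBuilder sb = new StringBuilder();
        int pos = 0;
        for (Gap g : gaps) {
            sb.append(ungapped, pos, g.offset); // residues up to the gap
            for (int k = 0; k < g.size; k++) sb.append('-');
            pos = g.offset;
        }
        return sb.append(ungapped, pos, ungapped.length()).toString();
    }

    /** Recovers the gap list from a gapped string ('-' marks a gap). */
    public static List<Gap> fromGapped(String gapped) {
        List<Gap> gaps = new ArrayList<>();
        int residues = 0, run = 0;
        for (int i = 0; i < gapped.length(); i++) {
            if (gapped.charAt(i) == '-') { run++; continue; }
            if (run > 0) { gaps.add(new Gap(residues, run)); run = 0; }
            residues++;
        }
        if (run > 0) gaps.add(new Gap(residues, run)); // trailing gap run
        return gaps;
    }
}
```

Under these assumptions, the string handling happens once when a record is read or written, and the resulting (offset, size) pairs could then be handed to whatever object backs the gapped view.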

Aaron, do you have read/write access? We can set you up if you don't.

Matthew

Aaron Kitzmiller wrote:

> I haven't seen this (org.biojava.bio.seq.GappedResidueList) in the JavaDocs yet, so this may be a moot point, but I'm curious about how you implemented it and whether you've created a GappedResidueList interface. The reason I ask is that I've been investigating GCG's SeqStore for storing a number of things, including alignments. Their implementation uses a gap vector that stores offsets and gap sizes in a single text line. If you've built this with an interface, I should be able to create a SeqStore-specific implementation that will work with the rest of the code.
>
> Aaron K.
>
> Aaron Kitzmiller
> Genetics Institute
> 35 Cambridge Park Dr.
> Cambridge, MA 02140
> Phone: (617) 665-6831
> Fax: (617) 665-8870
> akitzmiller@genetics.com
>