[Biojava-l] SSAHAj

Matthew Pocock matthew_pocock@yahoo.co.uk
Fri, 02 Aug 2002 18:18:55 +0100


Hello everyone,

In case you don't know, SSAHA is a fast algorithm for searching sequences 
against a database by converting the sequences and database into 
bit-strings and then using shifts and equals to find matches. Take a 
look at the Sanger software page to find the papers & C/C++ implementations.
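
To give a feel for the bit-string idea, here is a minimal sketch. It is not 
the SSAHAj code, and the class and method names (KmerPackSketch, pack, find) 
are made up for illustration. It assumes DNA packed at 2 bits per base and 
words of at most 16 bases so that a word fits in an int. The real algorithm 
also builds a hash table of words from the database, which this leaves out.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only; not part of BioJava or SSAHAj.
class KmerPackSketch {
    // Map a base to a 2-bit code; -1 for anything other than A, C, G, T.
    static int code(char base) {
        switch (Character.toUpperCase(base)) {
            case 'A': return 0;
            case 'C': return 1;
            case 'G': return 2;
            case 'T': return 3;
            default:  return -1;
        }
    }

    // Pack a word of 1..16 bases into an int, 2 bits per base.
    static int pack(String word) {
        if (word.length() < 1 || word.length() > 16) {
            throw new IllegalArgumentException("word length must be 1..16");
        }
        int packed = 0;
        for (int i = 0; i < word.length(); i++) {
            int c = code(word.charAt(i));
            if (c < 0) {
                throw new IllegalArgumentException("not a DNA base: " + word.charAt(i));
            }
            packed = (packed << 2) | c;   // shift in the next base
        }
        return packed;
    }

    // Return every start position where word occurs in text, by sliding a
    // packed window along text and comparing ints with ==.
    static List<Integer> find(String text, String word) {
        int k = word.length();
        int target = pack(word);
        // Keep only the low 2*k bits; k == 16 uses all 32 bits.
        int mask = (k == 16) ? -1 : (1 << (2 * k)) - 1;
        int window = 0;
        int valid = 0;   // consecutive valid bases currently in the window
        List<Integer> hits = new ArrayList<>();
        for (int i = 0; i < text.length(); i++) {
            int c = code(text.charAt(i));
            if (c < 0) {          // ambiguity symbol: restart the window
                window = 0;
                valid = 0;
                continue;
            }
            window = ((window << 2) | c) & mask;
            valid++;
            if (valid >= k && window == target) {
                hits.add(i - k + 1);
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        System.out.println(find("ACGTACGT", "CGT"));
    }
}
```

The point is that once a word is an int, each step along the sequence costs 
one shift, one OR, one mask and one comparison.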

I'm about to delve back into the SSAHA implementation in BioJava. At the 
moment it uses the nio buffer mapping code to mount the SSAHA 
hashtable. This is problematic because, in their infinite wisdom, the 
architects of nio used integer indices, so a single mapped buffer tops 
out at about 2 GB. We need longs when using genome-sized data sets. The 
nio channel API lets us deal with long offsets, but is a little more 
tricky to use. On the other hand, it lets us use files of sizes up to 
Long.MAX_VALUE, which is probably big enough for searching EMBL ;-)
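
As a rough sketch of what the channel approach looks like (an assumption 
about the shape of the code, not what SSAHAj does or will do), the 
positional FileChannel.read(ByteBuffer, long) call takes a long offset. The 
class name LongOffsetReadSketch and the helper readIntAt are hypothetical, 
and the file layout (plain big-endian ints) is made up for the demo.

```java
import java.io.EOFException;
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Hypothetical illustration only; not part of BioJava or SSAHAj.
class LongOffsetReadSketch {
    // Read one big-endian int starting at a long byte offset in the file.
    static int readIntAt(FileChannel channel, long offset) throws IOException {
        ByteBuffer buf = ByteBuffer.allocate(Integer.BYTES);
        long pos = offset;
        // A positional read may return fewer bytes than asked for, so loop.
        while (buf.hasRemaining()) {
            int n = channel.read(buf, pos);
            if (n < 0) {
                throw new EOFException("offset " + offset + " is past end of file");
            }
            pos += n;
        }
        buf.flip();
        return buf.getInt();
    }

    public static void main(String[] args) throws IOException {
        Path tmp = Files.createTempFile("ssaha-sketch", ".bin");
        try {
            // Write three ints, then read the third back by its byte offset.
            ByteBuffer data = ByteBuffer.allocate(3 * Integer.BYTES);
            data.putInt(10).putInt(20).putInt(30);
            Files.write(tmp, data.array());
            try (FileChannel ch = FileChannel.open(tmp, StandardOpenOption.READ)) {
                long offset = 2L * Integer.BYTES;
                System.out.println(readIntAt(ch, offset));
            }
        } finally {
            Files.deleteIfExists(tmp);
        }
    }
}
```

The trade-off is the one mentioned above: every lookup becomes an explicit 
read into a small buffer instead of an index into a mapped one, but the 
offset is a long throughout.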

Anyway, I don't know if anybody is using SSAHAj, and I hope it will be 
binary compatible once I'm finished with it, but I thought it'd be 
polite to tell you before I potentially break anything. At worst, you 
will need to rebuild your hash tables.

Matthew

