[Biojava-l] ssaha

Matthew Pocock matthew_pocock@yahoo.co.uk
Thu, 07 Mar 2002 13:35:25 +0000


Dear all,

We now have a src-1.4 directory in BioJava, courtesy of Thomas. If you 
are on a 1.4-compliant platform, this source tree will be built along 
with src. If you are on an earlier platform, it will be silently ignored.

The first addition to this directory is an implementation of the SSAHA 
searching algorithm developed at the Sanger Centre. It currently doesn't 
scale: it is bound by a 2GB limit on hash-table size, and I'm not sure 
that the NIO packages in the 1.4 release are bug-free. I will be 
working on it to ensure that it can handle the full 2^64 byte data 
tables available via the C++ implementations. The Java and C++ hash 
tables are unlikely to be binary compatible. The Java hash tables should 
be network portable, assuming that you move them as binary ;-)
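
For anyone who has not met SSAHA before, the following is a minimal 
sketch of the general idea only, not the code in src-1.4. It assumes a 
DNA alphabet packed at 2 bits per base, an in-memory HashMap instead of 
an NIO-backed table, and modern Java syntax rather than 1.4. The class 
and method names (SsahaSketch, pack, index, search) are made up for 
illustration. The database is cut into non-overlapping k-letter words, 
the query is scanned with overlapping words, and hits that share the 
same offset difference (diagonal) are counted together.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration of a SSAHA-style word index; not the BioJava classes.
class SsahaSketch {
    private final int k;
    // packed word -> start offsets of that word in the database sequence
    private final Map<Integer, List<Integer>> table = new HashMap<>();

    SsahaSketch(int k) {
        // 2 bits per base, so 15 bases fit in a non-negative int
        if (k < 1 || k > 15) throw new IllegalArgumentException("k must be 1..15");
        this.k = k;
    }

    // Pack k bases starting at pos into 2 bits each; returns -1 for non-ACGT input.
    static int pack(String seq, int pos, int k) {
        int word = 0;
        for (int i = 0; i < k; i++) {
            int code = "ACGT".indexOf(Character.toUpperCase(seq.charAt(pos + i)));
            if (code < 0) return -1;
            word = (word << 2) | code;
        }
        return word;
    }

    // Index the database using non-overlapping words (step of k).
    void index(String db) {
        for (int pos = 0; pos + k <= db.length(); pos += k) {
            int word = pack(db, pos, k);
            if (word >= 0) {
                table.computeIfAbsent(word, w -> new ArrayList<>()).add(pos);
            }
        }
    }

    // Scan the query with overlapping words (step of 1) and count word hits
    // per diagonal, where diagonal = database offset - query offset.
    Map<Integer, Integer> search(String query) {
        Map<Integer, Integer> hitsPerDiagonal = new HashMap<>();
        for (int q = 0; q + k <= query.length(); q++) {
            int word = pack(query, q, k);
            if (word < 0) continue;
            List<Integer> positions =
                table.getOrDefault(word, Collections.<Integer>emptyList());
            for (int dbPos : positions) {
                hitsPerDiagonal.merge(dbPos - q, 1, Integer::sum);
            }
        }
        return hitsPerDiagonal;
    }
}
```

Several word hits on one diagonal suggest an ungapped match at that 
offset. Swapping pack() for a different encoding is where a codon or 
protein variant would plug in under this sketch.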

If someone is keen, they could write a NIO-based socket server for the 
SSAHA search engine so that we could set up highly efficient 
client-server search services (it should be able to handle thousands of 
clients with NIO and a thread-pool). Also, it currently reports hits, 
but not as collections of HSPs. There is the possibility of doing 
bounded alignments using SSAHA hits as anchor points. By replacing the 
Packing object, we could do a codon-based SSAHA, a protein SSAHA, or any 
other funky alphabet you can come up with. The rules for discarding 
frequent words are bad at the moment (a fixed absolute threshold), so 
this could be replaced with some nice histogram maths. I don't have the 
time to tidy all of this, but perhaps you do.
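
One possible reading of "histogram maths" for the frequent-word rule is 
to derive the cutoff from the observed distribution of word counts 
rather than fixing it in advance. The sketch below is an assumption, not 
what the current code does: it takes a per-word count array and returns 
the count at a chosen percentile of the words that actually occur. The 
names WordFrequencyCutoff and percentileCutoff are hypothetical.

```java
import java.util.Arrays;

// Hypothetical helper: pick a frequent-word cutoff from the data itself.
class WordFrequencyCutoff {
    // counts[i] is the number of occurrences of packed word i.
    // Returns the smallest count c such that at least `fraction` of the
    // words that occur at all have a count <= c. Words occurring more
    // often than c would be candidates for discarding.
    static int percentileCutoff(int[] counts, double fraction) {
        int[] observed = Arrays.stream(counts)
                               .filter(c -> c > 0)   // ignore words never seen
                               .sorted()
                               .toArray();
        if (observed.length == 0) return 0;
        int rank = (int) Math.ceil(fraction * observed.length) - 1;
        rank = Math.max(0, Math.min(observed.length - 1, rank));
        return observed[rank];
    }
}
```

With a cutoff chosen this way, the threshold adapts to database size and 
composition; the fraction itself is still a tunable guess.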

NIO rocks!

Have fun,

Matthew