[Biojava-l] ssaha

Matthew Pocock matthew_pocock at yahoo.co.uk
Wed Apr 23 12:52:29 EDT 2003


 --- "Schreiber, Mark" <mark.schreiber at agresearch.co.nz> wrote:
> Hi -
>  
> I have been using the ssaha package in biojava and
> it's really great, good work guys!!

Cool. That got written at conferences - the Tucson
hackathon and ISMB in Edmonton. Laptops & conferences
rock. Oh, and Thomas made it actually work :)

>  
> I have one minor question: am I correct in assuming
> that hits are reported only if a segment from the
> query sequence exactly matches a word in the
> seqstore? Does ssaha allow for partial matches?

Yes - SSAHA is fast /because/ it only does exact
matches. If you are willing to peek inside the code,
you will find that there is a table of hits. The table
is indexed by an integer generated by packing the
search words as binary into an int (or long). You
could do some bit-flipping in here to generate all
numbers for search words differing by n symbols. But,
that's getting a bit hard-core.
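
A minimal sketch of that bit-flipping idea, assuming DNA
words packed at 2 bits per symbol (a=0, c=1, g=2, t=3). The
class and method names are made up for illustration and are
not the biojava ssaha API; the real packing in the code may
well differ.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of biojava.
class PackedWords {
    // Assumed encoding: 2 bits per DNA symbol.
    static int code(char c) {
        switch (Character.toLowerCase(c)) {
            case 'a': return 0;
            case 'c': return 1;
            case 'g': return 2;
            case 't': return 3;
            default:
                throw new IllegalArgumentException("Not a DNA symbol: " + c);
        }
    }

    // Pack a word into an int, first symbol in the highest bits.
    // Limited to 15 symbols so the result stays non-negative.
    static int pack(String word) {
        if (word.length() > 15) {
            throw new IllegalArgumentException("Word too long for an int");
        }
        int packed = 0;
        for (int i = 0; i < word.length(); i++) {
            packed = (packed << 2) | code(word.charAt(i));
        }
        return packed;
    }

    // All packed words that differ from 'packed' at exactly one position.
    static List<Integer> oneMismatchNeighbours(int packed, int wordLength) {
        List<Integer> result = new ArrayList<>();
        for (int pos = 0; pos < wordLength; pos++) {
            int shift = 2 * pos;
            int original = (packed >>> shift) & 3;
            int cleared = packed & ~(3 << shift); // zero the 2 bits at pos
            for (int sym = 0; sym < 4; sym++) {
                if (sym != original) {
                    result.add(cleared | (sym << shift));
                }
            }
        }
        return result;
    }
}
```

Each neighbour would then be looked up in the table like any
other word. For n mismatches you would apply this n times;
the number of words differing at exactly n positions is
C(k,n) * 3^n for word length k, so it grows quickly.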

>  
> - Mark
>  

If you use a smaller word-size and then throw away all
hits shorter than some pre-arranged threshold do you
still get reasonable results? The hash-table may get
too big though.
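
A rough way to play with the word-size trade-off, again
assuming 2 bits per symbol so that a table indexed by packed
words has 4^k slots. The store size and the uniform-random
assumption below are hypothetical; real sequence is far from
uniform, so treat the numbers as order-of-magnitude only.

```java
// Hypothetical back-of-envelope helper, not part of biojava.
class WordSizeTradeoff {
    // Number of distinct DNA words of length k (4^k), valid for k <= 31.
    static long slots(int k) {
        return 1L << (2 * k);
    }

    // Expected stored positions per slot if words were uniformly random.
    static double expectedHitsPerSlot(long storedPositions, int k) {
        return (double) storedPositions / slots(k);
    }

    public static void main(String[] args) {
        long storedPositions = 100_000_000L; // made-up store size
        for (int k = 8; k <= 12; k += 2) {
            System.out.printf("k=%d slots=%d hits/slot=%.1f%n",
                    k, slots(k), expectedHitsPerSlot(storedPositions, k));
        }
    }
}
```

Under these assumptions a shorter word means fewer slots but
many more hits per slot, which is the work the length
threshold would then have to throw away.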

Matthew
