[Biojava-l] How to use SuffixTree?

Matthew Pocock matthew_pocock@yahoo.co.uk
Wed, 25 Sep 2002 13:35:53 +0100


Hi,

The suffix tree code is quite old now and could probably do with an
overhaul. Given a SymbolList symL, you can check whether it is in the tree
(and how often it was seen) by doing something like:

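import org.biojava.bio.symbol.SuffixTree;
import org.biojava.bio.symbol.SymbolList;

// Returns how many times symL was counted in suffTree, or 0 if it is absent.
int countWord(SuffixTree suffTree, SymbolList symL) {
   // Start at the root and follow one child per symbol of the query.
   SuffixTree.SuffixNode node = suffTree.getRoot();

   // SymbolList positions are 1-based in BioJava.
   for(int i = 1; i <= symL.length(); i++) {
     int sym = suffTree.indexForSymbol(symL.symbolAt(i));
     if(node.hasChild(sym)) {
       node = suffTree.getChild(node, sym);
     } else {
       return 0; // the path stops here, so the word never occurs
     }
   }

   // The node's number holds the count recorded while the tree was built.
   return (int) node.getNumber();
}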
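The same root-to-node walk, together with the doesOccur/occursAt helpers asked about below, can be sketched without BioJava at all. This is a hypothetical stand-alone example, not the BioJava API: SuffixTrie, its window parameter and the firstStart field are illustrative names. It stores every substring up to window characters (an uncompressed suffix trie, O(n * window) nodes), and the window limit is only assumed to resemble the one used when a BioJava SuffixTree is filled.

```java
import java.util.HashMap;
import java.util.Map;

/**
 * Hypothetical sketch (not BioJava): a suffix trie over a String where each
 * node counts the occurrences of the word spelled by its root path and keeps
 * the 1-based start of that word's first occurrence. Only words up to
 * `window` characters long are stored.
 */
class SuffixTrie {
    static final class Node {
        final Map<Character, Node> children = new HashMap<>();
        int count;           // occurrences of the word ending at this node
        int firstStart = -1; // 1-based start of first occurrence, -1 if unset
    }

    private final Node root = new Node();
    private final int window;

    SuffixTrie(String text, int window) {
        this.window = window;
        // Insert text[i .. i+len) for every start i and every len <= window.
        for (int i = 0; i < text.length(); i++) {
            Node node = root;
            int end = Math.min(text.length(), i + window);
            for (int j = i; j < end; j++) {
                node = node.children.computeIfAbsent(text.charAt(j), c -> new Node());
                node.count++;
                // Starts are visited in increasing order, so the first one wins.
                if (node.firstStart < 0) node.firstStart = i + 1;
            }
        }
    }

    /** Follows one child per character; null if the path breaks off. */
    private Node find(String word) {
        if (word.length() > window) return null; // longer words were never stored
        Node node = root;
        for (int i = 0; i < word.length(); i++) {
            node = node.children.get(word.charAt(i));
            if (node == null) return null;
        }
        return node;
    }

    int countWord(String word) {
        Node node = find(word);
        return node == null ? 0 : node.count;
    }

    boolean doesOccur(String pattern) {
        return countWord(pattern) > 0;
    }

    /** 1-based start of the first occurrence, or -1 if not found. */
    int occursAt(String pattern) {
        Node node = find(pattern);
        return node == null ? -1 : node.firstStart;
    }
}
```

For example, new SuffixTrie("ACGTACG", 4).countWord("ACG") returns 2 and occursAt("TAC") returns 4. Words longer than the window are reported as absent, so the window has to be at least as long as the longest pattern you intend to search for.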

The suffix tree interfaces are a bit sucky. The indexing should be
moved out to an AlphabetIndex delegate, and the split between the node
and tree APIs is a bit silly. Also, the implementation of these trees
grows too big too quickly. Mmm.
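Moving the indexing out into a delegate might look roughly like this in plain Java. SymbolIndex, DnaIndex, IndexedNode and IndexedSuffixTrie are hypothetical names used only for illustration; they are not BioJava classes (BioJava's AlphabetIndex maps Symbol objects, not chars). The idea shown is that nodes only hold child slots, and the tree asks the delegate which slot a symbol maps to.

```java
/** Hypothetical delegate: maps alphabet symbols to child slots and back. */
interface SymbolIndex {
    int size();                 // number of symbols in the alphabet
    int indexForSymbol(char s); // slot in 0..size()-1; throws if unknown
    char symbolForIndex(int i);
}

/** Hypothetical DNA index: A, C, G, T map to slots 0..3 (case-insensitive). */
final class DnaIndex implements SymbolIndex {
    private static final String SYMBOLS = "ACGT";
    public int size() { return SYMBOLS.length(); }
    public int indexForSymbol(char s) {
        int i = SYMBOLS.indexOf(Character.toUpperCase(s));
        if (i < 0) throw new IllegalArgumentException("Not a DNA symbol: " + s);
        return i;
    }
    public char symbolForIndex(int i) { return SYMBOLS.charAt(i); }
}

/** A node knows nothing about symbols, only numbered child slots. */
final class IndexedNode {
    final IndexedNode[] children;
    int count; // occurrences of the word spelled by the path to this node
    IndexedNode(int alphabetSize) { children = new IndexedNode[alphabetSize]; }
}

/** Hypothetical trie that delegates all symbol-to-slot mapping. */
final class IndexedSuffixTrie {
    private final SymbolIndex index;
    private final IndexedNode root;

    IndexedSuffixTrie(SymbolIndex index) {
        this.index = index;
        this.root = new IndexedNode(index.size());
    }

    /** Adds every substring of text that is at most `window` symbols long. */
    void add(String text, int window) {
        for (int i = 0; i < text.length(); i++) {
            IndexedNode node = root;
            int end = Math.min(text.length(), i + window);
            for (int j = i; j < end; j++) {
                int slot = index.indexForSymbol(text.charAt(j));
                if (node.children[slot] == null) {
                    node.children[slot] = new IndexedNode(index.size());
                }
                node = node.children[slot];
                node.count++;
            }
        }
    }

    /** Same walk as countWord above, with slots supplied by the delegate. */
    int countWord(String word) {
        IndexedNode node = root;
        for (int i = 0; i < word.length(); i++) {
            node = node.children[index.indexForSymbol(word.charAt(i))];
            if (node == null) return 0;
        }
        return node.count;
    }
}
```

Because the trie only talks to SymbolIndex, supporting a protein alphabet would mean writing another index class rather than another tree.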

Matthew

hannah schmidt-glenewinkel wrote:
> I was happy to see that there is a SuffixTree class in BioJava... but now I'm
> just not sure how to use it.
> I think I understood the concept of a suffix tree in general: it holds
> references to all suffixes of a given string, so that I can search very fast
> for a pattern that may or may not occur in that string.
> 
> So shouldn't the SuffixTree-class provide methods like:
> boolean doesOccur(String pattern)   or
> int occursAt(String pattern)
> 
> I'd just like to know what I can actually do with a SuffixTree once I
> created it.
> Thank you very much for any help!
> 
> Hannah
> 


-- 
BioJava Consulting LTD - Support and training for BioJava
http://www.biojava.co.uk