[Biojava-dev] Near Matches

Osborne, John jko1 at cdc.gov
Thu Sep 11 14:20:26 EDT 2003


Hi,

I am looking for a way in BioJava to iterate quickly through a list of DNA
N-mers and find sequences that are almost an exact match, e.g. 23 of 25
bases.  The mismatches can occur in ANY position in a sequence.  Other than
iterating through a SymbolList and keeping track of the number of
mismatches, is there a better (read: faster) way to do this?  I was thinking
maybe the SuffixTree class, but since sequence order is unimportant it
doesn't seem like the right tool for the job.

Right now it is going to be a little bit ugly, since I am putting this into
an O(n^2) function with a big n...
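
To make the per-pair mismatch count concrete, here is a minimal sketch in
plain Java. It does not use any BioJava classes: instead of walking a
SymbolList, it assumes each N-mer is an unambiguous A/C/G/T string of at most
32 bases, packs it into a long at 2 bits per base, and counts mismatches with
XOR plus a bit count. The class and method names (NearMatch, pack,
mismatches, isNearMatch) are made up for illustration.

```java
class NearMatch {
    // Pack an A/C/G/T string (at most 32 bases) into a long, 2 bits per base.
    // Assumption: no ambiguity codes such as N; those are rejected here.
    static long pack(String s) {
        if (s.length() > 32) {
            throw new IllegalArgumentException("at most 32 bases fit in a long");
        }
        long bits = 0L;
        for (int i = 0; i < s.length(); i++) {
            int code;
            switch (Character.toUpperCase(s.charAt(i))) {
                case 'A': code = 0; break;
                case 'C': code = 1; break;
                case 'G': code = 2; break;
                case 'T': code = 3; break;
                default:
                    throw new IllegalArgumentException("unsupported base: " + s.charAt(i));
            }
            bits = (bits << 2) | code;
        }
        return bits;
    }

    // Count mismatching positions between two packed N-mers of equal length.
    static int mismatches(long a, long b) {
        long x = a ^ b;                                        // differing bits
        long perBase = (x | (x >>> 1)) & 0x5555555555555555L;  // one flag bit per differing base
        return Long.bitCount(perBase);
    }

    // True if the two packed N-mers differ in at most maxMismatches positions.
    static boolean isNearMatch(long a, long b, int maxMismatches) {
        return mismatches(a, b) <= maxMismatches;
    }
}
```

For the 23-of-25 case this would be called as isNearMatch(pack(s1),
pack(s2), 2), packing each N-mer once up front. The all-pairs loop is still
O(n^2); the sketch only makes each comparison a few machine instructions
instead of a 25-step loop.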

 -John

