[Biojava-dev] Near Matches

Schreiber, Mark mark.schreiber at agresearch.co.nz
Sat Sep 13 00:48:50 EDT 2003


Hi -
 
I'm pretty sure you could hack the KnuthMorrisPrattSearch class to allow it to tolerate a certain number of missmatches. KMP searches are very fast although they will slow considerabley if you allow more than a few missmatches.
 
- Mark

	-----Original Message----- 
	From: Matthew Pocock [mailto:matthew_pocock at yahoo.co.uk] 
	Sent: Fri 12/09/2003 9:07 p.m. 
	To: Schreiber, Mark 
	Cc: biojava-dev at biojava.org 
	Subject: Re: [Biojava-dev] Near Matches
	
	

	Hi John,
	
	I commited code that does a slightly different job - accpets or rejects
	regions based upon sequence content. However, it will be easy to write
	an aproximate matcher that does at most n miss-matches. The algorithms
	that sudgest themselves to me are all O(mn) - not sure if we can do
	better without a math genius, or a good string algorithms book.
	
	Matthew
	
	Schreiber, Mark wrote:
	
	>Hi -
	>
	>I think Matthew checked in something that does what you want a few days back. Have a look at the list archives, or cvs records for that last few weeks.
	>
	>- Mark
	>
	>
	>-----Original Message-----
	>From: Osborne, John [mailto:jko1 at cdc.gov]
	>Sent: Friday, 12 September 2003 6:20 a.m.
	>To: biojava-dev at biojava.org
	>Subject: [Biojava-dev] Near Matches
	>
	>
	>Hi,
	>
	>I am looking for a way in Biojava to iterate quickly through a list of DNA N-mers for sequences that are almost an exact match, like 23 of 25 bases.  The mismatches can occur in ANY position in a sequence.  Other than iterating through a SymbolList and keeping track of the number of mismatches, is there a better (read faster) way to do this?  I was thinking maybe the SuffixTree class, but since sequence order is unimportant it doesn't see like the right tool for the job.
	>
	>Right now it is going to be a little bit ugly, since I am putting this into a O(n^2) function with a big n...
	>
	> -John
	>_______________________________________________
	>biojava-dev mailing list
	>biojava-dev at biojava.org http://biojava.org/mailman/listinfo/biojava-dev
	>=======================================================================
	>Attention: The information contained in this message and/or attachments
	>from AgResearch Limited is intended only for the persons or entities
	>to which it is addressed and may contain confidential and/or privileged
	>material. Any review, retransmission, dissemination or other use of, or
	>taking of any action in reliance upon, this information by persons or
	>entities other than the intended recipients is prohibited by AgResearch
	>Limited. If you have received this message in error, please notify the
	>sender immediately.
	>=======================================================================
	>
	>_______________________________________________
	>biojava-dev mailing list
	>biojava-dev at biojava.org
	>http://biojava.org/mailman/listinfo/biojava-dev
	>
	> 
	>
	
	
	_______________________________________________
	biojava-dev mailing list
	biojava-dev at biojava.org
	http://biojava.org/mailman/listinfo/biojava-dev
	


=======================================================================
Attention: The information contained in this message and/or attachments
from AgResearch Limited is intended only for the persons or entities
to which it is addressed and may contain confidential and/or privileged
material. Any review, retransmission, dissemination or other use of, or
taking of any action in reliance upon, this information by persons or
entities other than the intended recipients is prohibited by AgResearch
Limited. If you have received this message in error, please notify the
sender immediately.
=======================================================================



More information about the biojava-dev mailing list