[Biojava-l] [1.4pre1] BioJava's-Regex with ambigous symbols

Jesse jesse-t at chello.nl
Mon Jun 6 06:08:03 EDT 2005


Hi Cor,

Thanks for your reply.

I corrected the pattern by doing the following.

When BioJava's org.biojava.bio.molbio.RestrictionEnzyme.getForwardRegex()
is called on a RestrictionEnzyme with the site "gtakm", it returns
"gta[gtk][acm]", where k (G or T) and m (A or C) are ambiguity symbols.

So the ambiguous symbol "k" is expanded to "[gtk]", which still contains the
ambiguous "k" itself inside the brackets.

I solved it by simply removing all ambiguous symbols from the returned regex
string.

String searchPattern = re.getForwardRegex().replaceAll("[rymkswbdhvn]", "");
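
To show what that replaceAll does, here is a small sketch in plain Java with no BioJava dependency. The class and method names (AmbiguityStripper, stripAll, stripInsideBrackets) are made up for illustration. stripAll is the same operation as the one-liner above; stripInsideBrackets is a stricter variant of my own that only touches letters inside [...] classes, in case an ambiguity code ever appears outside brackets. It assumes the regex uses lower-case letters, as in the example.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class AmbiguityStripper {
    // IUPAC nucleotide ambiguity codes, lower case as in the example regex.
    private static final String AMBIGUITY_CODES = "rymkswbdhvn";

    // Matches one bracketed character class such as "[gtk]".
    private static final Pattern CHAR_CLASS = Pattern.compile("\\[([a-z]+)\\]");

    /** Same operation as the one-liner: drop every ambiguity code anywhere. */
    public static String stripAll(String regex) {
        return regex.replaceAll("[" + AMBIGUITY_CODES + "]", "");
    }

    /** Hypothetical stricter variant: only strip codes inside [...] classes. */
    public static String stripInsideBrackets(String regex) {
        Matcher m = CHAR_CLASS.matcher(regex);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            String atomic = m.group(1).replaceAll("[" + AMBIGUITY_CODES + "]", "");
            m.appendReplacement(out, Matcher.quoteReplacement("[" + atomic + "]"));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        String forwardRegex = "gta[gtk][acm]"; // value quoted in this mail
        System.out.println(stripAll(forwardRegex));
        System.out.println(stripInsideBrackets(forwardRegex));
    }
}
```

For "gta[gtk][acm]" both methods give "gta[gt][ac]". They differ only when a code sits outside brackets: stripAll would delete it and shorten the pattern, while stripInsideBrackets leaves it alone.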

Regards,

Jesse


-----Original Message-----
From: Cor 
Subject: RE: [Biojava-l] [1.4pre1] BioJava's-Regex with ambigous symbols 

Hi Jesse, 

Although I am a newbie myself, I have written some example code based on 
existing BioJava-testcode :

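import org.biojava.bio.seq.DNATools;
import org.biojava.bio.symbol.SymbolList;
import org.biojava.utils.regex.Matcher;
import org.biojava.utils.regex.Pattern;
import org.biojava.utils.regex.PatternFactory;

// Inside a JUnit test method (fail() comes from JUnit):
String symbols = "atgcgacgtcttaannnnnnatgcaac";
SymbolList sl = DNATools.createDNA(symbols);
// Only atomic symbols inside the brackets: [ag] rather than [agr]
String patternString = "g[ag]cg[ct]c";
PatternFactory fact = PatternFactory.makeFactory(DNATools.getDNA());
Pattern pattern = fact.compile(patternString);
Matcher matcher = pattern.matcher(sl);
if (matcher.find()) {
    System.out.println("match found");
} else {
    fail("failed to find target");
}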
	
In the pattern, you have to use [ag] instead of [agr]. Otherwise you will
get the error:

org.biojava.utils.regex.RegexException: all variant symbols must be atomic.
    at org.biojava.utils.regex.PatternChecker.parseVariantSymbols(PatternChecker.java:363)
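
One way to avoid the exception is to build the pattern from the recognition site yourself, expanding each ambiguity code into a class of atomic symbols only. Below is a minimal sketch in plain Java with no BioJava dependency. The class and method names (AtomicPatternBuilder, toAtomicPattern) are made up for illustration, and the table is the standard IUPAC nucleotide code. I have not checked it against every enzyme BioJava knows, so treat it as a starting point.

```java
import java.util.HashMap;
import java.util.Map;

public class AtomicPatternBuilder {
    // IUPAC nucleotide ambiguity codes mapped to the atomic bases they stand for.
    private static final Map<Character, String> IUPAC = new HashMap<Character, String>();
    static {
        IUPAC.put('r', "ag");
        IUPAC.put('y', "ct");
        IUPAC.put('m', "ac");
        IUPAC.put('k', "gt");
        IUPAC.put('s', "cg");
        IUPAC.put('w', "at");
        IUPAC.put('b', "cgt");
        IUPAC.put('d', "agt");
        IUPAC.put('h', "act");
        IUPAC.put('v', "acg");
        IUPAC.put('n', "acgt");
    }

    /** Expands a site such as "grcgyc" into a pattern with atomic classes only. */
    public static String toAtomicPattern(String site) {
        StringBuilder sb = new StringBuilder();
        for (char c : site.toLowerCase().toCharArray()) {
            String bases = IUPAC.get(c);
            if (bases == null) {
                sb.append(c); // a, c, g, t (or anything unknown) is copied as-is
            } else {
                sb.append('[').append(bases).append(']');
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(toAtomicPattern("grcgyc"));
    }
}
```

For the site "grcgyc" this produces "g[ag]cg[ct]c", the pattern used in the example above.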


Regards,

Cor



