[Biojava-l] Pattern matching

mark.schreiber at novartis.com
Tue Jun 26 01:33:02 UTC 2007

Hi -

I think this has come up before on the list.

By default, Matcher.find() begins each new search at the end of the
previous match, so overlapping matches are skipped. To make it begin its
search at any other place, use the other form of the method,
Matcher.find(int start).
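
As a minimal sketch of the difference, the example below uses
java.util.regex instead of the BioJava classes. That substitution is an
assumption made so the code runs without BioJava on the classpath; the
java.util.regex Matcher offers the same find() / find(int start) pair, but
it counts positions from 0, whereas the output quoted below counts from 1.
The class and method names (OverlappingMatches, overlappingStarts) are
hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class OverlappingMatches {

    // Collects the start index of every match, including overlapping ones.
    static List<Integer> overlappingStarts(String regex, String text) {
        Matcher m = Pattern.compile(regex).matcher(text);
        List<Integer> starts = new ArrayList<>();
        int from = 0;
        // find(int) resets the matcher and searches from the given index,
        // so restarting one position after the previous match start
        // also picks up matches that overlap the previous one.
        while (from <= text.length() && m.find(from)) {
            starts.add(m.start());
            from = m.start() + 1;
        }
        return starts;
    }

    public static void main(String[] args) {
        System.out.println(overlappingStarts("aa", "aaaa")); // prints [0, 1, 2]
    }
}
```

With the BioJava Matcher, the same loop shape should apply, assuming its
find(int start) behaves analogously.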

- Mark

Jerome LANE <Jerome.Lane at igh.cnrs.fr>
Sent by: biojava-l-bounces at lists.open-bio.org
06/26/2007 02:24 AM

        To:     biojava-l at lists.open-bio.org
        cc:     (bcc: Mark Schreiber/GP/Novartis)
        Subject:        [Biojava-l] Pattern matching


I have used the BioJava Pattern class to match a DNA sequence, but I can't
find all of the matches in my sequence. For example, here is a bit of code
that I wrote to search for the pattern "aa" in the DNA sequence "aaaa":

import org.biojava.bio.seq.DNATools;
import org.biojava.bio.symbol.FiniteAlphabet;
import org.biojava.bio.symbol.SymbolList;
import org.biojava.utils.regex.PatternFactory;

try {
    // Variables needed...
    FiniteAlphabet IUPAC = DNATools.getDNA();
    SymbolList WorkingSequence = DNATools.createDNA("aaaa");

    // Create pattern using pattern factory.
    PatternFactory FACTORY = PatternFactory.makeFactory(IUPAC);
    org.biojava.utils.regex.Pattern pattern = FACTORY.compile("aa");
    System.out.println("Searching for: " + pattern.patternAsString());

    // Obtain iterator of matches.
    org.biojava.utils.regex.Matcher occurences = pattern.matcher(WorkingSequence);

    // Foreach match
    while (occurences.find()) {
        // The original message was cut off after start(); printing the
        // matched symbols is assumed from the output shown below.
        System.out.println("Match: " + "\t" + WorkingSequence
                + "\n" + occurences.start() + "\t"
                + occurences.group().seqString());
    }
} catch (Exception ex) {
    ex.printStackTrace();
}

And this is the output:

Searching for: aa
Match:     org.biojava.bio.symbol.SimpleSymbolList@ea82ff69 length: 4
1    aa
Match:     org.biojava.bio.symbol.SimpleSymbolList@ea82ff69 length: 4
3    aa

But for the input sequence "aaaa" I should get 3 matches, at positions 1, 2
and 3. Is there a parameter I can set to get this?

Best regards
