[Biojava-l] Having problems using biojava regex.

mark.schreiber at novartis.com mark.schreiber at novartis.com
Thu Mar 15 02:27:14 UTC 2007


It's not so much that biojava is case sensitive as it is that regexes are. 
You could probably set your regex to be case insensitive to avoid this 
kind of problem. 
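
To illustrate the point about case, here is a minimal sketch using the 
standard java.util.regex classes on a plain String. It is not the BioJava 
regex API, and the helper name countMatches is made up for this example. 
It only shows that the same pattern finds nothing in lower-case text 
until the case-insensitive flag is set.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class CaseInsensitiveDemo {
    // Hypothetical helper: counts non-overlapping matches of regex in text,
    // compiling the pattern with the given java.util.regex flags.
    static int countMatches(String regex, String text, int flags) {
        Matcher m = Pattern.compile(regex, flags).matcher(text);
        int count = 0;
        while (m.find()) {
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        // Lower-case text standing in for a DNA sequence string.
        String seq = "tagagatagacgatagc";

        // Case-sensitive (default flags): the upper-case pattern does not match.
        System.out.println(countMatches("TAG", seq, 0));

        // Case-insensitive: the same pattern now matches.
        System.out.println(countMatches("TAG", seq, Pattern.CASE_INSENSITIVE));
    }
}
```

Whether the BioJava pattern factory exposes an equivalent flag is not 
shown here; writing the pattern in lower case is the simpler fix.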

As a cool trick though, it is possible to soft-mask sequences in BioJava 
such that lower case and upper case have different meanings. Typically 
lower case marks regions that are masked due to repeats or similar. You 
could then make a regex that finds matches only in the unmasked regions 
or, alternatively, only in the masked regions.
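
The soft-masking idea can be sketched with plain java.util.regex on a 
mixed-case String. This is only an analogy: it does not use BioJava's own 
soft-masking support, and the class and method names are invented for 
the example. A case-sensitive upper-case pattern hits only the unmasked 
(upper-case) stretch, and the lower-case pattern hits only the masked 
stretches.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class SoftMaskDemo {
    // Hypothetical helper: returns the start offsets of all case-sensitive,
    // non-overlapping matches of regex in text.
    static List<Integer> matchStarts(String regex, String text) {
        List<Integer> starts = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(text);
        while (m.find()) {
            starts.add(m.start());
        }
        return starts;
    }

    public static void main(String[] args) {
        // Assumed convention: lower case = masked, upper case = unmasked.
        String softMasked = "tagagaTAGACGAtagc";

        // Matches only in the unmasked (upper-case) region.
        System.out.println("unmasked: " + matchStarts("TAG", softMasked));

        // Matches only in the masked (lower-case) regions.
        System.out.println("masked:   " + matchStarts("tag", softMasked));
    }
}
```

Note that a match straddling a masked/unmasked boundary is found by 
neither pattern in this sketch.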

- Mark

Mark Schreiber
Research Investigator (Bioinformatics)

Novartis Institute for Tropical Diseases (NITD)
10 Biopolis Road
#05-01 Chromos
Singapore 138670
www.nitd.novartis.com
www.dengueinfo.org

phone +65 6722 2973
fax  +65 6722 2910





"Charles Danko" <dankoc at gmail.com>
Sent by: biojava-l-bounces at lists.open-bio.org
03/15/2007 10:20 AM

 
        To:     "Mark Schreiber" <markjschreiber at gmail.com>
        cc:     biojava-l at lists.open-bio.org, (bcc: Mark Schreiber/GP/Novartis)
        Subject:        Re: [Biojava-l] Having problems using biojava regex.


That worked!  I had no idea it would matter!

Thanks very much for the response!

Charles

On 3/14/07, Mark Schreiber <markjschreiber at gmail.com> wrote:
>
> Hi -
>
> From memory everything should be lower case. BioJava always represents
> DNA as lowercase and protein as upper case as per convention.
>
> Try that.
>
> - Mark
>
>
> On 3/15/07, Charles Danko <dankoc at gmail.com> wrote:
>
> > Hi,
> >
> > I'm having problems using the biojava regex classes.
> >
> > According to my understanding, the code posted below is the simplest
> > possible example of this class.
> >
> > However, my output is:
> > TAG
> > false
> > 0
> > TAG
> >
> > The TAG, TAG part of the output is for pattern.patternAsString() and
> > occurences.pattern().patternAsString().  As I understand it, both of
> > these are correct, leading me to believe that both the Pattern and
> > Matcher objects are being created correctly.  However,
> > occurences.find() = false and occurences.groupCount() = 0 ... meaning
> > it's not finding any matches!?
> >
> > Where am I going wrong?
> >
> > Many thanks!
> > Charles
> >
> > import org.biojava.bio.*;
> > import org.biojava.bio.seq.*;
> > import org.biojava.bio.symbol.*;
> > import org.biojava.utils.regex.*;
> > import java.util.* ;
> > import java.io.*;
> >
> > public class Ambiguity2 {
> > public static void main(String[] args) {
> >    try {
> >        FiniteAlphabet IUPAC = DNATools.getDNA();
> >
> >        // Create pattern using pattern factory.
> >        Pattern pattern;
> >        PatternFactory FACTORY = PatternFactory.makeFactory(IUPAC);
> >        try{
> >            pattern = FACTORY.compile("TAG");
> >        } catch(Exception e) {e.printStackTrace(); return;}
> >        System.out.println(pattern.patternAsString());
> >
> >        // Variables needed...
> >        Matcher occurences;
> >
> >        // Promoter & Element
> >        Element WorkingElement = new Element("ElementName");
> >        SymbolList WorkingPromoter = DNATools.createDNA("TAGAGATAGACGATAGC");
> >
> >        // Obtain iterator of patterns.
> >        try {
> >            occurences = pattern.matcher( WorkingPromoter );
> >        } catch(Exception e) {e.printStackTrace(); return;}
> >        System.out.println(occurences.find());
> >        System.out.println(occurences.groupCount());
> >        System.out.println(occurences.pattern().patternAsString());
> >        // Foreach match
> >        while( occurences.find() ) {
> >                // Create Occurence object using information from patterns.
> >            System.out.println("Match: " +"\t"+ WorkingPromoter +"\n"+ occurences.start() +"\t"+ occurences.group().seqString());
> >        }
> >    }
> >
> >    catch (Exception ex) {
> >      ex.printStackTrace();
> >    }
> > }
> > }
> > _______________________________________________
> > Biojava-l mailing list  -  Biojava-l at lists.open-bio.org
> > http://lists.open-bio.org/mailman/listinfo/biojava-l
> >
>
>