[Biojava-dev] new seq searching classes
Matthew Pocock
matthew.pocock at ncl.ac.uk
Tue Sep 2 12:09:57 EDT 2003
Hi,
I've added a couple of classes in org.biojava.bio.search for finding
regions of sequence content. They are SeqContentPattern and
SeqContentMatcher - the API is loosely based upon KMPSearch and the 1.4
regex libs. These classes aren't javadocked yet.
SeqContentPattern encapsulates the rules about what regions to select -
the length, and the minimum and maximum number of occurrences for each
nucleotide.
SeqContentMatcher is a cursor produced by scp.matcher(SymbolList) and
can be used to find the next match, get the matching sub-sequence and
discover the offset of that match.
E.g. to find regions of length 10 with at least 8 As, no G or T and at
most 2 Cs, you could do something like:
SeqContentPattern scp = new SeqContentPattern(DNATools.getDNA());
scp.setLength(10);
scp.setMinCounts(DNATools.a(), 8);
scp.setMaxCounts(DNATools.g(), 0);
scp.setMaxCounts(DNATools.c(), 2);
scp.setMaxCounts(DNATools.t(), 0);
Then to search with this you'd do something like:
SeqContentMatcher scm = scp.matcher(symList);
while (scm.find()) {
  System.out.println("Hit at: " + scm.pos());
}
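For anyone who wants to see the idea without pulling in BioJava, here is a rough stand-alone sketch of the same kind of search over a plain String. It is not the code behind SeqContentPattern or SeqContentMatcher. The class name ContentWindowSketch, the method findWindows and the use of char keys are all made up for illustration, and reporting 1-based start positions is an assumption that follows the SymbolList coordinate convention. It slides a fixed-length window along the sequence, keeps a running count per character, and records every window whose counts fall within the limits.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-alone sketch; not part of BioJava.
public class ContentWindowSketch {

    // Returns the 1-based start of every window of the given length whose
    // character counts respect the supplied minimum and maximum limits.
    public static List<Integer> findWindows(String seq, int length,
                                            Map<Character, Integer> minCounts,
                                            Map<Character, Integer> maxCounts) {
        List<Integer> hits = new ArrayList<>();
        Map<Character, Integer> counts = new HashMap<>();
        for (int i = 0; i < seq.length(); i++) {
            // The character entering the window on the right.
            counts.merge(seq.charAt(i), 1, Integer::sum);
            // The character leaving the window on the left.
            if (i >= length) {
                counts.merge(seq.charAt(i - length), -1, Integer::sum);
            }
            // Only test once a full window has been seen.
            if (i >= length - 1 && accepts(counts, minCounts, maxCounts)) {
                hits.add(i - length + 2); // 0-based start + 1
            }
        }
        return hits;
    }

    // True if every minimum and maximum constraint holds for these counts.
    private static boolean accepts(Map<Character, Integer> counts,
                                   Map<Character, Integer> minCounts,
                                   Map<Character, Integer> maxCounts) {
        for (Map.Entry<Character, Integer> e : minCounts.entrySet()) {
            if (counts.getOrDefault(e.getKey(), 0) < e.getValue()) {
                return false;
            }
        }
        for (Map.Entry<Character, Integer> e : maxCounts.entrySet()) {
            if (counts.getOrDefault(e.getKey(), 0) > e.getValue()) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        Map<Character, Integer> min = new HashMap<>();
        min.put('A', 8);                 // at least 8 As
        Map<Character, Integer> max = new HashMap<>();
        max.put('G', 0);                 // no G
        max.put('C', 2);                 // at most 2 Cs
        max.put('T', 0);                 // no T
        for (int pos : findWindows("AAAAAAAACCGAAAAAAAAAA", 10, min, max)) {
            System.out.println("Hit at: " + pos);
        }
    }
}
```

Updating the counts incrementally keeps the scan linear in the sequence length, rather than recounting each window from scratch. Overlapping windows are all reported here; whether the real matcher does the same is not something this sketch tries to answer.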
Anybody think this is useful?
Matthew