[Biojava-l] Anyon has ever search for CpG Islands with BioJava?
Stein Aerts
stein.aerts at esat.kuleuven.ac.be
Fri Mar 28 10:29:33 EST 2003
Hi Sylvain,
This piece of code should do it. The nrCGRandom changes with the
windowlength though (I calculated this value from EPD a long time ago).
I use org.apache.oro.text.regex.*;
Bye,
Stein.
=============================================
public void annotateCpG(Sequence seq) throws Exception{
double nrCGRandom = 14.1048d;
int windowLength = 200;
int step=100;
String motif = "cg";
//
PatternMatcher matcher=new Perl5Matcher();
PatternCompiler compiler = new Perl5Compiler();
Pattern pattern = null;
PatternMatcherInput input;
MatchResult result;
pattern = compiler.compile(motif);
int nrOfOcc=0;
int nrConsec=0;
double gcContent = 0d;
double cgValue=0d;
int index = 1;
String dnaStr;
SymbolList window= null;
int i=1;
while (index<seq.length()-windowLength){
window = seq.subList(index,index+windowLength);
dnaStr = window.seqString();
input = new PatternMatcherInput(dnaStr);
while(matcher.contains(input, pattern)) {
nrOfOcc++;
}
gcContent = GCContent(window);
cgValue = (nrCgRandom/nrOfOcc);
if (cgValue<0.6d && gcContent>50.0d){
nrConsec++;
}
else if (nrConsec>0){
int start = (index-(nrConsec)*step);
int stop = index;
nrConsec=0;
Feature.Template template = new Feature.Template();
template.type = "misc_feature";
template.location = new RangeLocation(start,stop);
template.annotation = new SimpleAnnotation();
template.annotation.setProperty("type","CpG Island");
seq.createFeature(template);
}
nrOfOcc=0;
index=index+step;
}
}
public double GCContent(SymbolList seq){
int gc = 0;
for (int pos = 1; pos <= seq.length(); ++pos) {
org.biojava.bio.symbol.Symbol sym = seq.symbolAt(pos);
if (sym == org.biojava.bio.seq.DNATools.g() || sym ==
org.biojava.bio.seq.DNATools.c())
++gc;
}
double content = (gc*100)/seq.length();
return content;
}
Sylvain Foisy wrote:
>Hi,
>
>Sorry for the hand holding question. I am trying to wrap the output of a
>Perl script that finds CpG islands (cpgi130.pl) using Runtime.exec with
>little success. Has anyone ever built a class that can find CpG islands in
>Sequence objects and build features with RangeLocations?
>
>Other solution: has anyone ever try to write a wrapper that would use the
>output of any program of the EMBOSS suite? It has some CpG island finding
>software.
>
>Anyhelp would be much appreciated
>
>P.S.: there has been a call for an algorithm repository on Usenet. Would it
>be possible/interesting to create such a repository for classes built with
>BioJava that could be of interest to other rookies like me.
>
>Just an idea...
>
>Sylvain
>
>
>===================================================================
>Sylvain Foisy, Ph. D.
>Directeur - operations / Project Manager
>BioneQ - Reseau quebecois de bio-informatique
>U. de Montreal / Genome-Quebec
>
>Addresse postale:
>
>Departement de biochimie
>Pavillon principal
>2900, boul. Édouard-Montpetit
>Montréal (Québec) H3T 1J4
>
>Tel: (514) 343-6111 x.4066
>Courriel: sylvain.foisy at bioneq.qc.ca
>===================================================================
>
>
>_______________________________________________
>Biojava-l mailing list - Biojava-l at biojava.org
>http://biojava.org/mailman/listinfo/biojava-l
>
>
>
--
Stein Aerts BioI at SISTA
K.U.Leuven ESAT-SCD Belgium
http://www.esat.kuleuven.ac.be/~dna/BioI
More information about the Biojava-l
mailing list