[Biojava-dev] OutOfMemory when using a big Weight Matrix to find Motifs in 1.3.1 but not in 1.3

mark.schreiber at group.novartis.com mark.schreiber at group.novartis.com
Wed Jan 28 01:36:27 EST 2004


Hi Again,

I've found the problem.

The code starting at line 153 in DP needs changing from 

     for (int c = 0; c < cols; c++) {
       score += scoreType.calculateScore(matrix.getColumn(c),
                                         symList.symbolAt(c + start));
     }

to

     for (int c = 0; c < cols; c++) {
       score += Math.log(scoreType.calculateScore(matrix.getColumn(c),
                                                  symList.symbolAt(c + start)));
     }

so it will be consistent with the scoreWeightMatrix() method that doesn't 
use a ScoreType. Actually, changing it to a log will prevent underflow 
errors on large WeightMatrices. Interestingly, the WeightMatrixAnnotator 
converts it back to a normal probability with a Math.exp() operation 
before annotation. Surely it doesn't need to be this convoluted?
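
To illustrate the underflow point, here is a minimal standalone sketch. It is 
not BioJava code: the class and method names are hypothetical, and a plain 
double[] of per-column probabilities stands in for the values a ScoreType 
would return. It compares multiplying the probabilities directly with summing 
their logs.

```java
// Hypothetical illustration only; not part of BioJava.
class LogScoreSketch {

    // Multiply per-column probabilities directly. With many columns the
    // product can underflow to 0.0 in double precision.
    static double productScore(double[] columnProbs) {
        double score = 1.0;
        for (double p : columnProbs) {
            score *= p;
        }
        return score;
    }

    // Sum the logs of the per-column probabilities instead. The result
    // stays a finite negative number even for a large matrix.
    static double logScore(double[] columnProbs) {
        double score = 0.0;
        for (double p : columnProbs) {
            score += Math.log(p);
        }
        return score;
    }

    public static void main(String[] args) {
        // Assumed example: 600 columns, each contributing probability 0.25.
        double[] probs = new double[600];
        java.util.Arrays.fill(probs, 0.25);

        System.out.println("product: " + productScore(probs));
        System.out.println("log sum: " + logScore(probs));
    }
}
```

With these assumed inputs the direct product is 0.25^600 = 2^-1200, which is 
below the smallest representable double and so comes out as 0.0, while the 
log sum stays finite. Note that calling Math.exp() on such a log score would 
underflow again, so comparing scores in log space avoids the round trip.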

Can someone add that fix to CVS? I'm having trouble with CVS just now, so 
I can't.

Mark Schreiber
Principal Scientist (Bioinformatics)

Novartis Institute for Tropical Diseases (NITD)
1 Science Park Road
#04-14 The Capricorn
Singapore 117528

phone +65 6722 2973
fax  +65 6722 2910





Bruno Aranda - e-BioIntel <elmosca at terra.es>
Sent by: biojava-dev-bounces at portal.open-bio.org
01/27/2004 07:30 PM
Please respond to biodev

 
        To:     biojava-dev at biojava.org
        cc: 
        Subject:        [Biojava-dev] OutOfMemory when using a big Weight Matrix to find Motifs in 
1.3.1 but not in 1.3


Hi Mark,

I've tried to increase the heap size to 512 MB, but my little Linux box 
almost died... However, I've found the origin of the problem. The class I 
tested followed the steps of your wonderful tutorial, and I used the low 
score threshold of "0.1". With the new ScoreType system I got too many 
results for my motif (every base in the sequence), so too many features 
were created and the OutOfMemoryError was raised.
Now, for instance, I can set a threshold of 4000 (?) and I get some 
results (some of them with a probability higher than 5000 (?)), but I 
don't understand why the probability scores are that high. Well, I will send 
a beer truck to your home if you can explain which probability is used 
for these score matrices ;-). Thanks,

Bruno Aranda
ebioIntel

_______________________________________________
biojava-dev mailing list
biojava-dev at biojava.org
http://biojava.org/mailman/listinfo/biojava-dev




