[Biojava-dev] OutOfMemory when using a big Weight Matrix to find Motifs in 1.3.1 but not in 1.3

Michael Heuer heuermh at acm.org
Wed Jan 28 01:47:27 EST 2004


Hello Mark,

It looks like that change was already made on the main branch, in version
1.45 dated 22 Dec 2003.  Should I commit this to the release-1_3-branch?

   michael


On Wed, 28 Jan 2004 mark.schreiber at group.novartis.com wrote:

> Hi Again,
>
> I've found the problem.
>
> The code starting at line 153 in DP needs changing from
>
>      for (int c = 0; c < cols; c++) {
>        score += scoreType.calculateScore(matrix.getColumn(c),
>                                          symList.symbolAt(c + start));
>      }
>
> to
>
>      for (int c = 0; c < cols; c++) {
>        score += Math.log(scoreType.calculateScore(matrix.getColumn(c),
>                                                   symList.symbolAt(c + start)));
>      }
>
> so it will be consistent with the scoreWeightMatrix() method that doesn't
> use a ScoreType. Actually, changing it to a log will prevent underflow
> errors on large WeightMatrices. Interestingly the WeightMatrixAnnotator
> converts it back to a normal probability with a Math.exp() operation
> before annotation. I'm sure it doesn't need to be this convoluted??
>
> Can someone add that fix to CVS? I'm having trouble with CVS just now, so
> I can't.
>
> Mark Schreiber
> Principal Scientist (Bioinformatics)
>
> Novartis Institute for Tropical Diseases (NITD)
> 1 Science Park Road
> #04-14 The Capricorn
> Singapore 117528
>
> phone +65 6722 2973
> fax  +65 6722 2910
>
>
> Bruno Aranda - e-BioIntel <elmosca at terra.es>
> Sent by: biojava-dev-bounces at portal.open-bio.org
> 01/27/2004 07:30 PM
> Please respond to biodev
>
>
>         To:     biojava-dev at biojava.org
>         cc:
>         Subject:        [Biojava-dev] OutOfMemory when using a big Weight Matrix to find Motifs in 1.3.1 but not in 1.3
>
>
> Hi Mark,
>
> I've tried to increase the memory heap to 512 MB but my little Linux box
> almost died... However, I've found the origin of the problem. The class I
> tested followed the steps of your wonderful tutorial, and I used the low
> score threshold of "0.1". With the new ScoreType system I got too many
> results for my motif (every base in the sequence), so too many features
> were created and the OutOfMemoryError was raised.
> Now, for instance, I can set a threshold of 4000 (?) and I get some
> results (some of them with a probability higher than 5000 (?)), but I
> don't understand why the probability scores are that high. Well, I will send
> a beer truck to your home if you can explain which probability is used
> for these score matrices ;-). Thanks,
>
> Bruno Aranda
> ebioIntel
>
> _______________________________________________
> biojava-dev mailing list
> biojava-dev at biojava.org
> http://biojava.org/mailman/listinfo/biojava-dev
>
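Mark's point about underflow on large weight matrices can be illustrated without BioJava. The sketch below is a minimal, hypothetical example: the class LogScoreDemo, its methods, and the toy data (a 400-column matrix in which every column gives the observed base a probability of 0.1) are assumptions, not BioJava API. Multiplying the per-column probabilities underflows to 0.0, while summing their logarithms, as in the proposed change to DP, stays finite. It also shows that calling Math.exp() on a very negative log score underflows again, which bears on the WeightMatrixAnnotator converting scores back to plain probabilities.

```java
import java.util.Arrays;

/**
 * Hypothetical sketch (not BioJava code): scores one sequence window against
 * a weight matrix, given the probability each column assigns to the observed
 * symbol, either by multiplying those probabilities or by summing their logs.
 */
final class LogScoreDemo {

    // Product of per-column probabilities: underflows to 0.0 for long matrices.
    static double productScore(double[] columnProbs) {
        double score = 1.0;
        for (double p : columnProbs) {
            score *= p;
        }
        return score;
    }

    // Sum of per-column log probabilities, mirroring the Math.log() fix above.
    static double logScore(double[] columnProbs) {
        double score = 0.0;
        for (double p : columnProbs) {
            score += Math.log(p);
        }
        return score;
    }

    public static void main(String[] args) {
        // Toy data: 400 columns, each giving the observed base probability 0.1.
        double[] probs = new double[400];
        Arrays.fill(probs, 0.1);

        double product = productScore(probs);
        double logSum = logScore(probs);

        System.out.println("product   = " + product);            // 0.0 (underflow)
        System.out.println("log score = " + logSum);             // about -921.03
        System.out.println("exp(log)  = " + Math.exp(logSum));   // 0.0 again
    }
}
```

Comparing scores on the log scale (for example, against Math.log(threshold)) would avoid the round trip through Math.exp() entirely.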

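Bruno's question about scores above 5000 would make sense if the score type in use is an odds ratio (each column's probability divided by a background, or null-model, probability) rather than a raw probability. This thread does not confirm which ScoreType the 1.3.1 annotator uses, so treat that as an assumption. In the hypothetical sketch below (OddsScoreDemo and its toy values are illustrative, and a uniform 0.25 DNA background is assumed), a strong 10-column match scores about 1.1e5, while a weak match whose raw probability is only about 1e-7 still scores about 0.107. That weak match would pass a 0.1 threshold.

```java
import java.util.Arrays;

/**
 * Hypothetical sketch (not BioJava code): a window score computed as the
 * product over columns of p(symbol | column) / p(symbol | background).
 */
final class OddsScoreDemo {

    // Product of per-column odds ratios; can be far larger than 1.
    static double oddsScore(double[] columnProbs, double background) {
        double score = 1.0;
        for (double p : columnProbs) {
            score *= p / background;
        }
        return score;
    }

    public static void main(String[] args) {
        double background = 0.25; // assumed uniform DNA background

        // Strong match: each of 10 columns gives the observed base 0.8.
        double[] strong = new double[10];
        Arrays.fill(strong, 0.8);

        // Weak match: each of 10 columns gives the observed base 0.2.
        double[] weak = new double[10];
        Arrays.fill(weak, 0.2);

        System.out.println(oddsScore(strong, background)); // about 1.1e5
        System.out.println(oddsScore(weak, background));   // about 0.107
    }
}
```

Because odds scores are not bounded by 1, a threshold chosen as if the score were a probability can let almost every position through.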

