[Biojava-dev] OutOfMemory when using a big Weight Matrix to find Motifs in 1.3.1 but not in 1.3

mark.schreiber at group.novartis.com mark.schreiber at group.novartis.com
Wed Jan 28 01:10:43 EST 2004


Hi Bruno -

WeightMatrices used to be scored by the DP class using the following 
method (called from within the WeightMatrixAnnotator)

  public static double scoreWeightMatrix(
          WeightMatrix matrix, SymbolList symList, int start)
          throws IllegalSymbolException {
    double score = 0;
    int cols = matrix.columns();

    for (int c = 0; c < cols; c++) {
      score += Math.log(
              matrix.getColumn(c).getWeight(symList.symbolAt(c + start)));
    }

    return score;
  }

They are now scored using this method from the DP class, with 
ScoreType.PROBABILITY by default:

  public static double scoreWeightMatrix(
          WeightMatrix matrix,
          SymbolList symList,
          ScoreType scoreType,
          int start)
          throws IllegalSymbolException {
    double score = 0;
    int cols = matrix.columns();

    for (int c = 0; c < cols; c++) {
      score += Math.log(scoreType.calculateScore(
              matrix.getColumn(c), symList.symbolAt(c + start)));
    }

    return score;
  }
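
To make the arithmetic in both methods concrete, here is a minimal, self-contained sketch of the same log-sum loop. It does not use the BioJava classes: the double[][] array, the "ACGT" indexing, the 0-based start offset and the class name LogSumScoreSketch are all assumptions made for illustration only.

```java
public class LogSumScoreSketch {
    // Hypothetical stand-in for a WeightMatrix: one row per column,
    // with weights indexed in the order A, C, G, T.
    static final String ALPHABET = "ACGT";

    // Sums the log of the weight of each symbol in the window,
    // mirroring the loop in scoreWeightMatrix (0-based start here).
    public static double scoreWindow(double[][] columns, String seq, int start) {
        double score = 0;
        for (int c = 0; c < columns.length; c++) {
            char symbol = seq.charAt(c + start);
            int index = ALPHABET.indexOf(symbol);
            if (index < 0) {
                throw new IllegalArgumentException("Unknown symbol: " + symbol);
            }
            score += Math.log(columns[c][index]);
        }
        return score;
    }

    public static void main(String[] args) {
        // Toy 3-column matrix that prefers the motif "AGC".
        double[][] columns = {
            {0.7, 0.1, 0.1, 0.1},
            {0.1, 0.1, 0.7, 0.1},
            {0.1, 0.7, 0.1, 0.1}
        };
        String seq = "TTAGCTT";
        for (int start = 0; start + columns.length <= seq.length(); start++) {
            double logScore = scoreWindow(columns, seq, start);
            // exp() of a sum of logs is the product of the column weights.
            System.out.printf("start=%d log=%.4f product=%.4f%n",
                    start, logScore, Math.exp(logScore));
        }
    }
}
```

Because every weight is at most 1, each log term is at most 0, so the returned sum is never positive and its exponential never exceeds 1.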


As far as I can tell ScoreType.PROBABILITY does exactly the same thing as 
before. It returns the weight of the symbol at that position. I'm not sure 
I understand what is going on.
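
For comparison with the scores in the thousands reported in the quoted message below, here is a small sketch of the scale each kind of score can reach. It is not a diagnosis: whether anything other than plain weights is being multiplied in 1.3.1 is not confirmed in this thread. The odds-style score (weight divided by a background frequency), the uniform 0.25 background, the 6-column motif and the class name ScoreScaleSketch are all assumptions.

```java
public class ScoreScaleSketch {
    // Product of per-column weights: each factor is <= 1, so the result is <= 1.
    public static double product(double[] perColumn) {
        double result = 1.0;
        for (double w : perColumn) {
            result *= w;
        }
        return result;
    }

    // Product of per-column ratios weight / background (an assumed
    // odds-style score); each factor can exceed 1.
    public static double oddsProduct(double[] perColumn, double background) {
        double result = 1.0;
        for (double w : perColumn) {
            result *= w / background;
        }
        return result;
    }

    public static void main(String[] args) {
        // Hypothetical weights of the matching symbol in each of 6 columns.
        double[] weights = {0.9, 0.9, 0.9, 0.9, 0.9, 0.9};
        System.out.println("probability-style: " + product(weights));
        System.out.println("odds-style:        " + oddsProduct(weights, 0.25));
    }
}
```

Under these assumptions a probability-style score stays between 0 and 1, while an odds-style score over 6 columns can go as high as 4^6 = 4096, so a threshold of 0.1 would mean very different things on the two scales.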

- Mark






Bruno Aranda - e-BioIntel <elmosca at terra.es>
Sent by: biojava-dev-bounces at portal.open-bio.org
01/27/2004 07:30 PM
Please respond to biodev

 
        To:     biojava-dev at biojava.org
        cc: 
        Subject:        [Biojava-dev] OutOfMemory when using a big Weight Matrix to find Motifs in 
1.3.1 but not in 1.3


Hi Mark,

I've tried to increase the heap to 512 MB but my little Linux box 
almost died... However, I've found the origin of the problem. The class I 
tested followed the steps of your wonderful tutorial, and I used the low 
score threshold of "0.1". With the new ScoreType system I got too many 
results for my motif (every base in the sequence), so too many features 
were created and the OutOfMemoryError was raised.
Now, for instance, I can set a threshold of 4000 (?) and I get some 
results (some of them with a probability higher than 5000 (?)), but I 
don't understand why probability scores are that high. Well, I will send 
a beer truck to your home if you can explain which probability is used 
for these score matrices ;-). Thanks,

Bruno Aranda
ebioIntel

_______________________________________________
biojava-dev mailing list
biojava-dev at biojava.org
http://biojava.org/mailman/listinfo/biojava-dev




