[Biojava-dev] SymbolTokenizer for Meme class
mark.schreiber at novartis.com
Tue Jul 12 04:51:30 EDT 2005
>I have used MEME for DNA sequences and produced text output (no html).
>BioJava version: 1.3
I would strongly recommend upgrading to BioJava 1.4 (now the official
release) unless you have a strong attachment to version 1.3; that version
is over 2 years old now. Looking in CVS, at least one change was made to
update the file to read MEME v3 output. That should fix the bug you see
with "log"; I believe I made the same change you did.
>I don't know how Java's StreamTokenizer works, but the Meme constructor
>seems to look for the keyword "ALPHABET". Then I guess it looks for the
>first TT_WORD after that keyword, which is ACGT
>(ALPHABET: ACGT)
>It breaks when trying to build a SimpleSymbolList from ACGT using the
>SymbolTokenization I gave as parameter.
>
>However it works when I construct the parser in another way:
>SymbolTokenization ct = DNATools.getDNA().getTokenization("token");
>
>instead of
>
>SymbolTokenization ct = new CharacterTokenization(DNATools.getDNA(), true);
Sorry, I didn't read your email carefully. As you have discovered, the
technique you use (DNATools.getDNA().getTokenization("token")) is the best
way to get a SymbolTokenization. I should put this in BioJava in Anger.
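On the StreamTokenizer question, here is a minimal sketch of the lookup
described above (find the keyword "ALPHABET", then take the first TT_WORD
after it). It uses only java.io.StreamTokenizer with its default settings.
It is not the BioJava Meme source; the class and method names
(AlphabetLineSketch, firstWordAfterAlphabet) are made up for illustration.

```java
import java.io.IOException;
import java.io.StreamTokenizer;
import java.io.StringReader;
import java.io.UncheckedIOException;

// Hypothetical helper, not part of BioJava.
final class AlphabetLineSketch {

    // Returns the first word token that follows the keyword "ALPHABET",
    // or null if the keyword or a following word is never found.
    static String firstWordAfterAlphabet(String text) {
        StreamTokenizer st = new StreamTokenizer(new StringReader(text));
        boolean seenKeyword = false;
        try {
            while (st.nextToken() != StreamTokenizer.TT_EOF) {
                if (st.ttype != StreamTokenizer.TT_WORD) {
                    continue; // skip numbers and punctuation such as ':' or '='
                }
                if (seenKeyword) {
                    return st.sval; // first TT_WORD after the keyword
                }
                if ("ALPHABET".equals(st.sval)) {
                    seenKeyword = true;
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return null;
    }

    public static void main(String[] args) {
        System.out.println(firstWordAfterAlphabet("ALPHABET: ACGT\n"));
    }
}
```

The string returned here is what would then be handed to the
SymbolTokenization to build the SymbolList.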
>There is another thing that does not work.
>The column distributions of the weight matrix class
>are not allowed to get negative values. On the one hand this is
>semantically correct since it is a probability distribution. On the
>other hand the Meme constructor tries to read the log-odds-score matrix.
>(looks for keyword "log"). I've changed the constructor (at my local
>installation) to look for keyword "letter". Now it reads the
>letter-probability matrix which is also given in the result files.
I believe this is fixed in biojava 1.4 (see above). Let me know if this
doesn't work.
>Is there a class for log-odds matrices?
Not really. WeightMatrices are backed by Distributions, which are not
log-odds. However, WeightMatrices can use a log-odds ScoreType, which
calculates the log odds of a Distribution versus its Null Distribution.
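To make the arithmetic concrete, here is a small sketch in plain Java of
log-odds scores for one column against a null model. It does not use the
BioJava ScoreType API; the class and method names are hypothetical, and
log base 2 is an assumption made for the example. It also shows why such
scores cannot be stored in a Distribution: any symbol less likely than the
null model gets a negative score.

```java
// Hypothetical helper, not part of BioJava.
final class LogOddsSketch {

    // Log-odds (base 2, by assumption) of probability p against null probability q.
    static double logOdds(double p, double q) {
        return Math.log(p / q) / Math.log(2.0);
    }

    // Converts one column of probabilities into log-odds scores,
    // position by position, against the matching null-model entries.
    static double[] logOddsColumn(double[] column, double[] nullModel) {
        if (column.length != nullModel.length) {
            throw new IllegalArgumentException("column and null model differ in length");
        }
        double[] scores = new double[column.length];
        for (int i = 0; i < column.length; i++) {
            scores[i] = logOdds(column[i], nullModel[i]);
        }
        return scores;
    }

    public static void main(String[] args) {
        double[] column = {0.5, 0.125, 0.125, 0.25};  // made-up weights for A, C, G, T
        double[] uniform = {0.25, 0.25, 0.25, 0.25};  // uniform null model
        for (double s : logOddsColumn(column, uniform)) {
            System.out.println(s);
        }
    }
}
```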
- Mark