[Biojava-l] Hard Times using File Inputs for HMM Package

mark.schreiber at group.novartis.com
Sun Feb 29 20:20:21 EST 2004


Hi -

A couple of guesses about what might be wrong:

1) You haven't created Symbols for your Alphabet
2) You haven't added said Symbols to your Alphabet

This page http://www.biojava.org/docs/bj_in_anger/customAlpha.htm shows 
how to make a custom Alphabet. It may be useful.
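
To make the two guesses above concrete, here is a small sketch in plain
Java. It is not BioJava code: ToySymbol, ToyAlphabet and AlphabetSketch are
made-up names, and the sketch assumes that an alphabet recognises a symbol
only if that exact object was added to it. Under that assumption, a symbol
that was never created and added, or one created somewhere else under the
same name, fails validation with a message like the one in the stack trace.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Toy model only: hypothetical classes, NOT the BioJava API.
class ToySymbol {
    final String name;
    ToySymbol(String name) { this.name = name; }
    // No equals/hashCode override: two symbols match only if they are
    // the same object (an assumption of this sketch).
}

class ToyAlphabet {
    private final String name;
    private final Set<ToySymbol> symbols = new HashSet<>();
    private final Map<Character, ToySymbol> tokens = new HashMap<>();

    ToyAlphabet(String name) { this.name = name; }

    // Guess 2: a symbol only counts once it has been added to the alphabet.
    void addSymbol(ToySymbol s, char token) {
        symbols.add(s);
        tokens.put(token, s);
    }

    // Rejects any symbol object that was never added to this alphabet.
    void validate(ToySymbol s) {
        if (!symbols.contains(s)) {
            throw new IllegalArgumentException(
                "Symbol " + s.name + " not found in alphabet " + name);
        }
    }

    // Maps one-letter codes to the symbol objects registered for them.
    List<ToySymbol> parse(String seq) {
        List<ToySymbol> out = new ArrayList<>();
        for (char c : seq.toCharArray()) {
            ToySymbol s = tokens.get(c);
            if (s == null) {
                throw new IllegalArgumentException(
                    "Symbol " + c + " not found in alphabet " + name);
            }
            out.add(s);
        }
        return out;
    }
}

class AlphabetSketch {
    // Guess 1 and 2 together: create one symbol per letter and add it.
    static ToyAlphabet buildAlphabet(String name, String letters) {
        ToyAlphabet a = new ToyAlphabet(name);
        for (char c : letters.toCharArray()) {
            a.addSymbol(new ToySymbol(String.valueOf(c)), c);
        }
        return a;
    }

    public static void main(String[] args) {
        ToyAlphabet prot = buildAlphabet("ProtAlphabet", "ACDEFGHIKLMNPQRSTVWY");
        // Symbols obtained through the alphabet's own token table validate.
        for (ToySymbol s : prot.parse("GAVL")) {
            prot.validate(s);
        }
        // A symbol created elsewhere with the same name is a different object.
        try {
            prot.validate(new ToySymbol("G"));
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

If the real library behaves like this toy, the thing to check is that the
symbols coming out of the file reader are the very same objects that the
model's alphabet and emission distributions were built from.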

Hope this helps,

- Mark

P.S. Just wondering, why do you need a custom Alphabet for protein? There
is a perfectly good one available from ProteinTools.getAlphabet().






sacoca at mcb.mcgill.ca
Sent by: biojava-l-bounces at portal.open-bio.org
02/29/2004 01:08 AM

 
        To:     biojava-l at biojava.org
        cc: 
        Subject:        [Biojava-l] Hard Times using File Inputs for HMM Package


Hey all,

I built a Markov model using the BioJava package and am having an
incredibly hard time using it on sequences that I have stored in FASTA
format in a file. The problem is that I specified my own SimpleAlphabet
for protein sequences using the one-letter amino acid code, much like the
dishonest casino example on the tutorial page for dynamic
programming, and each time I try to read a sequence all I get is:

org.biojava.bio.symbol.IllegalSymbolException: Symbol G not found in alphabet ProtAlphabet
        at org.biojava.bio.symbol.AbstractAlphabet.validate(AbstractAlphabet.java:278)
        at org.biojava.bio.symbol.LinearAlphabetIndex.indexForSymbol(LinearAlphabetIndex.java:117)
        at org.biojava.bio.dist.SimpleDistribution.getWeightImpl(SimpleDistribution.java:131)
        at org.biojava.bio.dist.AbstractDistribution.getWeight(AbstractDistribution.java:197)
        at org.biojava.bio.dp.ScoreType$Probability.calculateScore(ScoreType.java:48)
        at org.biojava.bio.dp.onehead.SingleDP.getEmission(SingleDP.java:100)
        at org.biojava.bio.dp.onehead.SingleDP.viterbi(SingleDP.java:553)
        at org.biojava.bio.dp.onehead.SingleDP.viterbi(SingleDP.java:488)

I've tried building a parser with CharacterTokenization, such as

     Parser = new CharacterTokenization(ProtAlphabet, false);

and then binding each symbol to the proper character

     for (int i = 0; i < Protein.length; i++)
         Parser.bindSymbol(Protein[i], AAC[i]);

and then building a symbol list

     SymbolList Bcl2SequenceList = new SimpleSymbolList(Parser, ProtSequence);

but nothing works. By the way, I've also tried using SeqIOTools to
read the file, but the same error was generated: Symbol X was not found in
alphabet ProtAlphabet.

HELP!!!!!!

_______________________________________________
Biojava-l mailing list  -  Biojava-l at biojava.org
http://biojava.org/mailman/listinfo/biojava-l




