[Biojava-l] Null Model

Thomas Down thomas at derkholm.net
Tue May 6 21:57:41 EDT 2003


Once upon a time, Ren, Zhen wrote:
> 
> My question is how I would create a null model like this instead 
> of 1/20 (5.00%) as the null weight for each.  Using the snippet 
> below certainly would do the job, but it is a little awkward, 
> isn't it?

Well, for something like that, you're probably going to
want to read the Distribution in from some kind of a file.
To get you started, here's a simple method from one of my
programs which loads a Distribution from a DOM tree:

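    import org.biojava.bio.BioError;
    import org.biojava.bio.dist.Distribution;
    import org.biojava.bio.dist.DistributionFactory;
    import org.biojava.bio.seq.io.SymbolTokenization;
    import org.biojava.bio.symbol.Alphabet;
    import org.biojava.bio.symbol.AlphabetManager;
    import org.biojava.bio.symbol.Symbol;
    import org.biojava.utils.ChangeVetoException;
    import org.w3c.dom.Element;
    import org.w3c.dom.Node;

    /**
     * Builds a Distribution from a <distribution alphabet="..."> element.
     * Each child element supplies one weight as symbol="..." weight="...",
     * with the symbol given by its name in the alphabet's "name" tokenization.
     */
    public Distribution readSymDistribution(Element cons)
        throws Exception
    {
        // Look up the alphabet (e.g. "DNA") and a parser for symbol names
        Alphabet alph = AlphabetManager.alphabetForName(cons.getAttribute("alphabet"));
        SymbolTokenization nameParser = alph.getTokenization("name");
        Distribution dist = DistributionFactory.DEFAULT.createDistribution(alph);

        // Walk the children; the instanceof test skips whitespace text nodes
        for (Node child = cons.getFirstChild(); child != null; child = child.getNextSibling()) {
            if (child instanceof Element) {
                Element weight = (Element) child;
                Symbol sym = nameParser.parseToken(weight.getAttribute("symbol"));
                double w = Double.parseDouble(weight.getAttribute("weight"));
                try {
                    dist.setWeight(sym, w);
                } catch (ChangeVetoException ex) {
                    // A freshly created distribution shouldn't veto changes
                    throw new BioError(ex);
                }
            }
        }

        return dist;
    }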

Sorry, it's rather old and not the last word in elegance
(I actually tend to avoid the DOM APIs these days), but it
does work.  The XML looks like:

        <distribution alphabet="DNA">
          <weight symbol="thymine" weight="0.2881944444444444" />
          <weight symbol="adenine" weight="0.5011574074074073" />
          <weight symbol="cytosine" weight="0.10300925925925926" />
          <weight symbol="guanine" weight="0.10763888888888888" />
        </distribution>
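If you want to sanity-check a weights file before handing it to BioJava, here's a minimal JDK-only sketch (no BioJava on the classpath) that reads the format above into a map of symbol name to weight and checks that the weights sum to 1. The class and method names (`DistributionXmlCheck`, `parseWeights`, `sumsToOne`) are made up for this example, and unlike the method above it only reads `<weight>` children, which is a slightly stricter assumption about the file.

```java
import java.io.StringReader;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

// Hypothetical helper class: reads the <distribution> format using only the JDK.
public class DistributionXmlCheck {

    // The example file from this message, inlined for testing.
    static final String SAMPLE_XML =
          "<distribution alphabet=\"DNA\">"
        + "<weight symbol=\"thymine\" weight=\"0.2881944444444444\" />"
        + "<weight symbol=\"adenine\" weight=\"0.5011574074074073\" />"
        + "<weight symbol=\"cytosine\" weight=\"0.10300925925925926\" />"
        + "<weight symbol=\"guanine\" weight=\"0.10763888888888888\" />"
        + "</distribution>";

    // Returns symbol name -> weight, in document order.
    static Map<String, Double> parseWeights(String xml) {
        Element root;
        try {
            root = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)))
                    .getDocumentElement();
        } catch (Exception ex) {
            // ParserConfigurationException, SAXException or IOException
            throw new IllegalArgumentException("Unreadable distribution XML", ex);
        }
        Map<String, Double> weights = new LinkedHashMap<>();
        for (Node n = root.getFirstChild(); n != null; n = n.getNextSibling()) {
            // Skip whitespace text nodes and any element that is not <weight>
            if (n instanceof Element && "weight".equals(((Element) n).getTagName())) {
                Element e = (Element) n;
                weights.put(e.getAttribute("symbol"),
                            Double.parseDouble(e.getAttribute("weight")));
            }
        }
        return weights;
    }

    // True if the weights add up to 1 within the given tolerance.
    static boolean sumsToOne(Map<String, Double> weights, double tolerance) {
        double sum = 0.0;
        for (double w : weights.values()) {
            sum += w;
        }
        return Math.abs(sum - 1.0) <= tolerance;
    }

    public static void main(String[] args) {
        Map<String, Double> w = parseWeights(SAMPLE_XML);
        System.out.println(w + " sums to one: " + sumsToOne(w, 1e-9));
    }
}
```

For a real file you'd call `DocumentBuilder.parse(File)` instead of parsing a string, and the same `getDocumentElement()` result can be passed straight to `readSymDistribution`.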

There's some similar code in the BioJava XmlMarkovModel
class, which loads and saves MarkovModel objects, but it doesn't
expose any public methods for loading a single distribution.
Maybe it should.
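Going the other way: if the null model you want is just background symbol frequencies measured from some reference data, here's a minimal JDK-only sketch that writes counts out in the XML format shown in this message, ready for `readSymDistribution`. `DistributionXmlWriter`, `toDistributionXml` and the `pseudocount` parameter are all made up for this example. The pseudocount is there because a null model that gives some symbol a weight of zero tends to cause trouble once you take log-odds scores against it.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper: turns raw symbol counts into the <distribution> XML format.
public class DistributionXmlWriter {

    // Each weight is (count + pseudocount) / total, so the weights sum to 1
    // (up to floating-point rounding). Assumes symbol names need no XML
    // escaping, which holds for names like "adenine".
    static String toDistributionXml(String alphabet, Map<String, Integer> counts,
                                    double pseudocount) {
        double total = 0.0;
        for (int c : counts.values()) {
            total += c + pseudocount;
        }
        if (total <= 0.0) {
            throw new IllegalArgumentException("Counts must sum to a positive number");
        }
        StringBuilder sb = new StringBuilder();
        sb.append("<distribution alphabet=\"").append(alphabet).append("\">\n");
        for (Map.Entry<String, Integer> e : counts.entrySet()) {
            double weight = (e.getValue() + pseudocount) / total;
            sb.append("  <weight symbol=\"").append(e.getKey())
              .append("\" weight=\"").append(weight).append("\" />\n");
        }
        sb.append("</distribution>\n");
        return sb.toString();
    }

    public static void main(String[] args) {
        // Made-up counts, just for illustration.
        Map<String, Integer> counts = new LinkedHashMap<>();
        counts.put("adenine", 3);
        counts.put("thymine", 1);
        counts.put("cytosine", 0);
        counts.put("guanine", 0);
        System.out.print(toDistributionXml("DNA", counts, 1.0));
    }
}
```

With a pseudocount of 1 the made-up counts above give weights of 0.5, 0.25, 0.125 and 0.125; with a pseudocount of 0 you get the raw frequencies, including the zeros.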

     Thomas.

