[Biojava-l] Null Model
Matthew Pocock
matthew_pocock at yahoo.co.uk
Wed May 7 10:40:36 EDT 2003
--- "Ren, Zhen" <zren at amylin.com> wrote: > Thank you,
folks.
>
> Please review the previous emails and pay attention
> to the way to set up the null model. More clearly,
> I collect them together here:
>
> Initially, I asked how to change 1/22 to 1/20 as the
> null weight for each of 20 common amino acids and 0
> for SEC and TERM. Thomas suggested:
>
> FiniteAlphabet alpha =
> ProteinTools.getTAlphabet();
> Distribution nullModel = new
> SimpleDistribution(alpha);
> for (Iterator i = alpha.iterator(); i.hasNext();
> ) {
> Symbol s = (Symbol) i.next();
> if (s.getName().equals("SEC") ||
> s.getName().equals("TER")) {
> nullModel.setWeight(s, 0);
> } else {
> nullModel.setWeight(s, 1.0 / 20.0);
> }
> }
> someOtherDistribution.setNullModel(nullModel);
>
> Then, I provided a link to human amino acid
> composition page. Each amino acid has a different
> null weight. Thomas provided the following code:
>
> public Distribution readSymDistribution(Element
> cons)
> throws Exception
> {
> Alphabet alph =
>
AlphabetManager.instance().alphabetForName(cons.getAttribute("alphabet"));
> SymbolTokenization nameParser =
> alph.getTokenization("name");
> Distribution dist =
>
DistributionFactory.DEFAULT.createDistribution(alph);
>
> Node chld2 = cons.getFirstChild();
> while (chld2 != null) {
> if (chld2 instanceof Element) {
> Element weight = (Element)
> chld2;
> String sName =
> weight.getAttribute("symbol");
> Symbol sym =
> nameParser.parseToken(sName);
>
> double w =
> Double.parseDouble(weight.getAttribute("weight"));
> try {
> dist.setWeight(sym, w);
> } catch (ChangeVetoException
> ex) {
> throw new BioError(ex);
> }
> }
> chld2 = chld2.getNextSibling();
> }
>
> return dist;
> }
>
> Frankly speaking, I donât really see any
> difference between them. Notice both use the method
> dist.setWeight(sym, weight);
> My question is really not how I get those numbers
> into my program (by reading from a file or using
> XML, or hard coding them or something else). I
> suspect at the moment this is probably the only way
> to set up a null model other than using the default
> null model, which is 1/22 as the null weight for all
> 20 amino acids plus SEC and TERM. Disagree?
>
> Furthermore, there is a more serious problem. Let's
> go back to my modified WeightMatrixDemo program (for
> protein) in "BioJava in Anger". Again, here is the
> code:
>
> 1 import java.util.*;
> 2 import org.biojava.bio.dist.*;
> 3 import org.biojava.bio.dp.*;
> 4 import org.biojava.bio.seq.*;
> 5 import org.biojava.bio.symbol.*;
> 6
> 7 public class WeightMatrixDemo {
> 8 public static void main(String[] args)
> throws Exception {
> 9 //make an Alignment of a motif.
> 10 Map map = new HashMap();
> 11 map.put("seq0",
> ProteinTools.createProtein("aggag"));
> 12 map.put("seq1",
> ProteinTools.createProtein("aggaa"));
> 13 map.put("seq2",
> ProteinTools.createProtein("aggag"));
> 14 map.put("seq3",
> ProteinTools.createProtein("aagag"));
> 15 Alignment align = new
> SimpleAlignment(map);
> 16
> 17 //make a Distribution[] of the motif
> 18 Distribution[] dists =
> DistributionTools.distOverAlignment(align, false,
> 0.01);
> 19
> 20 //make a Weight Matrix
> 21 WeightMatrix matrix = new
> SimpleWeightMatrix(dists);
> 22
> 23 //the sequence to score against
> 24 Sequence seq =
>
ProteinTools.createProteinSequence("aaagcctaggaagaggagctgat","seq");
> 25
> 26 //annotate the sequence with the weight
> matrix using a low threshold (0.1)
> 27 WeightMatrixAnnotator wma = new
> WeightMatrixAnnotator(matrix, 0.1);
> 28 seq = wma.annotate(seq);
> 29
> 30 //output match information
> 31 for(Iterator it = seq.features();
> it.hasNext(); ) {
> 32 Feature f = (Feature)it.next();
> 33 Location loc = f.getLocation();
> 34 System.out.println("Match at " +
> loc.getMin()+"-"+loc.getMax());
> 35 System.out.println("\tscore :
> "+f.getAnnotation().getProperty("score"));
> 36 }
> 37 }
> 38 }
>
> Notice on line 18,
> DistributionTools.distOverAlignment() returns an
> array of Distribution over the alignment. When
> calling this method, the default null model is
> already used for each Distribution. Later using
> dist.setWeight(symbol, weight) doesn't really
> matter. Or you do dist.setWeight(symbol, weight)
> before calling
> DistributionTools.distOverAlignment(). It doesn't
> matter either because DistributionTools doesn't take
> user-defined null models. So, setting up a
> user-defined null model must be done within the
> DistributionTools.distOverAlignment() method. After
> looking into the code inside
> DistributionTools.distOverAlignment(), the following
> lines:
> Distribution nullmodel = pos[i].getNullModel();
> nullmodel.setWeight(symbol, weight);
> need to be added before this line:
> pos[i] =
>
DistributionFactory.DEFAULT.createDistribution(alpha);
>
> Hopefully, you see what I am saying here. So,
> DistributionTools.java really needs to add the
> capacity of defining a non-default null model to do
> the distribution over an alignment. Otherwise, the
> use is very limited. Your thoughts?
>
> Thank you very much.
>
> Zhen
>
>
>
> _______________________________________________
> Biojava-l mailing list - Biojava-l at biojava.org
> http://biojava.org/mailman/listinfo/biojava-l
__________________________________________________
Yahoo! Plus
For a better Internet experience
http://www.yahoo.co.uk/btoffer
More information about the Biojava-l
mailing list