[Biojava-l] Null Model

Matthew Pocock matthew_pocock at yahoo.co.uk
Wed May 7 10:40:36 EDT 2003


 --- "Ren, Zhen" <zren at amylin.com> wrote:
> Thank you, folks.
> 
> Please review the previous emails, paying attention
> to how the null model is set up.  For clarity, I have
> collected them here:
> 
> Initially, I asked how to change the null weight from
> 1/22 to 1/20 for each of the 20 common amino acids and
> to 0 for SEC and TERM. Thomas suggested:
> 
>     FiniteAlphabet alpha = ProteinTools.getTAlphabet();
>     Distribution nullModel = new SimpleDistribution(alpha);
>     for (Iterator i = alpha.iterator(); i.hasNext(); ) {
>         Symbol s = (Symbol) i.next();
>         if (s.getName().equals("SEC") || s.getName().equals("TER")) {
>             nullModel.setWeight(s, 0);
>         } else {
>             nullModel.setWeight(s, 1.0 / 20.0);
>         }
>     }
>     someOtherDistribution.setNullModel(nullModel);
> 
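For readers without BioJava at hand, the arithmetic behind the
snippet above can be sketched in plain Java.  This is only an
illustration: the class name, the method names and the list of
three-letter residue names are my own assumptions, and no BioJava
classes are involved.  It gives 1/20 to each of the 20 common amino
acids and 0 to SEC and TER, then checks that the weights sum to 1.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Plain-Java illustration only; the names here are hypothetical
// and nothing in this class is part of BioJava.
final class UniformNullModelSketch {
    // The 20 common amino acids, plus SEC and TER.
    static final String[] NAMES = {
        "ALA", "ARG", "ASN", "ASP", "CYS", "GLN", "GLU", "GLY", "HIS", "ILE",
        "LEU", "LYS", "MET", "PHE", "PRO", "SER", "THR", "TRP", "TYR", "VAL",
        "SEC", "TER"
    };

    // Weight 0 for SEC and TER, 1/20 for everything else.
    static Map<String, Double> buildNullModel() {
        Map<String, Double> weights = new LinkedHashMap<>();
        for (String name : NAMES) {
            boolean excluded = name.equals("SEC") || name.equals("TER");
            weights.put(name, excluded ? 0.0 : 1.0 / 20.0);
        }
        return weights;
    }

    // Sum of all weights; a proper null model should total 1.
    static double total(Map<String, Double> weights) {
        double sum = 0.0;
        for (double w : weights.values()) {
            sum += w;
        }
        return sum;
    }

    public static void main(String[] args) {
        Map<String, Double> nullModel = buildNullModel();
        System.out.println("ALA = " + nullModel.get("ALA"));
        System.out.println("SEC = " + nullModel.get("SEC"));
        System.out.println("total = " + total(nullModel));
    }
}
```
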
> Then, I provided a link to a human amino acid
> composition page, where each amino acid has a
> different null weight.  Thomas provided the following
> code:
> 
>     public Distribution readSymDistribution(Element cons)
>         throws Exception
>     {
>         Alphabet alph = AlphabetManager.instance()
>             .alphabetForName(cons.getAttribute("alphabet"));
>         SymbolTokenization nameParser = alph.getTokenization("name");
>         Distribution dist =
>             DistributionFactory.DEFAULT.createDistribution(alph);
> 
>         Node chld2 = cons.getFirstChild();
>         while (chld2 != null) {
>             if (chld2 instanceof Element) {
>                 Element weight = (Element) chld2;
>                 String sName = weight.getAttribute("symbol");
>                 Symbol sym = nameParser.parseToken(sName);
> 
>                 double w = Double.parseDouble(weight.getAttribute("weight"));
>                 try {
>                     dist.setWeight(sym, w);
>                 } catch (ChangeVetoException ex) {
>                     throw new BioError(ex);
>                 }
>             }
>             chld2 = chld2.getNextSibling();
>         }
> 
>         return dist;
>     }
> 
> Frankly speaking, I don't really see any difference
> between them.  Notice that both use the method
> dist.setWeight(sym, weight);
> My question is not really how to get those numbers
> into my program (reading them from a file, using XML,
> hard-coding them, or something else).  I suspect
> that, at the moment, this is the only way to set up
> a null model other than the default one, which
> assigns 1/22 as the null weight to each of the 20
> amino acids plus SEC and TERM.  Disagree?
> 
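Whichever way the composition numbers are read in, they have to be
turned into weights that sum to 1 before they can serve as a null
model.  A minimal plain-Java sketch of that step follows.  The class
and method names are hypothetical, the counts in main() are made-up
placeholders rather than real composition data, and no BioJava
classes are used.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Plain-Java illustration only; names and numbers are hypothetical.
final class CompositionNullModelSketch {
    // Scale non-negative raw values (counts or percentages) so they sum to 1.
    static Map<String, Double> normalize(Map<String, Double> raw) {
        double total = 0.0;
        for (double value : raw.values()) {
            if (value < 0.0) {
                throw new IllegalArgumentException("negative value: " + value);
            }
            total += value;
        }
        if (total <= 0.0) {
            throw new IllegalArgumentException("values must not all be zero");
        }
        Map<String, Double> weights = new LinkedHashMap<>();
        for (Map.Entry<String, Double> entry : raw.entrySet()) {
            weights.put(entry.getKey(), entry.getValue() / total);
        }
        return weights;
    }

    public static void main(String[] args) {
        // Placeholder counts for three residues, not real composition data.
        Map<String, Double> raw = new LinkedHashMap<>();
        raw.put("ALA", 70.0);
        raw.put("GLY", 20.0);
        raw.put("TRP", 10.0);
        System.out.println(normalize(raw));
    }
}
```
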
> Furthermore, there is a more serious problem.  Let's
> go back to my modified WeightMatrixDemo program (for
> protein) in "BioJava in Anger".  Again, here is the
> code:
> 
> 1   import java.util.*;
> 2   import org.biojava.bio.dist.*;
> 3   import org.biojava.bio.dp.*;
> 4   import org.biojava.bio.seq.*;
> 5   import org.biojava.bio.symbol.*;
> 6
> 7   public class WeightMatrixDemo {
> 8       public static void main(String[] args) throws Exception {
> 9           //make an Alignment of a motif.
> 10          Map map = new HashMap();
> 11          map.put("seq0", ProteinTools.createProtein("aggag"));
> 12          map.put("seq1", ProteinTools.createProtein("aggaa"));
> 13          map.put("seq2", ProteinTools.createProtein("aggag"));
> 14          map.put("seq3", ProteinTools.createProtein("aagag"));
> 15          Alignment align = new SimpleAlignment(map);
> 16
> 17          //make a Distribution[] of the motif
> 18          Distribution[] dists = DistributionTools.distOverAlignment(align, false, 0.01);
> 19
> 20          //make a Weight Matrix
> 21          WeightMatrix matrix = new SimpleWeightMatrix(dists);
> 22
> 23          //the sequence to score against
> 24          Sequence seq = ProteinTools.createProteinSequence("aaagcctaggaagaggagctgat","seq");
> 25
> 26          //annotate the sequence with the weight matrix using a low threshold (0.1)
> 27          WeightMatrixAnnotator wma = new WeightMatrixAnnotator(matrix, 0.1);
> 28          seq = wma.annotate(seq);
> 29
> 30          //output match information
> 31          for(Iterator it = seq.features(); it.hasNext(); ) {
> 32              Feature f = (Feature)it.next();
> 33              Location loc = f.getLocation();
> 34              System.out.println("Match at " + loc.getMin()+"-"+loc.getMax());
> 35              System.out.println("\tscore : "+f.getAnnotation().getProperty("score"));
> 36          }
> 37      }
> 38  }
> 
> Notice that on line 18,
> DistributionTools.distOverAlignment() returns an
> array of Distributions over the alignment.  By the
> time it returns, the default null model has already
> been used for each Distribution.  Calling
> dist.setWeight(symbol, weight) afterwards doesn't
> help, and calling it before
> DistributionTools.distOverAlignment() doesn't help
> either, because DistributionTools doesn't take
> user-defined null models.  So, a user-defined null
> model must be set up inside the
> DistributionTools.distOverAlignment() method itself.
> After looking at the code inside
> DistributionTools.distOverAlignment(), I think the
> following lines:
>     Distribution nullmodel = pos[i].getNullModel();
>     nullmodel.setWeight(symbol, weight);
> need to be added after this line (pos[i] has to be
> assigned before getNullModel() can be called on it):
>     pos[i] = DistributionFactory.DEFAULT.createDistribution(alpha);
> 
> Hopefully, you see what I am saying here.  So,
> DistributionTools.java really needs a way to supply
> a non-default null model when building the
> distributions over an alignment.  Otherwise, its
> usefulness is very limited.  Your thoughts?
> 
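To make concrete why the choice of null model matters, here is a
plain-Java sketch of a log-odds score of the usual form, the sum
over positions of ln(motif weight / null weight).  This is an
assumption made for illustration: it is not taken from the BioJava
source, and it makes no claim about how WeightMatrixAnnotator
computes its score.  The class name, the method names and the motif
weights in main() are all hypothetical.

```java
import java.util.Arrays;

// Plain-Java illustration only; not BioJava's scoring code.
final class LogOddsSketch {
    // Sum over positions of ln(motifWeight / nullWeight).
    static double logOdds(double[] motifWeights, double[] nullWeights) {
        if (motifWeights.length != nullWeights.length) {
            throw new IllegalArgumentException("arrays must have the same length");
        }
        double score = 0.0;
        for (int i = 0; i < motifWeights.length; i++) {
            score += Math.log(motifWeights[i] / nullWeights[i]);
        }
        return score;
    }

    // The same null weight at every position.
    static double[] uniform(int positions, double weight) {
        double[] weights = new double[positions];
        Arrays.fill(weights, weight);
        return weights;
    }

    public static void main(String[] args) {
        // Hypothetical motif weights of the residues observed at three positions.
        double[] motif = {0.9, 0.8, 0.9};
        System.out.println("null 1/22: " + logOdds(motif, uniform(3, 1.0 / 22.0)));
        System.out.println("null 1/20: " + logOdds(motif, uniform(3, 1.0 / 20.0)));
    }
}
```

Under this assumed scoring, moving from a null weight of 1/22 to
1/20 lowers the score by ln(22/20) per position, so the same motif
hit can land on either side of a fixed threshold depending only on
the null model.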
> Thank you very much.
> 
> Zhen
> 
> 
> 
> _______________________________________________
> Biojava-l mailing list  -  Biojava-l at biojava.org
> http://biojava.org/mailman/listinfo/biojava-l 
