FW: [Biojava-l] orderNSymbols and Alphabets

Schreiber, Mark mark.schreiber@agresearch.co.nz
Fri, 2 Mar 2001 10:55:31 +1300


Sorry,

I should have sent this to the whole group.

Mark

-----Original Message-----
From: Schreiber, Mark 
Sent: Friday, March 02, 2001 9:58 AM
To: 'Matthew Pocock'
Subject: RE: [Biojava-l] orderNSymbols and Alphabets


Hi,

Attached is a program that documents some of my adventures in orderNSymbol
land; it may be of use as a demo/tutorial.

Thanks to those who showed me how to do it.

Mark

> -----Original Message-----
> From: Matthew Pocock [mailto:mrp@sanger.ac.uk]
> Sent: Tuesday, February 27, 2001 11:55 PM
> To: Thomas Down
> Cc: Schreiber, Mark; 'biojava-l@biojava.org'
> Subject: Re: [Biojava-l] orderNSymbols and Alphabets
> 
> 
> ...and to make the n'th-order symbol list to use with the distribution,
> you can use one of:
> 
> SymbolListViews.orderNSymbolList(source, order)
> SymbolListViews.windowedSymbolList(source, windowWidth)
> 
> Thomas Down wrote:
> 
> > On Tue, Feb 27, 2001 at 05:04:59PM +1300, Schreiber, Mark wrote:
> > 
> >> Hi
> >> 
> >> What is the simplest way to create an orderN alphabet or symbol that
> >> can be used in a distribution?
> > 
> > 
> > Cross product alphabets are created via the AlphabetManager:
> > 
> >   Alphabet codons = AlphabetManager.getCrossProductAlphabet(
> >                             Collections.nCopies(3, DNATools.getDNA()));
> > 
> > This method will work on any arbitrary List of Alphabets.
> > 
> > You can then retrieve symbols from that alphabet:
> > 
> >   List symbols = DNATools.createDNA("atg").toList();
> >   Symbol startCodon = codons.getSymbol(symbols);
> > 
> > This method works on an arbitrary list of Symbols (but obviously
> > these must match the alphabet -- you'll get an IllegalSymbolException
> > otherwise).
> > 
> > Hope this helps,
> > 
> >    Thomas.
> > 
> > _______________________________________________
> > Biojava-l mailing list  -  Biojava-l@biojava.org
> > http://biojava.org/mailman/listinfo/biojava-l

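The two view methods Matthew mentions differ in how they step along the source. As far as I understand them, windowedSymbolList gives non-overlapping windows (codons, for width 3), while orderNSymbolList gives overlapping n-mers. The sketch below shows that difference on plain strings. It does not use BioJava at all; the class and method names are made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;

// Plain-Java illustration only. NmerViewSketch, windowed and orderN are
// hypothetical names, not part of the BioJava API.
class NmerViewSketch {

    // Non-overlapping windows: the window start advances by the full width.
    static List<String> windowed(String source, int width) {
        if (width <= 0 || source.length() % width != 0) {
            throw new IllegalArgumentException(
                "length " + source.length() + " is not a multiple of " + width);
        }
        List<String> out = new ArrayList<>();
        for (int i = 0; i < source.length(); i += width) {
            out.add(source.substring(i, i + width));
        }
        return out;
    }

    // Overlapping n-mers: the window start advances by one position.
    static List<String> orderN(String source, int order) {
        if (order <= 0) {
            throw new IllegalArgumentException("order must be positive");
        }
        List<String> out = new ArrayList<>();
        for (int i = 0; i + order <= source.length(); i++) {
            out.add(source.substring(i, i + order));
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(windowed("atgcat", 3)); // prints [atg, cat]
        System.out.println(orderN("atgcat", 3));   // prints [atg, tgc, gca, cat]
    }
}
```

For a codon distribution the windowed form is the one you want, since each codon should be counted once, in frame.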

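To make the cross-product idea in Thomas's reply concrete: a codon alphabet is just the Cartesian product of three copies of the DNA alphabet, so it has 4 x 4 x 4 = 64 symbols. The sketch below builds that product over plain strings. It is only an illustration of the concept, not of how AlphabetManager does it internally, and the names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Plain-Java illustration only. CrossProductSketch and crossProduct are
// hypothetical names, not part of the BioJava API.
class CrossProductSketch {

    // Cartesian product of the given alphabets, in order. Each compound
    // symbol is written as the concatenation of its component tokens.
    static List<String> crossProduct(List<List<String>> alphabets) {
        List<String> result = new ArrayList<>();
        result.add(""); // the empty product has exactly one (empty) symbol
        for (List<String> alphabet : alphabets) {
            List<String> next = new ArrayList<>();
            for (String prefix : result) {
                for (String token : alphabet) {
                    next.add(prefix + token);
                }
            }
            result = next;
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> dna = Arrays.asList("a", "c", "g", "t");
        List<String> codons = crossProduct(Collections.nCopies(3, dna));
        System.out.println(codons.size());          // prints 64
        System.out.println(codons.contains("atg")); // prints true
    }
}
```

A lookup such as "atg" succeeding, and something like "atx" failing, mirrors the point about symbols having to match the alphabet.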
Attachment: CrossProductTest.java

/*
 *                    BioJava development code
 *
 * This code may be freely distributed and modified under the
 * terms of the GNU Lesser General Public Licence.  This should
 * be distributed with the code.  If you do not have a copy,
 * see:
 *
 *      http://www.gnu.org/copyleft/lesser.html
 *
 * Copyright for this code is held jointly by the individual
 * authors.  These should be listed in @author doc comments.
 *
 * For more information on the BioJava project and its aims,
 * or to join the biojava-l mailing list, visit the home page
 * at:
 *
 *      http://www.biojava.org/
 *
 */


package testbed;

import org.biojava.bio.*;
import org.biojava.utils.*;
import org.biojava.bio.dist.*;
import org.biojava.bio.seq.*;
import org.biojava.bio.symbol.*;
import java.util.*;

/**
 * Title:        CrossProductTest
 * Description:  A test of the nmer alphabet and distribution concepts
 *
 * This program demonstrates the use of crossproduct (nmer) alphabets and
 * distributions. A codon distribution is created from a sequence. This
 * distribution is then used to generate another random sequence. The
 * probability of this new sequence is then calculated. This program also
 * demonstrates how a cross product alphabet may be displayed to STDOUT.
 *
 * Thanks to Matthew and Thomas for hints and suggestions.
 *
 * @author       Mark Schreiber
 * @version 1.0
 */

public class CrossProductTest {

  double prob = 1.0; //emission probability

  public CrossProductTest() throws NestedException {
    try{
      //create a cross product of three dna alphabets ie a codon alphabet.
      Alphabet tri = AlphabetManager.getCrossProductAlphabet(
                                      Collections.nCopies(3,DNATools.getDNA()));


      //create a distribution for the alphabet and a trainer.
      Distribution d = DistributionFactory.DEFAULT.createDistribution(tri);
      DistributionTrainer dt = new SimpleDistributionTrainer(d);
      DistributionTrainerContext context = new SimpleDistributionTrainerContext();

      //create a dna sequence.
      SymbolList seq = DNATools.createDNA(
        "atgatgatggtggcggaggatgggcgcgcggtggaaacaacaattaca" +
        "tagcaccccataccaatagacacagatggcggtgtgaacagataagac" +
        "gcttagacacaaatgacacacggggccggggaatatttttaaatacaa" +
        "cggctctctttataggcgcgcctttaaatataggcgcgcgcgggccta" +
        "tttataaatatttttagaccacacccatatcatacgacaagaagccat" +
        "ccaaatacggataacacccctagaggggaaccccgttatattttacac"
      );

      //create a trimer view on the sequence.
      SymbolList subseq = SymbolListViews.windowedSymbolList(seq, 3);

      //add trimer counts to the distribution.
      Iterator iter = subseq.iterator();
      while (iter.hasNext()) {
        Object item = iter.next();
        dt.addCount(context,(AtomicSymbol)item,1.0);
      }
      //train the model using the weights given.
      dt.train(0.0); //No pseudo-counts to nullModel.

      for (int i = 1; i <= 20; i++) { // generate a new sequence
        Symbol sym = d.sampleSymbol();
        //get the symbols that make up sym.
        List syms = ((BasisSymbol)sym).getSymbols();
        //print the codon
        iter = syms.iterator();
        while (iter.hasNext()) {
          Symbol s  = (Symbol)iter.next();
          System.out.print(s.getToken());
        }
        //get the probability of the emission so far
        prob *= d.getWeight(sym);
      }
      System.out.println("\nProbability of emission = " + prob);

    }catch(Exception e){
      throw new NestedException(e);
    }
  }
  public static void main(String[] args) {
    try{
      CrossProductTest crossProductTest1 = new CrossProductTest();
    }catch(NestedException ne){
      ne.printStackTrace(System.out);
    }
  }
}
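
For anyone who wants to follow the arithmetic in CrossProductTest without a BioJava install, here is a rough plain-Java sketch of the same steps: count in-frame codons, turn the counts into relative frequencies with no pseudo-counts, then multiply the weights of the codons in a second sequence. The class and method names are invented for this sketch, and a map of strings stands in for the BioJava Distribution.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Plain-Java illustration only. CodonDistributionSketch, train and
// probability are hypothetical names, not part of the BioJava API.
class CodonDistributionSketch {

    // Relative frequency of each non-overlapping codon in seq.
    // No pseudo-counts are added, so unseen codons get no entry.
    static Map<String, Double> train(String seq) {
        if (seq.isEmpty() || seq.length() % 3 != 0) {
            throw new IllegalArgumentException(
                "sequence length must be a positive multiple of 3");
        }
        Map<String, Double> dist = new LinkedHashMap<>();
        for (int i = 0; i < seq.length(); i += 3) {
            dist.merge(seq.substring(i, i + 3), 1.0, Double::sum);
        }
        final double total = seq.length() / 3.0;
        dist.replaceAll((codon, count) -> count / total);
        return dist;
    }

    // Product of the weights of the in-frame codons of seq, treating each
    // codon as an independent emission. Unseen codons contribute 0.
    static double probability(Map<String, Double> dist, String seq) {
        double p = 1.0;
        for (int i = 0; i + 3 <= seq.length(); i += 3) {
            p *= dist.getOrDefault(seq.substring(i, i + 3), 0.0);
        }
        return p;
    }

    public static void main(String[] args) {
        // "atg" occurs twice and "cat" once, so the weights are 2/3 and 1/3.
        Map<String, Double> dist = train("atgatgcat");
        System.out.println(dist);
        // 2/3 * 1/3 = 2/9 for the two-codon sequence "atgcat".
        System.out.println(probability(dist, "atgcat"));
    }
}
```

With only 20 codons the running product is fine, but for long sequences it would underflow, and summing log weights is the usual way around that.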