[Biojava-l] what it needs to do this with (Bio)Java...

Matthew Pocock mrp@sanger.ac.uk
Thu, 20 Apr 2000 17:53:20 +0100


Gerald,

Gerald Loeffler wrote:

>         o given 2 "NT-Sequence" objects, construct the optimal alignment
> between them, return it as an "Alignment" object, and also give a Score
> for the alignment. (I know, this is gonna be slow, but it's fine for my
> application...)--

I think we should end up with some "well known" alignment algorithms included for
free as public-static-final variables somewhere. For now, it is easy enough to
build an alignment algorithm that look like this:

               i2\
               |\/
    start/end -*- match\
               |      \/
               i1\
                \/

// just alias the singleton localy to save typing fatigue
AlphabetManager am = AlphabetManager.instance();

// define what type of things you are going to align
Alphabet DNA = DNATools.getAlphabet(); // basicaly this is DNA
Alphabet gappedDNA = am.getGappedAlphabet(DNA); // with gaps
CrossProductAlphabet pairDNA = am.getCrossProductAlphabet(
  Collections.nCopies(2, gappedDNA)
);
Residue gap = am.getGapResidue();

// the match state will consume a token from each sequence
EmissionState match = StateFactory.DEFAULT.createState(
  pairDNA, new int [] { 1, 1 }, "Match"
);

// insert1 consumes a token from sequence 1 & skips one in sequence 2
EmissionState i1 = StateFactory.DEFAULT.createState(
  pairDNA, new int [] { 1, 0 }, "Insert-1"
);

// insert2 consumes a token from sequence 2 & skips one in sequence 1
EmissionState i2 = StateFactory.DEFAULT.createState(
  pairDNA, new int [] { 0, 1 }, "Insert-2"
);

Residue [] rA = new Residue[2];
List rL = Arrays.asList(rA);

// set the emission probabilities to some sensible set of values
for(Iterator i = DNA.iterator(); i.hasNext(); ) {
  rA[0] = (Residue) i.next();
  rA[1] = gap;
  i1.setWeight(pairDNA.getResidue(rL), some_suitable_score);

  for(Iterator j = DNA.iterator(); j.hasNext(); ) {
    rA[1] = (Residue) j.next();
   match.setWeight(pairDNA.getResidue(rL), some_suitable_score);
  }

  rA[1] = rA[0];
  rA[0] = gap;
  i2.setWeight(pairDNA.getResidue(rL), some_suitable_score);
}

DotState joiner = new SimpleDotState("Joiner");

// create the model
SimpleMarkovModel pairwise = new SimpleMarkovModel(
  2, pairDNA
);

// add the states
pairwise.addState(i1);
pairwise.addState(i2);
pairwise.addState(match);
pairwise.addState(joiner);

// wire joiner to/from the beginning/end state
pairwise.createTransition(pairwise.magicalState(), joiner);
pairwise.setTransitionScore(pairwise.magicalState(), joiner, 0);
pairwise.createTransition(joiner, pairwise.magicalState());
pairwise.setTransitionScore(joiner, pairwise.magicalState(), end);

// wire joiner to the match state
pairwise.createTransition(joiner, match);
pairwise.setTransitionScore(joiner, match, matchPrior);
pairwise.createTransition(match, match);
pairwise.setTransitionScore(match, match, matchExcent);
paiwrise.createTransition(match, joiner);
pairwise.setTransitionScore(match, joiner, matchEnd);

// wire joiner to i1
paiwrise.createTransition(joiner, i1);
pairwise.setTransitionScore(joiner, i1, gapPenalty);
pairwise.createTransition(i1, i1);
pairwise.setTransitionScore(i1, i1, gapExtend);
paiwrise.createTransition(i1, joiner);
pairwise.setTransitionScore(i1, joiner, gapEnd);

//wire joiner to i2
pairwise.createTransition(joiner, i2);
pairwise.setTransitionScore(joiner, i2, gapPenalty);
pairwise.createTransition(i2, i2);
pairwise.setTransitionScore(i2, i2, gapExtend);
paiwrise.createTransition(i2, joiner);
pairwise.setTransitionScore(i2, joiner, gapEnd);

// create the DP object that implements this algorithm
DP pairwiseDP = DPFactory.createDP(pairwise);

// now perform an alignment
// you can re-use the DP object many times
StatePath sp = pairwiseDP.viterbi(new ResidueList [] { seq1, seq2 });

Of course, you need to figure out what your probabilities/lods are going to be for
the emission probabilities & also the transition scores. I always end up working
with probabilities estimated from training data so I haven't yet delved into the
magic of blosum & the like.

Have fun

Matthew

Joon: You're out of your tree
Sam:  It wasn't my tree
                                                 (Benny & Joon)