[Biojava-l] Setting anchors in AnchoredPairwiseSequenceAligner

Daniel Cameron cameron.d at wehi.EDU.AU
Wed Nov 27 03:54:56 UTC 2013


That API does make sense but unfortunately, it crashes when I try it. 
Having traced through the source & test cases from github, it appears 
that there isn't actually any test coverage for the (anchor != null) 
path in align().

Additionally, align() also explicitly forces the final base of the query 
to align to the final base of the target which in my case is not 
supposed to be an anchor. I've copied my test cases with expected 
behaviour. Is there a bug track system I should be raising this with?

Cheers
Daniel

     @Test
     public void testNeedlemanWunschDNAAlignment() {
         DNASequence query = new DNASequence("ACGTAACCGGTT", 
AmbiguityDNACompoundSet.getDNACompoundSet());
         DNASequence target = new DNASequence("AACGTAACCGGTTACGTACGT", 
AmbiguityDNACompoundSet.getDNACompoundSet());
         // 123456789012345678900
         // Expected:                           |----------|
         NeedlemanWunsch<DNASequence, NucleotideCompound> nw = new 
NeedlemanWunsch<DNASequence, NucleotideCompound>(query, target, new 
SimpleGapPenalty((short)5, (short)2), SubstitutionMatrixHelper.getNuc4_4());
         AlignedSequence<DNASequence, NucleotideCompound> aligned = 
nw.getPair().getQuery();
         assertEquals(2, (int)aligned.getStart().getPosition());
         assertEquals(13, (int)aligned.getEnd().getPosition());
     }
     @Test
     public void testAnchoredDNAAlignment() {
         DNASequence query = new DNASequence("ACGTAACCGGTT", 
AmbiguityDNACompoundSet.getDNACompoundSet());
         DNASequence target = new DNASequence("AACGTAACCGGTTACGTACGT", 
AmbiguityDNACompoundSet.getDNACompoundSet());
         // 123456789012345678900
         // Expected:                          |_----------|
         AnchoredPairwiseSequenceAligner<DNASequence, 
NucleotideCompound> aligner = new 
AnchoredPairwiseSequenceAligner<DNASequence, NucleotideCompound>(query, 
target, new SimpleGapPenalty((short)5, (short)2), 
SubstitutionMatrixHelper.getNuc4_4());
         int[] anchors = new int[query.getLength()];
         for (int i = 0; i < anchors.length; i++) anchors[i] = -1;
         anchors[0] = 0;
         AlignedSequence<DNASequence, NucleotideCompound> aligned = 
aligner.getPair().getQuery();
         assertEquals(1, (int)aligned.getStart().getPosition());
         assertEquals(13, (int)aligned.getEnd().getPosition());
     }

On 27/11/2013 12:00 PM, Andreas Prlic wrote:
> Hi Daniel,
>
> Clearly we need some documentation for this.
>
> Looking at the source: if you get to AbstractMatrixAligner.align() you 
> can see how the anchors are being used.
>
> I just took a brief look, so I might be wrong, but I think the int[] 
> array should have the length of the query sequence and each position 
> indicates the counterpart in the target sequence. Positions that are 
> <=0 should be considered not to be anchored. If this is right, then 
> this should be very close to what you were expecting.
>
> Can you take a closer look and confirm if this works for you?
>
> Thanks,
>
> Andreas
>
> On Sun, Nov 24, 2013 at 11:22 PM, Daniel Cameron 
> <cameron.d at wehi.edu.au <mailto:cameron.d at wehi.edu.au>> wrote:
>
>     Hello all
>
>     I'm looking the BioJava API for anchored alignment and I'm unsure
>     of how to set alignment anchors. The API exposes int[]
>     get/setAnchor() which is confusing me somewhat and I'm unsure of
>     what a BioJava anchor actually is. My use case is the comparison
>     of two sequences which I have a priori knowledge of particular
>     positions so I was expecting an API where I could specify base
>     positions in the query to correspond different positions in the
>     target. Eg: I was query base 4 to align to target base 10 and
>     query base 100 to align to target base 80. Is this possible?
>
>     With anchor being only an int[] and the source looking like this
>     is not an array of paired values, how would one go about
>     performing the above alignment?
>
>
>     Thanks
>     Daniel Cameron
>
>     ______________________________________________________________________
>     The information in this email is confidential and intended solely
>     for the addressee.
>     You must not disclose, forward, print or use it without the
>     permission of the sender.
>     ______________________________________________________________________
>     _______________________________________________
>     Biojava-l mailing list  - Biojava-l at lists.open-bio.org
>     <mailto:Biojava-l at lists.open-bio.org>
>     http://lists.open-bio.org/mailman/listinfo/biojava-l
>


______________________________________________________________________
The information in this email is confidential and intended solely for the addressee.
You must not disclose, forward, print or use it without the permission of the sender.
______________________________________________________________________



More information about the Biojava-l mailing list