[Biojava-l] Setting anchors in AnchoredPairwiseSequenceAligner

Daniel Cameron cameron.d at wehi.EDU.AU
Wed Nov 27 05:28:17 UTC 2013


I was using the (passing) NW test as a baseline to compare the (failing) 
anchored test to so as to make sure I was using the BioJava API 
correctly. In my particular use case, I want to perform global alignment 
with only the first base anchored but unfortunately, NW does not expose 
get/set to anchor (and the meaning of anchoring a base for SW/local 
alignment is a bit ambiguous).

A bug has been raised with a test case for ease of reproduction.

Cheers
Daniel

On 27/11/2013 3:34 PM, Andreas Prlic wrote:
> needleman -wunsch is doing a global alignment, try smith waterman 
> instead.
>
> can you file a bug report on github? 
> https://github.com/biojava/biojava/issues
>
> Thanks,
>
> Andreas
>
>
> On Tue, Nov 26, 2013 at 7:54 PM, Daniel Cameron <cameron.d at wehi.edu.au 
> <mailto:cameron.d at wehi.edu.au>> wrote:
>
>     That API does make sense but unfortunately, it crashes when I try
>     it. Having traced through the source & test cases from github, it
>     appears that there isn't actually any test coverage for the
>     (anchor != null) path in align().
>
>     Additionally, align() also explicitly forces the final base of the
>     query to align to the final base of the target which in my case is
>     not supposed to be an anchor. I've copied my test cases with
>     expected behaviour. Is there a bug track system I should be
>     raising this with?
>
>     Cheers
>     Daniel
>
>         @Test
>         public void testNeedlemanWunschDNAAlignment() {
>             DNASequence query = new DNASequence("ACGTAACCGGTT",
>     AmbiguityDNACompoundSet.getDNACompoundSet());
>             DNASequence target = new
>     DNASequence("AACGTAACCGGTTACGTACGT",
>     AmbiguityDNACompoundSet.getDNACompoundSet());
>             // 123456789012345678900
>             // Expected: |----------|
>             NeedlemanWunsch<DNASequence, NucleotideCompound> nw = new
>     NeedlemanWunsch<DNASequence, NucleotideCompound>(query, target,
>     new SimpleGapPenalty((short)5, (short)2),
>     SubstitutionMatrixHelper.getNuc4_4());
>             AlignedSequence<DNASequence, NucleotideCompound> aligned =
>     nw.getPair().getQuery();
>             assertEquals(2, (int)aligned.getStart().getPosition());
>             assertEquals(13, (int)aligned.getEnd().getPosition());
>         }
>         @Test
>         public void testAnchoredDNAAlignment() {
>             DNASequence query = new DNASequence("ACGTAACCGGTT",
>     AmbiguityDNACompoundSet.getDNACompoundSet());
>             DNASequence target = new
>     DNASequence("AACGTAACCGGTTACGTACGT",
>     AmbiguityDNACompoundSet.getDNACompoundSet());
>             // 123456789012345678900
>             // Expected: |_----------|
>     AnchoredPairwiseSequenceAligner<DNASequence, NucleotideCompound>
>     aligner = new AnchoredPairwiseSequenceAligner<DNASequence,
>     NucleotideCompound>(query, target, new SimpleGapPenalty((short)5,
>     (short)2), SubstitutionMatrixHelper.getNuc4_4());
>             int[] anchors = new int[query.getLength()];
>             for (int i = 0; i < anchors.length; i++) anchors[i] = -1;
>             anchors[0] = 0;
>             AlignedSequence<DNASequence, NucleotideCompound> aligned =
>     aligner.getPair().getQuery();
>             assertEquals(1, (int)aligned.getStart().getPosition());
>             assertEquals(13, (int)aligned.getEnd().getPosition());
>
>         }
>
>     On 27/11/2013 12:00 PM, Andreas Prlic wrote:
>>     Hi Daniel,
>>
>>     Clearly we need some documentation for this.
>>
>>     Looking at the source: if you get to
>>     AbstractMatrixAligner.align() you can see how the anchors are
>>     being used.
>>
>>     I just took a brief look, so I might be wrong, but I think the
>>     int[] array should have the length of the query sequence and each
>>     position indicates the counterpart in the target sequence.
>>     Positions that are <=0 should be considered not to be anchored.
>>     If this is right, then this should be very close to what you were
>>     expecting.
>>
>>     Can you take a closer look and confirm if this works for you?
>>
>>     Thanks,
>>
>>     Andreas
>>
>>     On Sun, Nov 24, 2013 at 11:22 PM, Daniel Cameron
>>     <cameron.d at wehi.edu.au <mailto:cameron.d at wehi.edu.au>> wrote:
>>
>>         Hello all
>>
>>         I'm looking the BioJava API for anchored alignment and I'm
>>         unsure of how to set alignment anchors. The API exposes int[]
>>         get/setAnchor() which is confusing me somewhat and I'm unsure
>>         of what a BioJava anchor actually is. My use case is the
>>         comparison of two sequences which I have a priori knowledge
>>         of particular positions so I was expecting an API where I
>>         could specify base positions in the query to correspond
>>         different positions in the target. Eg: I was query base 4 to
>>         align to target base 10 and query base 100 to align to target
>>         base 80. Is this possible?
>>
>>         With anchor being only an int[] and the source looking like
>>         this is not an array of paired values, how would one go about
>>         performing the above alignment?
>>
>>
>>         Thanks
>>         Daniel Cameron
>>
>>         ______________________________________________________________________
>>         The information in this email is confidential and intended
>>         solely for the addressee.
>>         You must not disclose, forward, print or use it without the
>>         permission of the sender.
>>         ______________________________________________________________________
>>         _______________________________________________
>>         Biojava-l mailing list  - Biojava-l at lists.open-bio.org
>>         <mailto:Biojava-l at lists.open-bio.org>
>>         http://lists.open-bio.org/mailman/listinfo/biojava-l
>>
>
>     ______________________________________________________________________
>     The information in this email is confidential and intended solely
>     for the addressee.
>     You must not disclose, forward, print or use it without the
>     permission of the sender.
>     ______________________________________________________________________
>
>
>
>



______________________________________________________________________
The information in this email is confidential and intended solely for the addressee.
You must not disclose, forward, print or use it without the permission of the sender.
______________________________________________________________________



More information about the Biojava-l mailing list