[Biojava-l] IndexOutOfBounds Exception when performing Pairwise Alignment

Hannes Brandstätter-Müller biojava at hannes.oib.com
Tue Dec 6 08:20:46 UTC 2011


On Tue, Dec 6, 2011 at 02:57, Andreas Prlic <andreas at sdsc.edu> wrote:
>> Now, I'm getting null return value - must be still something wrong in
>> the parameters...
>>
>> Where should I start looking for that?
>
> try different gap penalties, I think the default ones are for protein
> alignments and one of the blosum matrices...
> If that does not help, can you send some of the sequences that are
> causing problems? There should be more informative error messages.

There are no other predefined gap penalties, and a custom
SimpleGapPenalty with gop=1, gep=1 does not change the null result either.
Here is a unit test case that fails for me:

    public void testPSA() {
        String targetSeq = "CACGTTTCTTGTGGCAGCTTAAGTTTGAATGTCATTTCTTCAATGGGACGGA"
                + "GCGGGTGCGGTTGCTGGAAAGATGCATCTATAACCAAGAGGAGTCCGTGCGCTTCGACAGC"
                + "GACGTGGGGGAGTACCGGGCGGTGACGGAGCTGGGGCGGCCTGATGCCGAGTACTGGAACA"
                + "GCCAGAAGGACCTCCTGGAGCAGAGGCGGGCCGCGGTGGACACCTACTGCAGACACAACTA"
                + "CGGGGTTGGTGAGAGCTTCACAGTGCAGCGGCGAG";
        // target is built with the ambiguity-aware DNA compound set
        DNASequence target = new DNASequence(targetSeq,
                AmbiguityDNACompoundSet.getDNACompoundSet());

        String querySeq = "ACGAGTGCGTGTTTTCCCGCCTGGTCCCCAGGCCCCCTTTCCGTCCTCAGGAA"
                + "GACAGAGGAGGAGCCCCTCGGGCTGCAGGTGGTGGGCGTTGCGGCGGCGGCCGGTTAAGGT"
                + "TCCCAGTGCCCGCACCCGGCCCACGGGAGCCCCGGACTGGCGGCGTCACTGTCAGTGTCTT"
                + "CTCAGGAGGCCGCCTGTGTGACTGGATCGTTCGTGTCCCCACAGCACGTTTCTTGGAGTAC"
                + "TCTACGTCTGAGTGTCATTTCTTCAATGGGACGGAGCGGGTGCGGTTCCTGGACAGATACT"
                + "TCCATAACCAGGAGGAGAACGTGCGCTTCGACAGCGACGTGGGGGAGTTCCGGGCGGTGAC"
                + "GGAGCTGGGGCGGCCTGATGCCGAGTACTGGAACAGCCAGAAGGACATCCTGGAAGACGAG"
                + "CGGGCCGCGGTGGACACCTACTGCAGACACAACTACGGGGTTGTGAGAGCTTCACCGTGCA"
                + "GCGGCGAGACGCACTCGT";
        // note: unlike target, query is built without an explicit compound set
        DNASequence query = new DNASequence(querySeq);

        // NUC.4.4 nucleotide substitution matrix
        SubstitutionMatrix<NucleotideCompound> matrix = SubstitutionMatrixHelper.getNuc4_4();

        // local alignment with the default SimpleGapPenalty
        SequencePair<DNASequence, NucleotideCompound> psa = Alignments.getPairwiseAlignment(
                query, target, PairwiseSequenceAlignerType.LOCAL, new SimpleGapPenalty(), matrix);

        // fails for me: psa is null
        assertNotNull(psa);
    }
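As a sanity check outside BioJava: the two sequences share long exact stretches (for example GTCATTTCTTCAATGGGACGGA), so any local aligner should give them a clearly positive score, which suggests the null is unlikely to come from the data itself. Below is a minimal plain-Java sketch of a Smith-Waterman local score with affine gaps (Gotoh recurrences), not BioJava code. The +5/-4 match/mismatch scores are an assumption (chosen to resemble the ACGT part of NUC.4.4), and so is the gap convention (gop for the first gapped position, gep for each further one), which may not match how SimpleGapPenalty is applied.

```java
import java.util.Arrays;

// Dependency-free cross-check; NOT BioJava code.
final class LocalScoreCheck {
    // Assumed scores: +5 for identical bases, -4 otherwise (not read from BioJava).
    static final int MATCH = 5;
    static final int MISMATCH = -4;
    // "Minus infinity" that can still have a gap penalty subtracted without overflow.
    static final int NEG_INF = Integer.MIN_VALUE / 4;

    /**
     * Best local alignment score of a vs b with affine gaps (Gotoh recurrences).
     * Assumed gap convention: the first gapped position costs gop, each further
     * position in the same gap costs gep.
     */
    static int localScore(String a, String b, int gop, int gep) {
        int m = b.length();
        int[] hPrev = new int[m + 1]; // H, previous row (row 0 is all zeros)
        int[] hCur = new int[m + 1];  // H, current row
        int[] fPrev = new int[m + 1]; // F (gap in b, vertical moves), previous row
        int[] fCur = new int[m + 1];  // F, current row
        Arrays.fill(fPrev, NEG_INF);
        int best = 0;
        for (int i = 1; i <= a.length(); i++) {
            hCur[0] = 0;
            fCur[0] = NEG_INF;
            int e = NEG_INF; // E (gap in a, horizontal moves) along this row
            for (int j = 1; j <= m; j++) {
                e = Math.max(hCur[j - 1] - gop, e - gep);
                fCur[j] = Math.max(hPrev[j] - gop, fPrev[j] - gep);
                int s = a.charAt(i - 1) == b.charAt(j - 1) ? MATCH : MISMATCH;
                int diag = hPrev[j - 1] + s;
                // Local alignment: a cell never drops below 0 (alignment restarts).
                hCur[j] = Math.max(0, Math.max(diag, Math.max(e, fCur[j])));
                best = Math.max(best, hCur[j]);
            }
            // Reuse arrays: the current row becomes the previous row.
            int[] t = hPrev; hPrev = hCur; hCur = t;
            t = fPrev; fPrev = fCur; fCur = t;
        }
        return best;
    }

    public static void main(String[] args) {
        // Fragments of target and query from the test above.
        String target = "CACGTTTCTTGTGGCAGCTTAAGTTTGAATGTCATTTCTTCAATGGGACGGA";
        String query = "TCTACGTCTGAGTGTCATTTCTTCAATGGGACGGAGCGGGTGCGGTT";
        // gop = 1, gep = 1 as in the custom penalty mentioned above.
        System.out.println("local score: " + localScore(query, target, 1, 1));
    }
}
```

Only the best score is kept, with two rows of each matrix, so memory stays linear in the reference length; that is enough for scoring, though not for reconstructing the alignment.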

>> Is there a simple way to align (or score, don't need the full
>> alignment) a single DNA sequence against a List of sequences?
>
> You could do a multiple sequence alignment.
> http://www.biojava.org/wiki/BioJava:CookBook3:MSA

Yeah, but that also computes lots of unnecessary PSAs. I just need
the following:

I get some sequences (from a sequencing machine), and for each of
these sequences I want to look up which of the reference sequences in
my (small) 'library' is the most likely match. So I don't want PSAs
between the reference sequences, just my query against each reference
sequence. Something like that should be in the BioJava library itself;
the only thing I found was a way to calculate PSAs for every pair of
sequences in a list (much like you need for an MSA). If BioJava could
offer a one-against-many search using the ConcurrencyTools stuff, that
would be cool. I really need to figure out the inner structure of the
BioJava classes and start implementing that stuff myself, but the
factory method stuff is kinda confusing to get the hang of.
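The one-query-against-a-small-library search can be sketched in plain Java, without BioJava's ConcurrencyTools or aligners: submit one scoring task per reference to a java.util.concurrent ExecutorService and keep the highest score. The scorer is passed in as a ToIntBiFunction so a real aligner could be plugged in; the linear-gap Smith-Waterman swScore below, with its +5 match, -4 mismatch and -4 per gap position values, is an illustrative assumption, as are the class and method names.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.ToIntBiFunction;

// Hypothetical helper, not part of BioJava.
final class BestReference {

    /** Linear-gap Smith-Waterman score; +5/-4 and -4 per gap position are assumed values. */
    static int swScore(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] cur = new int[b.length() + 1];
        int best = 0;
        for (int i = 1; i <= a.length(); i++) {
            cur[0] = 0;
            for (int j = 1; j <= b.length(); j++) {
                int diag = prev[j - 1] + (a.charAt(i - 1) == b.charAt(j - 1) ? 5 : -4);
                int up = prev[j] - 4;     // gap in b
                int left = cur[j - 1] - 4; // gap in a
                cur[j] = Math.max(0, Math.max(diag, Math.max(up, left)));
                best = Math.max(best, cur[j]);
            }
            int[] t = prev; prev = cur; cur = t;
        }
        return best;
    }

    /**
     * Scores query against every reference in parallel and returns the index of
     * the best-scoring reference (first one on ties), or -1 if refs is empty.
     */
    static int bestMatch(String query, List<String> refs, ToIntBiFunction<String, String> scorer) {
        ExecutorService pool = Executors.newFixedThreadPool(Runtime.getRuntime().availableProcessors());
        try {
            List<Future<Integer>> futures = new ArrayList<>();
            for (String ref : refs) {
                Callable<Integer> task = () -> scorer.applyAsInt(query, ref);
                futures.add(pool.submit(task));
            }
            int bestIdx = -1;
            int bestScore = Integer.MIN_VALUE;
            for (int k = 0; k < futures.size(); k++) {
                int s;
                try {
                    s = futures.get(k).get();
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    throw new IllegalStateException("interrupted while scoring", e);
                } catch (ExecutionException e) {
                    throw new IllegalStateException("scoring failed", e.getCause());
                }
                if (s > bestScore) {
                    bestScore = s;
                    bestIdx = k;
                }
            }
            return bestIdx;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        List<String> library = List.of("TTTTTTTT", "ACGTACGT", "ACGAACGT");
        int idx = bestMatch("ACGTACGT", library, BestReference::swScore);
        System.out.println("best reference: " + idx); // prints "best reference: 1"
    }
}
```

Only query-vs-reference pairs are scored, so the work grows linearly with the library size instead of quadratically as with all-pairs alignment.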

As soon as I figure this out, I'm going to improve the hell out of the
cookbook examples. Those are next to useless for my scenario.

Hannes


