[Biojava-l] comparison of the pairwise aligner to emboss' needle
Andreas Prlic
andreas at sdsc.edu
Mon Apr 18 18:34:57 UTC 2011
Hi Wim,
thanks for tracking this down. I agree, something does not look right
here. I'll try to see what is going on...
Andreas
On Mon, Apr 18, 2011 at 8:22 AM, Wim De Smet <Wim.DeSmet at ugent.be> wrote:
> Hi all,
>
> I've been trying to generate some global alignments with biojava and
> comparing them with what needle returns. Doing this, I can't seem to
> reproduce needle's alignment with biojava. The score returned from biojava
> seems to be worse than that from needle, so I'm not sure what's happening
> here.
>
> The sequences are AB004720 and Y17238 (I didn't attach a fasta file to avoid
> spamming people, let me know if you want one). I align them with:
> GapPenalty penalty = new SimpleGapPenalty((short)-14, (short)-4);
> PairwiseSequenceAligner<DNASequence, NucleotideCompound> aligner =
>     Alignments.getPairwiseAligner(
>         new DNASequence(query, AmbiguityDNACompoundSet.getDNACompoundSet()),
>         new DNASequence(target, AmbiguityDNACompoundSet.getDNACompoundSet()),
>         PairwiseSequenceAlignerType.GLOBAL,
>         penalty, SubstitutionMatrixHelper.getNuc4_4());
> SequencePair<DNASequence, NucleotideCompound> alignment = aligner.getPair();
>
> This gives me an alignment with only 23% similarity and a gap at the end.
> Varying the gap penalties can give me a gap in front too, but that's about
> it. When aligning in needle, I get an alignment with a higher score (6784 vs
> (-)5862) and 94% similarity (which seems closer to home). I just run needle
> with its defaults (so it uses EDNAFULL) and a gap open/gap extend of 14/4.
>
> Could this be a bug or am I misunderstanding some of the options?
>
> BTW, if I use a really large gap extension penalty, say -4000, I also get a
> NullPointerException.
>
> TIA,
> Wim De Smet
>
> --
> Wim De Smet
> http://www.straininfo.net/
> _______________________________________________
> Biojava-l mailing list - Biojava-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/biojava-l
>
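
For anyone who wants to cross-check either tool on short test sequences, here
is a minimal sketch of a global alignment score with affine gap penalties
(Gotoh's three-matrix recurrence). It is not BioJava or EMBOSS code. The class
name AffineGlobalScore, the flat 5/-4 match/mismatch scoring (a stand-in for
the unambiguous part of NUC.4.4) and the extendOnOpen flag are assumptions made
for illustration. The flag exists because tools differ on whether the first
gapped position is charged the extension penalty on top of the open penalty,
and that alone can shift scores for the same 14/4 setting.

```java
// Minimal sketch: optimal global alignment score with affine gap penalties.
// Assumption: hypothetical helper, not part of BioJava or EMBOSS.
class AffineGlobalScore {
    // Assumption: flat scores for unambiguous bases, standing in for NUC.4.4.
    static final int MATCH = 5;
    static final int MISMATCH = -4;
    // "Minus infinity" that survives a few subtractions without overflowing.
    static final int NEG_INF = Integer.MIN_VALUE / 2;

    /**
     * open and extend are given as positive penalties.
     * extendOnOpen = false: a gap of length k costs open + (k - 1) * extend.
     * extendOnOpen = true:  a gap of length k costs open + k * extend.
     */
    static int score(String q, String t, int open, int extend, boolean extendOnOpen) {
        int first = extendOnOpen ? open + extend : open; // cost of the first gapped position
        int n = q.length(), m = t.length();
        int[][] M = new int[n + 1][m + 1]; // best score ending in a match/mismatch column
        int[][] X = new int[n + 1][m + 1]; // best score ending with q[i] against a gap
        int[][] Y = new int[n + 1][m + 1]; // best score ending with t[j] against a gap
        for (int i = 0; i <= n; i++) {
            java.util.Arrays.fill(M[i], NEG_INF);
            java.util.Arrays.fill(X[i], NEG_INF);
            java.util.Arrays.fill(Y[i], NEG_INF);
        }
        M[0][0] = 0;
        // Global alignment: leading gaps are penalised like any other gap.
        for (int i = 1; i <= n; i++) X[i][0] = -(first + (i - 1) * extend);
        for (int j = 1; j <= m; j++) Y[0][j] = -(first + (j - 1) * extend);

        for (int i = 1; i <= n; i++) {
            for (int j = 1; j <= m; j++) {
                int s = q.charAt(i - 1) == t.charAt(j - 1) ? MATCH : MISMATCH;
                M[i][j] = Math.max(M[i - 1][j - 1],
                          Math.max(X[i - 1][j - 1], Y[i - 1][j - 1])) + s;
                // Either open a new gap or extend the one already running.
                X[i][j] = Math.max(Math.max(M[i - 1][j], Y[i - 1][j]) - first,
                                   X[i - 1][j] - extend);
                Y[i][j] = Math.max(Math.max(M[i][j - 1], X[i][j - 1]) - first,
                                   Y[i][j - 1] - extend);
            }
        }
        return Math.max(M[n][m], Math.max(X[n][m], Y[n][m]));
    }

    public static void main(String[] args) {
        // Same sequences and penalties, two gap-cost conventions.
        System.out.println(score("AACCGGTT", "AACCTT", 14, 4, false));
        System.out.println(score("AACCGGTT", "AACCTT", 14, 4, true));
    }
}
```

Running both conventions on a pair where the expected alignment is obvious by
eye, and comparing against what each tool reports for the same pair, is one way
to see whether a score difference comes from the gap-cost convention or from
the aligner itself.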
--
-----------------------------------------------------------------------
Dr. Andreas Prlic
Senior Scientist, RCSB PDB Protein Data Bank
University of California, San Diego
(+1) 858.246.0526
-----------------------------------------------------------------------