[Biojava-l] unwanted gap in alignments

Andrew Walsh drandrewwalsh at gmail.com
Fri Jan 14 16:22:55 UTC 2011


Changing the gap penalties isn't making a difference because both
versions have the same number of gaps, and gaps of the same length.
Penalizing end gaps less than internal gaps might address the first
example, but not the second.

Since the gaps are the same (from the point of view of how gaps are
scored by the algorithms), what actually drives the output is the
substitution penalties.  In the PSA example, the preferred alignment
has an 'R' substituted for a 'G', whereas the unwanted output has an
'R' substituted for an 'S'.  The latter is a more common substitution,
since it is more conservative from the point of view of amino acid
chemistry and may also require fewer mutations (although that depends
on the codon usage for both 'R' and 'S').  It therefore gets a lower
penalty, so most algorithms will prefer the unwanted PSA over your
expected output.
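To make this concrete, here is a small Java sketch (Java 11+) that
scores just the two candidate endings of the PSA.  It is only an
illustration under stated assumptions: it hard-codes the five BLOSUM62
entries it needs, and the affine gap penalty (open 10, extend 1) is a
made-up example, not BioJava's default.  `TailScore` and its methods
are hypothetical helpers, not BioJava API.  The `freeEndGaps` flag
shows the alternative where a gap touching either end of the alignment
costs nothing.

```java
import java.util.Map;

public class TailScore {
    // Hypothetical affine gap penalty (illustrative values, not BioJava defaults):
    // the first position of a gap run costs GAP_OPEN, each further position GAP_EXTEND.
    static final int GAP_OPEN = 10;
    static final int GAP_EXTEND = 1;

    // Only the BLOSUM62 entries needed for the columns scored below.
    static final Map<String, Integer> BLOSUM62_SUBSET = Map.of(
            "YY", 7, "CC", 9, "AA", 4, "RS", -1, "RG", -2);

    static int sub(char a, char b) {
        Integer s = BLOSUM62_SUBSET.get("" + a + b);
        if (s == null) s = BLOSUM62_SUBSET.get("" + b + a); // the matrix is symmetric
        if (s == null) throw new IllegalArgumentException("no entry for " + a + "/" + b);
        return s;
    }

    // Scores two equal-length aligned rows. If freeEndGaps is true, a gap run
    // touching either end of the alignment costs nothing.
    static int score(String top, String bottom, boolean freeEndGaps) {
        int n = top.length(), total = 0, i = 0;
        while (i < n) {
            if (top.charAt(i) != '-' && bottom.charAt(i) != '-') {
                total += sub(top.charAt(i), bottom.charAt(i));
                i++;
                continue;
            }
            // Extend the gap run while the same row keeps showing '-'.
            String gapRow = top.charAt(i) == '-' ? top : bottom;
            int start = i;
            while (i < n && gapRow.charAt(i) == '-') i++;
            boolean terminal = start == 0 || i == n;
            if (!(freeEndGaps && terminal)) {
                total -= GAP_OPEN + GAP_EXTEND * (i - start - 1);
            }
        }
        return total;
    }

    public static void main(String[] args) {
        // Only the end of the PSA is scored: everything up to "YYCA" is
        // aligned identically in the unwanted and the expected output.
        String seq1 = "YYCA" + "GYDYGNFDYWGQGTTLTVSS";
        String unwanted = "YYCA" + "-".repeat(19) + "R"; // R aligned with the final S
        String expected = "YYCA" + "R" + "-".repeat(19); // R aligned with G
        for (boolean free : new boolean[] {false, true}) {
            System.out.printf("freeEndGaps=%b unwanted=%d expected=%d%n",
                    free, score(seq1, unwanted, free), score(seq1, expected, free));
        }
    }
}
```

It prints `freeEndGaps=false unwanted=-2 expected=-3` followed by
`freeEndGaps=true unwanted=-2 expected=25`.  When end gaps cost the
same as internal ones, the gap terms cancel and R/S (-1) beats R/G
(-2), so the unwanted ending wins; only when the terminal gap is free
does the expected ending come out ahead.  This compares just the two
candidates from the thread, not every possible alignment.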

Similar reasoning applies to the MSA example.  In the unwanted
version, 'G' is matched to 'G', which is not a substitution at all and
thus scores higher than the 'V' to 'G' substitution required for the
expected output.
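Put as arithmetic for the first two rows of the MSA only (a pairwise
simplification; the real MSA score also depends on the other rows and
on how the multiple alignment is built): in both versions the second
row has a single internal gap run with the same number of columns in
the same stretch, so any affine gap penalty cancels and only one
column differs.  Assuming BLOSUM62 scores, which the thread does not
actually specify:

```latex
S_{\text{unwanted}} - S_{\text{expected}}
  = s(\mathrm{G},\mathrm{G}) - s(\mathrm{V},\mathrm{G})
  = 6 - (-3) = 9
```

so the unwanted placement of the 'G' is favored by 9 points in that
pairwise view.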

Now, I can understand why, in the PSA example, an end gap seems more
likely than an internal gap, and why, in the MSA example, one deletion
event seems more likely than two similar but slightly different
deletion events.  But the math of the traditional alignment algorithms
just won't support those outputs.

Unfortunately, I don't have a good answer for how to make BioJava output 
your desired result.  But it is my hope that clarifying the problem 
might be a useful step in arriving at a solution.

Incidentally, does your desired output come directly from a particular 
alignment algorithm, or have they been hand-adjusted?

-Andy Walsh


On 1/14/2011 10:45 AM, Andreas Prlic wrote:
> looks a bit like an end-gap issue to me. I think the global alignment
> algorithm does not penalize end gaps. Try a local alignment (Smith-Waterman)
> instead.
>
> Andreas
>
>
>
> On Fri, Jan 14, 2011 at 2:32 AM, Khalil El Mazouari
> <khalil.elmazouari at gmail.com>  wrote:
>> Hi All,
>>
>> I am testing the PSA and MSA examples from Cookbook3.
>>
>> Sometimes gaps are introduced in "unwanted" places in the alignments, for example:
>>
>> EVQLQQSGAELVKPGASVKLSCTASGFNIKDTYMHWVKQRPEKGLEWIGRIDPASGNTKYDPKFQDKATITADTSSNTAYLQLSSLTSEDTAVYYCAGYDYGNFDYWGQGTTLTVSS
>> EVQLQQSGAELVKPGASVKLSCTASGFNIKDTYMHWVKQRPEQGLEWIGRIDPANGNTKYDPKFQGKATITADTSSNTAYLQLSSLTSEDTAVYYCA-------------------R
>>
>> The expected PSA was:
>> EVQLQQSGAELVKPGASVKLSCTASGFNIKDTYMHWVKQRPEKGLEWIGRIDPASGNTKYDPKFQDKATITADTSSNTAYLQLSSLTSEDTAVYYCAGYDYGNFDYWGQGTTLTVSS
>> EVQLQQSGAELVKPGASVKLSCTASGFNIKDTYMHWVKQRPEQGLEWIGRIDPANGNTKYDPKFQGKATITADTSSNTAYLQLSSLTSEDTAVYYCAR-------------------
>>
>>
>> The same happens with the MSA:
>> DVQLVESGGGLVKPGGSLRLSCAASGFTFSTAWMKWVRQAPGKGLEWVVWRVEQVVEKAFANSVNGRFTISRNDSKNTLYLQMISVTPZBTAVYYCARVVVSTSMDVWGQGTPVT
>> EVQLVESGGGLVQPGGSLKLSCAASGFTFS-----WVRQASGKGLEWV-----------------GRFTISRDDSKNTAYLQMNSLKTEDTAVYYCTR-----------------
>> EVQLVESGGGLVQPGGSLRLSCAASGFTFS-----WVRQAPGKGLEWV-----------------GRFTISRDDSKNSLYLQMNSLKTEDTAVYYCAR-----------------
>> QVQLVESGGGVVQPGRSLRLSCAASGFTFS-----WVRQAPGKGLEWVA-----------------RFTISRDNSKNTLYLQMNSLRAEDTAVYYCAR-----------------
>> QVQLVESGGGVVQPGRSLRLSCAASGFTFS-----WVRQAPGKGLEWVA-----------------RFTISRDNSKNTLYLQMNSLRAEDTAVYYCAR-----------------
>>
>> Expected MSA:
>> DVQLVESGGGLVKPGGSLRLSCAASGFTFSTAWMKWVRQAPGKGLEWVVWRVEQVVEKAFANSVNGRFTISRNDSKNTLYLQMISVTPZBTAVYYCARVVVSTSMDVWGQGTPVT
>> EVQLVESGGGLVQPGGSLKLSCAASGFTFS-----WVRQASGKGLEWVG-----------------RFTISRDDSKNTAYLQMNSLKTEDTAVYYCTR-----------------
>> EVQLVESGGGLVQPGGSLRLSCAASGFTFS-----WVRQAPGKGLEWVG-----------------RFTISRDDSKNSLYLQMNSLKTEDTAVYYCAR-----------------
>> QVQLVESGGGVVQPGRSLRLSCAASGFTFS-----WVRQAPGKGLEWVA-----------------RFTISRDNSKNTLYLQMNSLRAEDTAVYYCAR-----------------
>> QVQLVESGGGVVQPGRSLRLSCAASGFTFS-----WVRQAPGKGLEWVA-----------------RFTISRDNSKNTLYLQMNSLRAEDTAVYYCAR-----------------
>>
>>
>> I have tried different gop/gep (gap open/extension penalty) values and both LOCAL and GLOBAL PSA, with no success.
>>
>> How can I force or prevent gap creation at specific positions?
>>
>> Many thanks.
>>
>> Khalil
>> _______________________________________________
>> Biojava-l mailing list  -  Biojava-l at lists.open-bio.org
>> http://lists.open-bio.org/mailman/listinfo/biojava-l
>>
