<div dir="ltr">To make sure this doesn't get forgotten, I created an issue for it on GitHub:<br><br><a href="https://github.com/biojava/biojava/issues/288">https://github.com/biojava/biojava/issues/288</a><br></div><div class="gmail_extra"><br><div class="gmail_quote">On Wed, Jun 17, 2015 at 10:08 AM, Jose Manuel Duarte <span dir="ltr">&lt;<a href="mailto:jose.duarte@psi.ch" target="_blank">jose.duarte@psi.ch</a>&gt;</span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
<div bgcolor="#FFFFFF" text="#000000">
Hi Stefan<br>
<br>
Just a couple of comments, but not much direct help. <br>
<br>
    From the source code I can see that the multiple alignment proceeds
    in 4 steps: 1) pairwise alignments for all pairs, 2) hierarchical
    clustering into a guide tree, 3) progressive alignment and 4)
    refinement. However, the refinement step doesn't seem to be
    implemented yet (there's a TODO in the code), which might explain the
    poorer result.<br>
<br>
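    To make steps 1 and 2 concrete, here is a minimal sketch in plain Java.
    It is not the BioJava implementation: the class and method names are
    hypothetical, the gap penalty is linear rather than affine, and only
    the first merge of the guide tree (the highest-scoring pair) is
    computed.<br>
    <br>

```java
// Hypothetical sketch, not BioJava code: all-pairs scoring plus the first guide-tree merge.
class PairwiseSketch {

    // Needleman-Wunsch global alignment score with a linear gap penalty.
    // Assumption: 'gap' is a positive cost charged per gap position.
    static int alignScore(String a, String b, int match, int mismatch, int gap) {
        int[] prev = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = -gap * j;              // prefix of b aligned against gaps only
        }
        for (int i = 1; i <= a.length(); i++) {
            int[] cur = new int[b.length() + 1];
            cur[0] = -gap * i;               // prefix of a aligned against gaps only
            for (int j = 1; j <= b.length(); j++) {
                int sub = a.charAt(i - 1) == b.charAt(j - 1) ? match : mismatch;
                int diagonal = prev[j - 1] + sub;   // align a[i-1] with b[j-1]
                int gapInB = prev[j] - gap;         // a[i-1] against a gap
                int gapInA = cur[j - 1] - gap;      // b[j-1] against a gap
                cur[j] = Math.max(diagonal, Math.max(gapInB, gapInA));
            }
            prev = cur;
        }
        return prev[b.length()];
    }

    // Step 1: score every pair. Step 2 (first merge only): return the indices
    // of the highest-scoring pair. Requires at least two sequences.
    static int[] closestPair(String[] seqs, int match, int mismatch, int gap) {
        int bestI = 0, bestJ = 1, best = Integer.MIN_VALUE;
        for (int i = 0; i < seqs.length; i++) {
            for (int j = i + 1; j < seqs.length; j++) {
                int score = alignScore(seqs[i], seqs[j], match, mismatch, gap);
                if (score > best) {
                    best = score;
                    bestI = i;
                    bestJ = j;
                }
            }
        }
        return new int[] {bestI, bestJ};
    }

    public static void main(String[] args) {
        // Stefan's three sequences with the gaps removed.
        String[] seqs = {
            "TTGGGGCCTCTAAACGGGGTCTT",
            "TTGGGGCTCTAACGGGTCTT",
            "TTGGGGCCTCTAAACGGGTCTT"
        };
        int[] pair = closestPair(seqs, 1, -1, 2);
        System.out.println("merge first: " + pair[0] + " and " + pair[1]);
    }
}
```

    <br>
    With these illustrative scores (match 1, mismatch -1, gap 2), the first
    and third sequences are merged first and the middle one is added
    afterwards. Without a refinement step, any gap placed at that point is
    never revisited.<br>
    <br>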
Another thing to take into account is that there are a couple of
known bugs in pairwise alignments at the moment:<br>
<br>
<a href="https://github.com/biojava/biojava/issues/274" target="_blank">https://github.com/biojava/biojava/issues/274</a><br>
<br>
<a href="https://github.com/biojava/biojava/issues/213" target="_blank">https://github.com/biojava/biojava/issues/213</a><br>
<br>
    Of those, #213 may be related to the problem you are
seeing, but it's hard to tell.<span class="HOEnZb"><font color="#888888"><br>
<br>
Jose</font></span><div><div class="h5"><br>
<br>
<br>
<div>On 17.06.2015 03:07, stefan harjes
wrote:<br>
</div>
</div></div><blockquote type="cite"><div><div class="h5">
<div style="color:#000;background-color:#fff;font-family:HelveticaNeue,Helvetica Neue,Helvetica,Arial,Lucida Grande,sans-serif;font-size:16px">
<div>Hi biojava,</div>
<div><br>
</div>
          <div dir="ltr">I am
            struggling with the multiple alignment of several DNASequences.
            When I use the BioJava computation, the gaps in the resulting
            alignment are misplaced. ClustalX computes a much better result in
            comparison:</div>
<div dir="ltr"><br>
</div>
<div dir="ltr">biojava<br>
</div>
<div dir="ltr">TTGGGGCCTCTAAACGGGGTCTT<br>
TTGGGGC-TCTAAC--GGGTCTT<br>
TTGGGGCCTCTAAACGGG-TCTT<br>
<br>
clustal<br>
TTGGGGCCTCTAAACGGGGTCTT<br>
TTGGGG-CTCT-AACGGG-TCTT<br>
TTGGGGCCTCTAAACGGG-TCTT<br>
****** **** ****<br>
</div>
          <div>The most important
            difference is the second gap in the middle sequence, which is
            clearly better placed by Clustal. Any hints on how to
            improve the BioJava parameters or algorithms? <br>
</div>
<div><br>
</div>
<div>Cheers</div>
<div>Stefan</div>
<div>p.s.<br>
</div>
          <div dir="ltr">I already
            tried to implement the actual gap penalties that Clustal uses,
            which are 10/0.1 for the pairwise and 10/0.2 for the multiple
            alignment. To do this, I changed all Java short types to int, scaled
            all scoring parameters (including the matrix) by 10, and
            used a separate gap penalty for each of the two alignment stages.
            Unfortunately, this does not change anything. <br>
</div>
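          <div dir="ltr">The scaling described above can be sketched as
            follows. This is an illustration only: the class and method
            names are hypothetical, and it assumes that 10/0.1 means gap
            open/extension and that the extension is charged for every
            gap position, which not every implementation does.</div>

```java
// Hypothetical sketch: integer gap penalties obtained by scaling real-valued ones.
class GapPenaltySketch {

    // Scale factor taken from the message above.
    static final int SCALE = 10;

    // Convert a real-valued penalty into the scaled integer domain.
    // Without scaling, an extension of 0.1 stored in an integer type would become 0.
    static int scaled(double penalty) {
        return (int) Math.round(penalty * SCALE);
    }

    // Affine cost of a gap of the given length.
    // Assumption: open is charged once, extend for every gap position.
    static int gapCost(int open, int extend, int length) {
        return length == 0 ? 0 : open + extend * length;
    }

    public static void main(String[] args) {
        int open = scaled(10.0);
        int pairwiseExtend = scaled(0.1);   // pairwise stage
        int multipleExtend = scaled(0.2);   // multiple-alignment stage
        System.out.println("pairwise gap of 3: " + gapCost(open, pairwiseExtend, 3));
        System.out.println("multiple gap of 3: " + gapCost(open, multipleExtend, 3));
    }
}
```

          <div dir="ltr">With an open penalty this much larger than the
            extension, one long gap is far cheaper than several short
            ones, so the substitution scores need to be on the same
            scale for the trade-off to resemble Clustal's.</div>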
          <div dir="ltr">Does anyone
            have a copy of the IUB scoring matrix? Trying it would
            be my next step.</div>
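          <div dir="ltr">For reference, the ClustalW help text describes
            the IUB DNA matrix as scoring every match 1.9 and every
            mismatch 0, with X and N matching any ambiguity symbol. A
            minimal sketch for the four unambiguous bases, scaled by 10
            as above, might look like this. The class name is
            hypothetical, ambiguity codes are left out, and the values
            should be checked against the Clustal documentation before
            relying on them.</div>

```java
// Hypothetical sketch: an IUB-like substitution matrix for A, C, G, T only.
class IubMatrixSketch {

    static final String ALPHABET = "ACGT";

    // Build a matrix with 'match' on the diagonal and 'mismatch' elsewhere.
    static int[][] build(int match, int mismatch) {
        int n = ALPHABET.length();
        int[][] m = new int[n][n];
        for (int i = 0; i < n; i++) {
            for (int j = 0; j < n; j++) {
                m[i][j] = (i == j) ? match : mismatch;
            }
        }
        return m;
    }

    // Look up the score of two bases; both must be one of A, C, G, T.
    static int score(int[][] m, char a, char b) {
        return m[ALPHABET.indexOf(a)][ALPHABET.indexOf(b)];
    }

    public static void main(String[] args) {
        // Assumed values: 1.9 and 0, multiplied by 10 to stay in integers.
        int[][] iub = build(19, 0);
        System.out.println("A/A: " + score(iub, 'A', 'A'));
        System.out.println("A/C: " + score(iub, 'A', 'C'));
    }
}
```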
<div dir="ltr"><br>
</div>
<div dir="ltr"> <br>
</div>
</div>
</div></div><span class=""><pre>_______________________________________________
biojava-dev mailing list
<a href="mailto:biojava-dev@mailman.open-bio.org" target="_blank">biojava-dev@mailman.open-bio.org</a>
<a href="http://mailman.open-bio.org/mailman/listinfo/biojava-dev" target="_blank">http://mailman.open-bio.org/mailman/listinfo/biojava-dev</a></pre>
</span></blockquote>
<br>
</div>
</blockquote></div><br></div>