<html><head></head><body><div class="ydpcc8aacfayahoo-style-wrap" style="font-family:Helvetica Neue, Helvetica, Arial, sans-serif;font-size:10px;"><div></div>
<div dir="ltr" data-setdir="false">> <span>I have lots of pairs of pre-aligned sequences (imported from an external MSA file),</span></div><div dir="ltr" data-setdir="false"><span><br></span></div><div dir="ltr" data-setdir="false"><span>In which format is your MSA file?</span></div><div dir="ltr" data-setdir="false"><span><br></span></div><div dir="ltr" data-setdir="false"><span>-Michiel<br></span></div><div><br></div>
</div><div id="yahoo_quoted_9753999089" class="yahoo_quoted">
<div style="font-family:'Helvetica Neue', Helvetica, Arial, sans-serif;font-size:13px;color:#26282a;">
<div>
On Thursday, January 30, 2025 at 11:59:33 PM GMT+9, Peter Cock &lt;p.j.a.cock@googlemail.com&gt; wrote:
</div>
<div><br></div>
<div><br></div>
<div><div id="yiv8603226725"><div dir="ltr">Hello all, and Michiel in particular,<br><br>I am wondering whether any of the pairwise alignment code in Bio.Align (much of which is written in C for speed) could help with the following use case.<br><br>I have lots of pairs of pre-aligned sequences (imported from an external MSA file), for which I am doing something like this:<br><br>```python<br>def count_matches_etc(query_seq, subject_seq):<br>&nbsp;&nbsp;&nbsp;&nbsp;assert len(query_seq) == len(subject_seq), "Should be same length"<br>&nbsp;&nbsp;&nbsp;&nbsp;matches = non_gap_mismatches = either_gapped = both_gapped = 0<br>&nbsp;&nbsp;&nbsp;&nbsp;for q, s in zip(query_seq, subject_seq, strict=True):<br>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;if q == "-" and s == "-":<br>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;both_gapped += 1<br>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;elif q == "-" or s == "-":<br>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;either_gapped += 1<br>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;elif q == s:<br>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;matches += 1<br>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;else:<br>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;non_gap_mismatches += 1<br>&nbsp;&nbsp;&nbsp;&nbsp;assert matches + non_gap_mismatches + either_gapped + both_gapped == len(query_seq)<br>&nbsp;&nbsp;&nbsp;&nbsp;return matches, non_gap_mismatches, either_gapped, both_gapped<br><br><br># Test case<br>assert (9, 1, 2, 1) == count_matches_etc("ACGTAC-TAC-GT", "AGGT-CGTAC-GT")<br>```<br><br>Sticking with Python, this could be optimized (e.g. I am currently using it with sequences of a million base pairs but few gaps); however, I have written this example with clarity foremost in mind.<br><div><br></div><div>Thank you,</div><div><br></div>Peter</div>
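<div dir="ltr"><br>One possible pure-Python speed-up of the loop above, as a sketch only. It assumes gaps are rare and written as "-", and the name count_matches_fast is hypothetical (it is not part of Bio.Align or Biopython). It counts identical columns with map and operator.eq, then visits only the gap positions of the query via str.find:<br><br></div>
<pre>
```python
from operator import eq


def count_matches_fast(query_seq, subject_seq):
    """Sketch: same four counts as count_matches_etc, without a per-column branch."""
    if len(query_seq) != len(subject_seq):
        raise ValueError("Should be same length")
    # Columns where both characters are identical, including gap/gap columns.
    equal = sum(map(eq, query_seq, subject_seq))
    # Visit only the gap positions in the query; cheap when gaps are rare.
    both_gapped = 0
    pos = query_seq.find("-")
    while pos != -1:
        if subject_seq[pos] == "-":
            both_gapped += 1
        pos = query_seq.find("-", pos + 1)
    # Each gap/gap column contributes two gap characters to the total.
    total_gaps = query_seq.count("-") + subject_seq.count("-")
    either_gapped = total_gaps - 2 * both_gapped
    matches = equal - both_gapped
    non_gap_mismatches = len(query_seq) - matches - either_gapped - both_gapped
    return matches, non_gap_mismatches, either_gapped, both_gapped


# Same test case as above
assert (9, 1, 2, 1) == count_matches_fast("ACGTAC-TAC-GT", "AGGT-CGTAC-GT")
```
</pre>
<div dir="ltr">Whether this is actually faster would need timing on real data; it only rearranges the counting so that most of the work happens inside built-in functions.<br></div>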
</div>_______________________________________________<br>Biopython mailing list - <a ymailto="mailto:Biopython@biopython.org" href="mailto:Biopython@biopython.org">Biopython@biopython.org</a><br><a href="https://mailman.open-bio.org/mailman/listinfo/biopython" target="_blank">https://mailman.open-bio.org/mailman/listinfo/biopython</a><br></div>
</div>
</div></body></html>