<div dir="ltr">Sorry, by 'half gaps' I thought you meant a specific thing, like, "look at that half gap!". I guess you mean, columns where half of the values in the column are gaps.<div><br></div><div>Looking directly at the alignment, I guess there is no way to work out precisely which columns are gaps or not in the model?</div><div><br></div><div><br></div><div><br></div><div><br></div><div><br></div></div><br><div class="gmail_quote"><div dir="ltr" class="gmail_attr">On Thu, 15 Feb 2024 at 15:47, Dan Bolser <<a href="mailto:dan.bolser@outsee.co.uk">dan.bolser@outsee.co.uk</a>> wrote:<br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div dir="ltr"><div dir="ltr">Hi Peter,</div><br><div class="gmail_quote"><div dir="ltr" class="gmail_attr">On Thu, 15 Feb 2024 at 15:28, Peter Cock <<a href="mailto:p.j.a.cock@googlemail.com" target="_blank">p.j.a.cock@googlemail.com</a>> wrote:<br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">Hello Dan,<br>
<br>
It that is a probability at the end, you have the wrong denominator -<br>
should be len(align) == sum(amino_acids.values()<br></blockquote><div><br></div><div>Makes no difference, they are the same (including -s). The profile reports -s as 'Insert Probability'.</div><div><br></div><div><br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">Note this seed alignment has 164 columns, yet the model page says the<br>
model has 95 columns. If I count the number of columns which are half<br>
gaps that's pretty close...<br></blockquote><div><br></div><div>I don't understand what that means or implies... Is something wrong somewhere?</div><div><br></div><div><br></div><div>Cheers,</div><div>Dan.</div><div><br></div><div><br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">
Peter<br>
<br>
On Thu, Feb 15, 2024 at 3:09 PM Dan Bolser <<a href="mailto:dan.bolser@outsee.co.uk" target="_blank">dan.bolser@outsee.co.uk</a>> wrote:<br>
><br>
> Hi,<br>
><br>
> Sorry, I can't follow the docs (or find the right docs).<br>
><br>
> I've got the 'seed' stockholm alignment for this domain:<br>
> <a href="https://www.ebi.ac.uk/interpro/entry/pfam/PF08241/entry_alignments/?type=seed" rel="noreferrer" target="_blank">https://www.ebi.ac.uk/interpro/entry/pfam/PF08241/entry_alignments/?type=seed</a><br>
><br>
> and I'm trying to reproduce the signature it shows here:<br>
> <a href="https://www.ebi.ac.uk/interpro/entry/pfam/PF08241/logo/" rel="noreferrer" target="_blank">https://www.ebi.ac.uk/interpro/entry/pfam/PF08241/logo/</a><br>
><br>
> I'm not sure a) why the probabilities differ in the profile relative to the seed alignment, or b) how to filter columns in the alignment by those that have a match in the model (see columns 4-6 in the alignment, which are gaps in the model).<br>
><br>
> I think if I can answer b) then the answer to a) will be, "look at the full alignment".<br>
><br>
> Here is my crude 'best guess' code:<br>
><br>
> import gzip<br>
> import Bio.AlignIO<br>
><br>
> # msa = "PF08241.alignment.full.gz"<br>
> msa = "PF08241.alignment.seed.gz"<br>
><br>
> with gzip.open(msa, "rt") as handle:<br>
> align = Bio.AlignIO.read(handle, "stockholm")<br>
> ncols = align.get_alignment_length()<br>
><br>
> for col in range(ncols):<br>
> amino_acids = dict()<br>
> for s in align[:, col]:<br>
> amino_acids[s] = amino_acids.get(s, 0) + 1<br>
> print(amino_acids)<br>
> for s in amino_acids:<br>
> print(f"{s}: {amino_acids[s]:3d} {amino_acids[s] / len(align):.3f}")<br>
><br>
><br>
><br>
> I have the feeling I'm doin it rong...<br>
><br>
> The above is just a 'warm up', really I want to see the conservation score, base by base on a given protein in the alignment (where it matches the model).<br>
><br>
> Many thanks for any suggestions, and sorry for not being able to find the right document to answer these questions.<br>
><br>
><br>
> kthxbi,<br>
> Dan.<br>
> _______________________________________________<br>
> Biopython mailing list - <a href="mailto:Biopython@biopython.org" target="_blank">Biopython@biopython.org</a><br>
> <a href="https://mailman.open-bio.org/mailman/listinfo/biopython" rel="noreferrer" target="_blank">https://mailman.open-bio.org/mailman/listinfo/biopython</a><br>
</blockquote></div></div>
</blockquote></div>