<html><head></head><body><div class="ydpce4c934eyahoo-style-wrap" style="font-family:Helvetica Neue, Helvetica, Arial, sans-serif;font-size:10px;"><div></div>
<div dir="ltr" data-setdir="false">You could use the new parser in Bio.Align:</div><div dir="ltr" data-setdir="false"><br></div><div dir="ltr" data-setdir="false"><div dir="ltr" data-setdir="false"><span><span><span><span><span>>>> </span></span></span></span></span>import gzip</div><div dir="ltr" data-setdir="false"><span><span><span><span><span>>>> </span></span></span></span></span>from Bio import Align</div><div dir="ltr" data-setdir="false"><div dir="ltr" data-setdir="false"><span><span><span><span><span>>>> </span></span></span></span></span>from Bio.motifs import Motif<br><div dir="ltr" data-setdir="false"><span><span><span><span><span>>>> </span></span></span></span></span>from Bio.Data import IUPACData<br></div></div></div><div dir="ltr" data-setdir="false"><span><span><span><span><span><span>>>> </span></span></span></span></span>stream = gzip.open("PF08241.alignment.seed.gz", "rt")</span></div><div dir="ltr" data-setdir="false"><span><span><span><span><span><span><span>>>> </span></span></span></span></span>alignment = Align.read(stream, "Stockholm")</span></span></div><div dir="ltr" data-setdir="false"><span><span><span><span><span><span><span><span>>>> </span></span></span></span></span>indices = [index for index, c in enumerate(alignment.column_annotations['consensus sequence']) if c!="."]</span></span></span></div><div dir="ltr" data-setdir="false"><span><span><span><span>>>> alignment = alignment[:, indices]</span></span></span></span></div><div dir="ltr" data-setdir="false"><span><span><div dir="ltr" data-setdir="false">>>> m.consensus<br>Seq('LDVGCGTGLLTRALARLGARVTGVDLSPEMLELARERAPRAVVGDAEDLPFPDN...LII')<br><div>>>> m.weblogo("PF08241.png")<br><br></div></div><div dir="ltr" data-setdir="false">-Michiel<br></div></span></span></div><br></div><div><br></div>
</div><div id="yahoo_quoted_8854620181" class="yahoo_quoted">
<div style="font-family:'Helvetica Neue', Helvetica, Arial, sans-serif;font-size:13px;color:#26282a;">
<div>
On Friday, February 16, 2024 at 07:59:14 PM GMT+9, Dan Bolser <dan.bolser@outsee.co.uk> wrote:
</div>
<div><br></div>
<div><br></div>
<div><div id="yiv4136689180"><div><div dir="ltr">I don't suppose there is a package for reading in the HMM? 😅<div><br clear="none"></div></div><br clear="none"><div id="yiv4136689180yqt23012" class="yiv4136689180yqt8247348347"><div class="yiv4136689180gmail_quote"><div dir="ltr" class="yiv4136689180gmail_attr">On Thu, 15 Feb 2024 at 16:18, Dan Bolser <<a rel="nofollow noopener noreferrer" shape="rect" ymailto="mailto:dan.bolser@outsee.co.uk" target="_blank" href="mailto:dan.bolser@outsee.co.uk">dan.bolser@outsee.co.uk</a>> wrote:<br clear="none"></div><blockquote style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex;" class="yiv4136689180gmail_quote"><div dir="ltr">Recreating the profile was just a sanity check. Really I want to look at conservation across the protein of interest.</div><br clear="none"><div class="yiv4136689180gmail_quote"><div dir="ltr" class="yiv4136689180gmail_attr">On Thu, 15 Feb 2024 at 16:13, Peter Cock <<a rel="nofollow noopener noreferrer" shape="rect" ymailto="mailto:p.j.a.cock@googlemail.com" target="_blank" href="mailto:p.j.a.cock@googlemail.com">p.j.a.cock@googlemail.com</a>> wrote:<br clear="none"></div><blockquote style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex;" class="yiv4136689180gmail_quote">You may have more joy starting from the HMM for the model?<br clear="none">
<br clear="none">
<a rel="nofollow noopener noreferrer" shape="rect" target="_blank" href="https://www.ebi.ac.uk/interpro/wwwapi//entry/pfam/PF08241?annotation=hmm">https://www.ebi.ac.uk/interpro/wwwapi//entry/pfam/PF08241?annotation=hmm</a><br clear="none">
<br clear="none">
Peter<br clear="none">
<br clear="none">
On Thu, Feb 15, 2024 at 4:06 PM Dan Bolser <<a rel="nofollow noopener noreferrer" shape="rect" ymailto="mailto:dan.bolser@outsee.co.uk" target="_blank" href="mailto:dan.bolser@outsee.co.uk">dan.bolser@outsee.co.uk</a>> wrote:<br clear="none">
><br clear="none">
> Searching HMMER gives me (cryptic?) details of the model match:<br clear="none">
><br clear="none">
> Family-Id Family-Accession Clan Start End Ali-Start Ali-End Model-Start Model-End Bit-Score Ind.-E-value Cond.-E-value Description<br clear="none">
> ==================================================================================================================================<br clear="none">
><br clear="none">
> Methyltransf_11 PF08241.15 CL0063 63 173 63 172 1 95 51.30 1.5e-13 3.1e-17 Methyltransferase domain<br clear="none">
><br clear="none">
> ---------------------------------------------------------------------------------------------------------------------------------<br clear="none">
><br clear="none">
> MODEL LdvGcGtGrlaealakrg.arvvgvDlskemlklakekaseeglkvefvvadaeklpfednsfDlvvssevlhhv...e............dpekalkeiaRvLkpgGllv<br clear="none">
> MATCH L +GcG+ +l+ +l g +v+ vD+s ++++ +++ + ++++ +d++kl f+++sfD+v+ + +l+ + e ++++l+e+ RvL pgG+++<br clear="none">
> PPL 569************7779*************77766666666.69*****************************8662667778888888******************97<br clear="none">
> SEQ LVLGCGNSALSYELFLGGfPNVTSVDYSSVVVAAMQARHAHVP-QLRWETMDVRKLDFPSASFDVVLEKGTLDALlagErdpwtvssegvhTVDQVLSEVSRVLVPGGRFI<br clear="none">
><br clear="none">
><br clear="none">
> So I guess it's possible to work backwards from there...<br clear="none">
><br clear="none">
><br clear="none">
><br clear="none">
> On Thu, 15 Feb 2024 at 16:02, Peter Cock <<a rel="nofollow noopener noreferrer" shape="rect" ymailto="mailto:p.j.a.cock@googlemail.com" target="_blank" href="mailto:p.j.a.cock@googlemail.com">p.j.a.cock@googlemail.com</a>> wrote:<br clear="none">
>><br clear="none">
>> Sorry yes, I used this as a threshold for if a column was gappy, and<br clear="none">
>> likely to be in the model or not:<br clear="none">
>><br clear="none">
>> amino_acids.get("-", 0) < len(align)*0.5<br clear="none">
>><br clear="none">
>> Might need to use <= to match, or perhaps they use a more<br clear="none">
>> sophisticated criteria.<br clear="none">
>><br clear="none">
>> Peter<br clear="none">
>><br clear="none">
>> On Thu, Feb 15, 2024 at 3:56 PM Dan Bolser <<a rel="nofollow noopener noreferrer" shape="rect" ymailto="mailto:dan.bolser@outsee.co.uk" target="_blank" href="mailto:dan.bolser@outsee.co.uk">dan.bolser@outsee.co.uk</a>> wrote:<br clear="none">
>> ><br clear="none">
>> > Sorry, by 'half gaps' I thought you meant a specific thing, like, "look at that half gap!". I guess you mean, columns where half of the values in the column are gaps.<br clear="none">
>> ><br clear="none">
>> > Looking directly at the alignment, I guess there is no way to work out precisely which columns are gaps or not in the model?<br clear="none">
>> ><br clear="none">
>> ><br clear="none">
>> ><br clear="none">
>> ><br clear="none">
>> ><br clear="none">
>> ><br clear="none">
>> > On Thu, 15 Feb 2024 at 15:47, Dan Bolser <<a rel="nofollow noopener noreferrer" shape="rect" ymailto="mailto:dan.bolser@outsee.co.uk" target="_blank" href="mailto:dan.bolser@outsee.co.uk">dan.bolser@outsee.co.uk</a>> wrote:<br clear="none">
>> >><br clear="none">
>> >> Hi Peter,<br clear="none">
>> >><br clear="none">
>> >> On Thu, 15 Feb 2024 at 15:28, Peter Cock <<a rel="nofollow noopener noreferrer" shape="rect" ymailto="mailto:p.j.a.cock@googlemail.com" target="_blank" href="mailto:p.j.a.cock@googlemail.com">p.j.a.cock@googlemail.com</a>> wrote:<br clear="none">
>> >>><br clear="none">
>> >>> Hello Dan,<br clear="none">
>> >>><br clear="none">
>> >>> It that is a probability at the end, you have the wrong denominator -<br clear="none">
>> >>> should be len(align) == sum(amino_acids.values()<br clear="none">
>> >><br clear="none">
>> >><br clear="none">
>> >> Makes no difference, they are the same (including -s). The profile reports -s as 'Insert Probability'.<br clear="none">
>> >><br clear="none">
>> >><br clear="none">
>> >>> Note this seed alignment has 164 columns, yet the model page says the<br clear="none">
>> >>> model has 95 columns. If I count the number of columns which are half<br clear="none">
>> >>> gaps that's pretty close...<br clear="none">
>> >><br clear="none">
>> >><br clear="none">
>> >> I don't understand what that means or implies... Is something wrong somewhere?<br clear="none">
>> >><br clear="none">
>> >><br clear="none">
>> >> Cheers,<br clear="none">
>> >> Dan.<br clear="none">
>> >><br clear="none">
>> >><br clear="none">
>> >>> Peter<br clear="none">
>> >>><br clear="none">
>> >>> On Thu, Feb 15, 2024 at 3:09 PM Dan Bolser <<a rel="nofollow noopener noreferrer" shape="rect" ymailto="mailto:dan.bolser@outsee.co.uk" target="_blank" href="mailto:dan.bolser@outsee.co.uk">dan.bolser@outsee.co.uk</a>> wrote:<br clear="none">
>> >>> ><br clear="none">
>> >>> > Hi,<br clear="none">
>> >>> ><br clear="none">
>> >>> > Sorry, I can't follow the docs (or find the right docs).<br clear="none">
>> >>> ><br clear="none">
>> >>> > I've got the 'seed' stockholm alignment for this domain:<br clear="none">
>> >>> > <a rel="nofollow noopener noreferrer" shape="rect" target="_blank" href="https://www.ebi.ac.uk/interpro/entry/pfam/PF08241/entry_alignments/?type=seed">https://www.ebi.ac.uk/interpro/entry/pfam/PF08241/entry_alignments/?type=seed</a><br clear="none">
>> >>> ><br clear="none">
>> >>> > and I'm trying to reproduce the signature it shows here:<br clear="none">
>> >>> > <a rel="nofollow noopener noreferrer" shape="rect" target="_blank" href="https://www.ebi.ac.uk/interpro/entry/pfam/PF08241/logo/">https://www.ebi.ac.uk/interpro/entry/pfam/PF08241/logo/</a><br clear="none">
>> >>> ><br clear="none">
>> >>> > I'm not sure a) why the probabilities differ in the profile relative to the seed alignment, or b) how to filter columns in the alignment by those that have a match in the model (see columns 4-6 in the alignment, which are gaps in the model).<br clear="none">
>> >>> ><br clear="none">
>> >>> > I think if I can answer b) then the answer to a) will be, "look at the full alignment".<br clear="none">
>> >>> ><br clear="none">
>> >>> > Here is my crude 'best guess' code:<br clear="none">
>> >>> ><br clear="none">
>> >>> > import gzip<br clear="none">
>> >>> > import Bio.AlignIO<br clear="none">
>> >>> ><br clear="none">
>> >>> > # msa = "PF08241.alignment.full.gz"<br clear="none">
>> >>> > msa = "PF08241.alignment.seed.gz"<br clear="none">
>> >>> ><br clear="none">
>> >>> > with gzip.open(msa, "rt") as handle:<br clear="none">
>> >>> > align = Bio.AlignIO.read(handle, "stockholm")<br clear="none">
>> >>> > ncols = align.get_alignment_length()<br clear="none">
>> >>> ><br clear="none">
>> >>> > for col in range(ncols):<br clear="none">
>> >>> > amino_acids = dict()<br clear="none">
>> >>> > for s in align[:, col]:<br clear="none">
>> >>> > amino_acids[s] = amino_acids.get(s, 0) + 1<br clear="none">
>> >>> > print(amino_acids)<br clear="none">
>> >>> > for s in amino_acids:<br clear="none">
>> >>> > print(f"{s}: {amino_acids[s]:3d} {amino_acids[s] / len(align):.3f}")<br clear="none">
>> >>> ><br clear="none">
>> >>> ><br clear="none">
>> >>> ><br clear="none">
>> >>> > I have the feeling I'm doin it rong...<br clear="none">
>> >>> ><br clear="none">
>> >>> > The above is just a 'warm up', really I want to see the conservation score, base by base on a given protein in the alignment (where it matches the model).<br clear="none">
>> >>> ><br clear="none">
>> >>> > Many thanks for any suggestions, and sorry for not being able to find the right document to answer these questions.<br clear="none">
>> >>> ><br clear="none">
>> >>> ><br clear="none">
>> >>> > kthxbi,<br clear="none">
>> >>> > Dan.<br clear="none">
>> >>> > _______________________________________________<br clear="none">
>> >>> > Biopython mailing list - <a rel="nofollow noopener noreferrer" shape="rect" ymailto="mailto:Biopython@biopython.org" target="_blank" href="mailto:Biopython@biopython.org">Biopython@biopython.org</a><br clear="none">
>> >>> > <a rel="nofollow noopener noreferrer" shape="rect" target="_blank" href="https://mailman.open-bio.org/mailman/listinfo/biopython">https://mailman.open-bio.org/mailman/listinfo/biopython</a><br clear="none">
</blockquote></div>
</blockquote></div></div>
</div></div><div class="yqt8247348347" id="yqt57419">_______________________________________________<br clear="none">Biopython mailing list - <a shape="rect" ymailto="mailto:Biopython@biopython.org" href="mailto:Biopython@biopython.org">Biopython@biopython.org</a><br clear="none"><a shape="rect" href="https://mailman.open-bio.org/mailman/listinfo/biopython" target="_blank">https://mailman.open-bio.org/mailman/listinfo/biopython</a><br clear="none"></div></div>
</div>
</div></body></html>