<html><head><style>body{font-family:Helvetica,Arial;font-size:13px}</style></head><body><div style="font-family:Helvetica,Arial;font-size:13px"><div>I think they are related, because both describe alterations relative to a reference sequence.</div><div><br></div><div>To my knowledge, the Variant Call Format (VCF) file is generally used in the context of an NGS experiment. Generating a VCF file is typically the step after producing a BAM file: the BAM file contains the alignment of each sequencing read to the reference sequence, and the VCF file summarizes the differences.</div><div><br></div><div>The HGVS mutation description is usually used in a lower-throughput context. For example, if you’re studying a disease known to be associated with mutations in a specific gene, you might describe the mutations using the HGVS specification. </div><div><br></div><div>For instance, cystic fibrosis can be caused by the G542X mutation (i.e. glycine 542 is changed to a stop codon) in the cystic fibrosis transmembrane conductance regulator (CFTR). As another example, if you go to the gnomAD database and search for the IDS gene (mutations in which cause Hunter syndrome), you get a table with many variants of this gene, e.g.:</div><div><div>c.1650T>C</div><div>c.1648C>T</div><div>c.1645A>G</div><div>c.1644G>T</div><div>c.1642T>C</div><div>c.1637A>G</div><div>c.1636C>T</div><div>p.Pro550Pro</div><div>p.Pro550Ser</div><div>p.Met549Val</div><div>p.Leu548Phe</div><div>p.Leu548Leu</div><div>p.Gln546Arg</div><div>p.Gln546Ter</div><div>c.1181-32_1181-16dup</div><div>c.1181-83_1181-73del</div></div><div><br></div><div>There are 1608 rows in this table for the IDS gene.</div><div><br></div><div>If a new mutation is described in the literature, it will (or at least should) be specified in HGVS format, although in many older papers that is not the case.</div><div><br></div><div>Some of the things you might want to do with these HGVS variant descriptions are:</div><div>1. Given the standard (i.e. 
reference) sequence for a gene and a variant, what is the sequence of the mutated gene?</div><div>2. Given the gene sequence and the HGVS description of the DNA change, what is the protein change?</div><div>3. Given just the protein change, what are the possible DNA changes that could cause it?</div><div>4. Given just the DNA change and reference sequence, is it a missense or nonsense mutation?</div><div>5. Given a variant description, is it consistent with the reference sequence? For example, in the CFTR case mentioned above, G542X is a mutation found in the literature. If I am collecting data and see a mutation described as T542X, it must be wrong: there is no threonine (T) at position 542 of CFTR. I would determine that by checking the CFTR sequence.</div><div><br></div><div>In general, I think of VCF as part of an NGS workflow, while HGVS is used further downstream in structure-function and genotype-phenotype discussions.</div><div><br></div><div>I hope that helps clarify. </div><div><br></div><div>It would have helped me to find a Biopython module that would instantiate classes and subclasses of mutations/variants and provide some basic methods. I know that there are other scientists asking the same sorts of questions, but I don’t know whether any are attempting to answer them by writing Python programs. </div><div><br></div><div>Dave</div><div><br></div></div> <br> <div class="gmail_signature"></div> <br><p class="airmail_on">On November 1, 2023 at 3:36:16 PM, Peter Cock (<a href="mailto:p.j.a.cock@googlemail.com">p.j.a.cock@googlemail.com</a>) wrote:</p> <blockquote type="cite" class="clean_bq"><span><div><div></div><div><div dir="ltr"><div>I don't think we have anything like this (yet). 
Are efforts like VCF (variant call format) related but separate in your mind?<br></div><div><br></div><div>Peter<br></div></div><br><div class="gmail_quote"><div dir="ltr" class="gmail_attr">On Tue, Oct 31, 2023 at 7:31 PM David Merberg <<a href="mailto:merbergd@gmail.com">merbergd@gmail.com</a>> wrote:<br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div class="msg3959824987974020465"><div><div style="font-family:Helvetica,Arial;font-size:13px">Hello biopython world,</div><div style="font-family:Helvetica,Arial;font-size:13px"><br></div><div style="font-family:Helvetica,Arial;font-size:13px">For my last job, I wrote some python code to categorize and describe sequence changes of many types. I used biopython to handle sequences and some basic functions like IO and translation, but I did not find a module for reading variants/mutants and applying them to sequences.</div><div style="font-family:Helvetica,Arial;font-size:13px"><br></div><div style="font-family:Helvetica,Arial;font-size:13px">Some cases are trivial, but some are not. For example, a small deletion in the nucleotide sequence may have no effect on the amino acid corresponding to the position of the affected codon, but will affect downstream amino acids. Protein changes caused by deletions or insertions of 3, 6, 9 . . . nucleotides can also be tricky to calculate.</div><div style="font-family:Helvetica,Arial;font-size:13px"><br></div><div style="font-family:Helvetica,Arial;font-size:13px">My question is whether there is a biopython module to read variants in a standard format (see for example <a href="http://varnomen.hgvs.org/" target="_blank">http://varnomen.hgvs.org/</a>)? Along with the variant objects there could be a set of methods to operate on mutated sequences. 
Does the community think that this would be useful if it does not already exist?</div><div style="font-family:Helvetica,Arial;font-size:13px"><br></div><div style="font-family:Helvetica,Arial;font-size:13px">I implemented many functions for these sorts of operations, but I realized soon afterwards that there are probably better ways to do much of it. I always wanted to redo the work, but never had time. Now I have time, but am not at that job. If it would be useful to the community, I may be able to take it on as a contribution to biopython.</div><div style="font-family:Helvetica,Arial;font-size:13px"><br></div><div style="font-family:Helvetica,Arial;font-size:13px">A caveat is that I don’t have experience contributing to multi-developer projects. I try to write clean, well documented code and I’m familiar with the basics of git. So, it’s OK if you’d prefer that I start with something smaller (like unit tests or documentation). Just let me know.</div><div style="font-family:Helvetica,Arial;font-size:13px"><br></div><div style="font-family:Helvetica,Arial;font-size:13px">Dave Merberg</div><br><div class="gmail_signature"></div></div>
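<div style="font-family:Helvetica,Arial;font-size:13px"><br></div><div style="font-family:Helvetica,Arial;font-size:13px">The deletion cases described in this message can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions: the 18-nucleotide coding sequence is made up, and translate and delete are hypothetical helpers, not part of Biopython and not the code referred to in the message. Positions are 1-based and inclusive, as in an HGVS c. description.</div>
<pre>
```python
from itertools import product

# Standard genetic code, built from the usual TCAG ordering of the codon table.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def translate(cds):
    """Translate a coding sequence, stopping after the first stop codon."""
    protein = []
    for i in range(0, len(cds) - 2, 3):
        aa = CODON_TABLE[cds[i:i + 3]]
        protein.append(aa)
        if aa == "*":
            break
    return "".join(protein)


def delete(cds, start, end):
    """Remove positions start..end (1-based, inclusive) from the sequence."""
    return cds[:start - 1] + cds[end:]


# Made-up reference coding sequence: ATG GCT CAA CTG AAA TAA
REF = "ATGGCTCAACTGAAATAA"

print(translate(REF))                # MAQLK*
# One-base deletion in codon 2: Ala2 is unchanged, but the frame shifts
# and everything downstream changes.
print(translate(delete(REF, 6, 6)))  # MAN*
# Three-base deletion spanning codons 2 and 3: the frame is kept, but
# Ala2 and Gln3 are replaced by a single Glu.
print(translate(delete(REF, 5, 7)))  # MELK*
```
</pre>
<div style="font-family:Helvetica,Arial;font-size:13px">The second case shows a deletion that leaves the amino acid at the affected codon intact while changing the downstream residues; the third shows why a deletion of a multiple of three nucleotides is not always a clean removal of whole amino acids.</div>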
_______________________________________________<br>
Biopython mailing list - <a href="mailto:Biopython@biopython.org" target="_blank">Biopython@biopython.org</a><br>
<a href="https://mailman.open-bio.org/mailman/listinfo/biopython" rel="noreferrer" target="_blank">https://mailman.open-bio.org/mailman/listinfo/biopython</a><br>
</div></blockquote></div>
</div></div></span></blockquote></body></html>