[Biopython] Calculating NS and S over a given sequence

Riesgo Ferreiro, Pablo Pablo.RiesgoFerreiro at TrOn-Mainz.DE
Wed Oct 6 03:53:35 EDT 2021


Thanks Zheng, I will look into the details of the implementation.


What I need is something like cal_n_s(ref_seq). I will check whether it makes sense to make this a public function.
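Below is a minimal sketch of what such a cal_n_s(ref_seq) could look like. It rests on several assumptions: the Nei-Gojobori (NG86) convention, where each of the nine single-nucleotide neighbours of a codon counts as one third of a site; the standard genetic code; and an in-frame ORF. The name cal_n_s and its behaviour are hypothetical, not an existing Biopython API. Two further choices are also assumptions: changes to a stop codon count as nonsynonymous, and stop codons in the reference are skipped.

```python
# Hypothetical sketch: NG86-style counting of nonsynonymous (N) and
# synonymous (S) sites from a reference ORF only (standard genetic code).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def cal_n_s(ref_seq):
    """Return (N, S) site counts for an in-frame reference ORF."""
    seq = ref_seq.upper().replace("U", "T")
    if len(seq) % 3:
        raise ValueError("sequence length must be a multiple of 3")
    n_sites = s_sites = 0.0
    for start in range(0, len(seq), 3):
        codon = seq[start:start + 3]
        aa = CODON_TABLE[codon]
        if aa == "*":
            continue  # assumption: stop codons contribute no sites
        # Each of the 9 single-base neighbours is worth 1/3 of a site.
        for pos in range(3):
            for base in BASES:
                if base == codon[pos]:
                    continue
                mutant = codon[:pos] + base + codon[pos + 1:]
                if CODON_TABLE[mutant] == aa:
                    s_sites += 1 / 3
                else:
                    n_sites += 1 / 3  # includes changes to a stop codon
    return n_sites, s_sites


n_sites, s_sites = cal_n_s("ATGTTTTGG")
print(n_sites, s_sites)
```

For "ATGTTTTGG" this gives N of about 8.667 and S of about 0.333: only the third position of TTT (TTC, Phe) is synonymous. Slicing the ORF in frame (for example a domain) before calling it gives N and S for that region alone.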



Best,

Pablo

________________________________
From: Zheng Ruan <zruan1991 at gmail.com>
Sent: 30 September 2021 15:32:38
To: Riesgo Ferreiro, Pablo
Cc: biopython at biopython.org
Subject: Re: [Biopython] Calculating NS and S over a given sequence

Hi Pablo,

You can simply use cal_dn_ds(ref_seq, sample_seq) to achieve this. If you have multiple sample_seqs, you can iterate over all of them.

Internally, cal_dn_ds determines the N and S sites by averaging the N and S sites counted from both ref_seq and sample_seq. If you specify the NG86 method, it applies the log transform shown in your figure.
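Zheng's description can be sketched as follows, assuming the usual NG86 formulation. Sites are averaged between the two sequences, the proportions of differences pN = Nd/N and pS = Sd/S are computed, and each proportion is corrected with the Jukes-Cantor log transform d = -3/4 ln(1 - 4p/3). This is not Biopython's code, and the function names are hypothetical. The observed difference counts Nd and Sd are taken as inputs here; counting them along codon pathways is not shown.

```python
import math


def jukes_cantor(p):
    """Jukes-Cantor correction of a proportion of differences p."""
    if not 0 <= p < 0.75:
        raise ValueError("correction is undefined for p >= 0.75")
    return -0.75 * math.log(1 - 4 * p / 3)


def ng86_dn_ds(ref_sites, sample_sites, n_diffs, s_diffs):
    """Hypothetical helper: ref_sites and sample_sites are (N, S) tuples.

    Sites are averaged over the two sequences, then pN and pS are
    log-corrected. Returns (dN, dS).
    """
    n_sites = (ref_sites[0] + sample_sites[0]) / 2
    s_sites = (ref_sites[1] + sample_sites[1]) / 2
    d_n = jukes_cantor(n_diffs / n_sites)
    d_s = jukes_cantor(s_diffs / s_sites)
    return d_n, d_s


d_n, d_s = ng86_dn_ds((20.0, 7.0), (21.0, 6.0), n_diffs=2, s_diffs=1)
print(d_n, d_s, d_n / d_s)
```

The correction grows without bound as p approaches 0.75. This is why highly diverged pairs cannot be corrected this way.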

Best,
Zheng

On Thu, Sep 30, 2021 at 5:24 AM Riesgo Ferreiro, Pablo <Pablo.RiesgoFerreiro at tron-mainz.de> wrote:

Hi all,





I am new to this mailing list. First of all, many thanks for your work; I have happily used Biopython in several projects before.



I need to compute the dN/dS ratio over a set of samples of the same species. I know this is not ideal (doi:10.1371/journal.pgen.1000304), but still. I have found the Biopython function that calculates dN/dS between two sequences, https://biopython.org/docs/1.76/api/Bio.codonalign.codonseq.html#Bio.codonalign.codonseq.cal_dn_ds, but it does not cover my needs.



What I need is to compute dN/dS based on the count of mutations over a set of samples as explained at https://bioinformatics.cvr.ac.uk/calculating-dnds-for-ngs-datasets/
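One simple way to obtain the mutation counts is sketched below. The sketch makes several assumptions. Variants are single-nucleotide changes given as 0-based positions within an in-frame reference ORF. Each variant is classified on its own against the reference codon, so several variants in the same codon of one sample are not combined. Counts are pooled across samples, and N and S come from the reference alone. The function names are hypothetical, this is not necessarily the exact procedure on the CVR page, and no log correction is applied.

```python
# Hypothetical sketch: pooled dN/dS from observed SNVs across samples.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def classify_snv(ref_seq, pos, alt):
    """Return 'S' or 'N' for a single-base change at 0-based position pos."""
    ref_seq = ref_seq.upper()
    start, offset = pos - pos % 3, pos % 3
    codon = ref_seq[start:start + 3]
    mutant = codon[:offset] + alt.upper() + codon[offset + 1:]
    if mutant == codon:
        raise ValueError("alt allele equals the reference base")
    return "S" if CODON_TABLE[mutant] == CODON_TABLE[codon] else "N"


def dn_ds_from_counts(ref_seq, samples, n_sites, s_sites):
    """samples: one list of (pos, alt) variants per sample."""
    n_mut = s_mut = 0
    for variants in samples:
        for pos, alt in variants:
            if classify_snv(ref_seq, pos, alt) == "S":
                s_mut += 1
            else:
                n_mut += 1
    if s_mut == 0:
        raise ValueError("no synonymous mutations observed")
    # (Nd / N) / (Sd / S), with N and S taken from the reference only
    return (n_mut / n_sites) / (s_mut / s_sites)


ref = "ATGTTTTGG"
samples = [[(5, "C")], [(0, "C"), (5, "C")]]
print(dn_ds_from_counts(ref, samples, n_sites=26 / 3, s_sites=1 / 3))
```

In the example, the TTT>TTC change (seen twice) is synonymous and ATG>CTG is nonsynonymous. The N and S values are those of the same reference under NG86 site counting.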



[Inline image: Screenshot from 2021-09-30 09-55-58.png]



N and S depend on the reference sequence and are independent of the samples. N and S can be calculated over different genomic regions (e.g. coding region, transcript, exon, domain). The simplest input for this tool would be a single ORF sequence; more complete inputs, such as a GFF file, could be supported later.



It is a small thing, but unless anyone knows of an existing implementation, I think it may be useful to others. Do you think this would be a valuable contribution to biopython?







Best wishes,

Pablo Riesgo Ferreiro
Computational Medicine


TRON
Translationale Onkologie an der Universitätsmedizin der
Johannes Gutenberg-Universität Mainz gemeinnützige GmbH

_______________________________________________
Biopython mailing list  -  Biopython at biopython.org
https://mailman.open-bio.org/mailman/listinfo/biopython
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mailman.open-bio.org/pipermail/biopython/attachments/20211006/804bf152/attachment-0001.htm>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: Screenshot from 2021-09-30 09-55-58.png
Type: image/png
Size: 100309 bytes
Desc: Screenshot from 2021-09-30 09-55-58.png
URL: <http://mailman.open-bio.org/pipermail/biopython/attachments/20211006/804bf152/attachment-0001.png>

