[Biopython] Calculating NS and S over a given sequence

Riesgo Ferreiro, Pablo Pablo.RiesgoFerreiro at TrOn-Mainz.DE
Wed Oct 6 09:40:30 EDT 2021


Many thanks for pointing me in the right direction, Zheng Ruan. I was able to get what I need using some of the protected methods, as follows:


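from typing import Tuple

from Bio.Seq import Seq
from Bio.codonalign import codonseq
from Bio.codonalign.codonseq import CodonSeq
from Bio.Data import CodonTable


def calculate_ns_s_2(sequence: Seq) -> Tuple[float, float]:
    codon_sequence = CodonSeq(str(sequence))
    # Drop the last codon (the stop codon) before counting sites.
    # _count_site_NG86 is unpacked here as (synonymous, non-synonymous) sites.
    s, ns = codonseq._count_site_NG86(
        codon_lst=codonseq._get_codon_list(codon_sequence)[0:-1],
        codon_table=CodonTable.standard_dna_table,
    )
    return ns, s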

I had already put together a script for this purpose, and I confirmed two things: 1) the results are the same, with one exception, and 2) your implementation is way faster :)
The exception is the stop codon: I had to remove it from the input sequence. So I was wondering: shouldn't changes in a stop codon be taken into account for the dN/dS? I guess the impact on the dN/dS would be minimal, but what do you think?

On a different note, I am happy to use these protected methods for my purposes, but they are not so easy to use. Would it be worthwhile to make this API a bit more accessible and better documented? I would be happy to spend some time on that, but only if you consider it relevant.


Best,

Pablo

________________________________
From: Zheng Ruan <zruan1991 at gmail.com>
Sent: 30 September 2021 15:32:38
To: Riesgo Ferreiro, Pablo
Cc: biopython at biopython.org
Subject: Re: [Biopython] Calculating NS and S over a given sequence

Hi Pablo,

You can simply use cal_dn_ds(ref_seq, sample_seq) to achieve this. If you have multiple sample_seqs, you may iterate all of them.

Internally, cal_dn_ds determines the N and S sites by averaging the N and S sites counted from both the ref_seq and the sample_seq. If you specify the NG86 method, it does the log transform as you show in the figure.
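
For readers following the thread, here is a minimal sketch of that log transform, assuming it is the Jukes-Cantor correction used by the Nei-Gojobori (1986) method: d = -3/4 * ln(1 - 4p/3). The function names and the inputs (nd and sd as counts of non-synonymous and synonymous differences, n_sites and s_sites as site counts) are hypothetical and are not part of Biopython.

```python
import math
from typing import Tuple


def jukes_cantor(p: float) -> float:
    """Correct a proportion of differences p for multiple substitutions."""
    if not 0 <= p < 0.75:
        raise ValueError("p must be in [0, 0.75) for the correction to be defined")
    return -0.75 * math.log(1 - 4 * p / 3)


def dn_ds_from_counts(
    nd: float, sd: float, n_sites: float, s_sites: float
) -> Tuple[float, float]:
    """Return (dN, dS) from difference counts and site counts (hypothetical helper)."""
    p_n = nd / n_sites  # proportion of non-synonymous differences
    p_s = sd / s_sites  # proportion of synonymous differences
    return jukes_cantor(p_n), jukes_cantor(p_s)
```

The ratio dN/dS then follows by dividing the two returned values, provided dS is not zero.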

Best,
Zheng

On Thu, Sep 30, 2021 at 5:24 AM Riesgo Ferreiro, Pablo <Pablo.RiesgoFerreiro at tron-mainz.de> wrote:

Hi all,





I am new to this mailing list. First of all, many thanks for your work; I have happily used Biopython in several projects before.



I need to compute the dN/dS ratio over a set of samples of the same species. I know this is not ideal (see doi:10.1371/journal.pgen.1000304), but still. I have found a Biopython function that calculates the dN/dS between two sequences: https://biopython.org/docs/1.76/api/Bio.codonalign.codonseq.html#Bio.codonalign.codonseq.cal_dn_ds, but it does not cover my needs.



What I need is to compute dN/dS based on the count of mutations over a set of samples, as explained at https://bioinformatics.cvr.ac.uk/calculating-dnds-for-ngs-datasets/



[Inline image: Screenshot from 2021-09-30 09-55-58.png]



N and S depend on the reference sequence and are independent of the samples. N and S can be calculated over different genomic regions (e.g. coding region, transcript, exon, domain, etc.). The simplest input for this tool would be a given ORF sequence; a more complete version could accept something like a GFF file.
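
To make the idea concrete, here is a minimal pure-Python sketch of counting N and S sites for a single reference ORF. It is not Biopython's implementation: the names count_sites, BASES, AMINO_ACIDS and CODON_TO_AA are made up for illustration, it assumes the standard genetic code and equal weights for transitions and transversions, it expects the stop codon to be trimmed beforehand, and it counts a change to a stop codon as non-synonymous.

```python
from itertools import product
from typing import Tuple

BASES = "TCAG"
# Standard genetic code (NCBI table 1), codons ordered TTT, TTC, TTA, TTG, TCT, ...
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TO_AA = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def count_sites(orf: str) -> Tuple[float, float]:
    """Return (N, S) site counts for an ORF given without its stop codon."""
    orf = orf.upper().replace("U", "T")
    if len(orf) % 3 != 0:
        raise ValueError("ORF length must be a multiple of three")
    n_sites = s_sites = 0.0
    for start in range(0, len(orf), 3):
        codon = orf[start:start + 3]
        aa = CODON_TO_AA[codon]
        if aa == "*":
            raise ValueError(f"stop codon {codon} at offset {start}; trim it first")
        synonymous = 0
        for pos in range(3):
            for base in BASES:
                if base == codon[pos]:
                    continue
                neighbour = codon[:pos] + base + codon[pos + 1:]
                # Assumption: a change to a stop codon counts as non-synonymous.
                if CODON_TO_AA[neighbour] == aa:
                    synonymous += 1
        # Each position has 3 possible changes, so a codon contributes 3 sites in total.
        s_sites += synonymous / 3
        n_sites += 3 - synonymous / 3
    return n_sites, s_sites
```

For example, count_sites("ATGTTTGGG") splits the nine sites of the three codons into N and S; the two values always add up to the sequence length.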



It is a small thing, but unless anyone knows of an existing implementation, I think it may be useful to others. Do you think this would be a valuable contribution to Biopython?







Best wishes,

Pablo Riesgo Ferreiro
Computational Medicine


TRON
Translationale Onkologie an der Universitätsmedizin der
Johannes Gutenberg-Universität Mainz gemeinnützige GmbH

_______________________________________________
Biopython mailing list  -  Biopython at biopython.org
https://mailman.open-bio.org/mailman/listinfo/biopython
-------------- next part --------------
A non-text attachment was scrubbed...
Name: Screenshot from 2021-09-30 09-55-58.png
Type: image/png
Size: 100309 bytes
Desc: Screenshot from 2021-09-30 09-55-58.png
URL: <http://mailman.open-bio.org/pipermail/biopython/attachments/20211006/d820a99e/attachment-0001.png>

