[Bioperl-l] Degenerate primer calculation

Mon Jan 12 17:45:53 UTC 2009

-----Original Message-----
From: Chris Fields [mailto:cjfields at illinois.edu] 
Sent: 08 December 2008 16:41
To: Samantha Thompson
Cc: bioperl-l List
Subject: Re: [Bioperl-l] Degenerate primer calculation

On Dec 8, 2008, at 9:59 AM, Samantha Thompson wrote:

> Hi,
>
> I also have another similar sequence analysis/primer problem.
>
> What I'd like to do is produce degenerate primers from amino acid
> sequences.
>
> What I did initially was take the codon usage table and rewrite it as a
> Perl hash of degenerate codons, e.g. lysine (K) would be AAR, and its
> reverse complement would be YTT. My form then takes an amino acid
> sequence (derived as a consensus from the multiple alignment of
> homologous proteins) and converts it into degenerate codons. The
> resulting degenerate primer (actually several primers synthesised with
> different bases and pooled together) is then used to search for
> homologues of the protein in unsequenced organisms.
>
> I would like to improve this by being able to take a consensus described
> more in the form of a Prosite motif (I think that's the right one), such
> as [TS]YW[RKSD], and then develop a degenerate nucleotide sequence
> corresponding to it.
>
> So I'm wondering if BioPerl contains anything like this (both Prosite
> motif format parsing and degenerate code from multiple alignments or
> such a motif), or if I need to write this myself (which I want to if it
> doesn't exist already).
>
> Thanks again,
>
> Sam

Bio::Tools::CodonTable reverse translates, but I don't think it  
accepts patterns.  Maybe a pipeline including Bio::Tools::SeqPattern?   
Might be an interesting programming challenge if it isn't already set  
up for that.

Chris
...........
Hi,

I'm having a go at solving this problem and am looking at
Bio::Tools::SeqPattern. What I would like to obtain from a motif is a
list of all the sequences that the motif could correspond to. E.g.
IKL[GP]NM could be IKLGNM or IKLPNM, so I take both of these sequences
and turn each amino acid into a degenerate codon. The complicated part
(I thought) is creating a degenerate codon that corresponds to either G
or P. The way I do this is to build the new codon one position at a
time: each of its 3 degenerate bases comes from a 2D matrix that holds
the result of 'crossing' any two bases of the degenerate code. So when
you cross the codon for G (GGN) with the codon for P (CCN) you get a
codon that contains the degeneracy of both (SSN). You then have a
degenerate nucleotide sequence for your peptide motif.
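
For illustration, the crossing step described above could be sketched
roughly as follows. This is a Python sketch, not the Perl code mentioned
in this message (which was not posted). It assumes the standard genetic
code and the IUPAC ambiguity codes, and it replaces the 2D matrix with a
set union that gives the same result. The names cross, cross_codons,
DEGENERATE and degenerate_codon are made up for the example.

```python
from functools import reduce
from itertools import product

# IUPAC nucleotide codes and the bases each one stands for.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}
# Reverse lookup: set of bases -> single IUPAC letter.
CODE_FOR = {frozenset(bases): code for code, bases in IUPAC.items()}


def cross(a, b):
    """Return the IUPAC code covering every base allowed by a or by b."""
    return CODE_FOR[frozenset(IUPAC[a] + IUPAC[b])]


def cross_codons(x, y):
    """Cross two (possibly degenerate) codons position by position."""
    return "".join(cross(a, b) for a, b in zip(x, y))


# Standard genetic code, with codons enumerated in TCAG order.
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODONS = ["".join(c) for c in product("TCAG", repeat=3)]

# One degenerate codon per amino acid, built by crossing all of its codons.
DEGENERATE = {}
for codon, aa in zip(CODONS, AMINO):
    DEGENERATE[aa] = cross_codons(DEGENERATE[aa], codon) if aa in DEGENERATE else codon


def degenerate_codon(residues):
    """Degenerate codon covering every residue allowed at one motif position."""
    return reduce(cross_codons, (DEGENERATE[r] for r in residues))


print(degenerate_codon("K"))   # AAR
print(degenerate_codon("GP"))  # SSN
```

Note that a crossed codon can be broader than intended: SSN matches not
only GGN (Gly) and CCN (Pro) but also GCN (Ala) and CGN (Arg), and
six-codon amino acids such as Leu, Ser and Arg already pick up extra
codons when collapsed into a single degenerate codon.
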
I have written this part already, but I am wondering about the expand
function of Bio::Tools::SeqPattern. I'm not quite sure what it means by
the expanded sequence (if there is just one?) that it returns. I'm
trying to get every possible sequence that the motif can match. Is there
a function that does this, or will I have to write one to parse it
myself?
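
If it does come down to writing the parser by hand, enumerating every
sequence a simple motif can match might look something like the sketch
below (Python, for illustration only). It assumes the motif contains
nothing but single residues and bracketed classes such as [GP]; other
Prosite syntax (x wildcards, {..} exclusions, repeat counts) is not
handled. parse_motif and expand_motif are hypothetical names, not part
of BioPerl.

```python
import re
from itertools import product


def parse_motif(motif):
    """Split a simple motif into one string of allowed residues per position."""
    # Each match is either a bracketed class ("[GP]") or a single residue.
    tokens = re.findall(r"\[([A-Z]+)\]|([A-Z])", motif)
    return [cls or single for cls, single in tokens]


def expand_motif(motif):
    """List every concrete sequence the motif can stand for."""
    return ["".join(choice) for choice in product(*parse_motif(motif))]


print(parse_motif("IKL[GP]NM"))   # ['I', 'K', 'L', 'GP', 'N', 'M']
print(expand_motif("IKL[GP]NM"))  # ['IKLGNM', 'IKLPNM']
```

The number of expanded sequences is the product of the class sizes, so
[TS]YW[RKSD] gives 2 x 4 = 8 sequences. For long motifs with many
classes it may be cheaper to work on the per-position residue lists
directly than to enumerate every sequence.
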
.....
This would be great, but what would make things even better would be if
I could take multiple sequence alignments and produce patterns/motifs
from them. Is there a part of BioPerl that does something like this?
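
In case nothing existing fits, one naive way to turn an alignment into
such a pattern is to read it column by column, as in the sketch below
(Python, for illustration only). It assumes the aligned sequences are
plain strings of equal length with "-" for gaps, and it writes any
column containing a gap as the wildcard x, which is a simplification.
alignment_to_motif is a hypothetical name.

```python
def alignment_to_motif(aligned_seqs, gap="-"):
    """Build a Prosite-like pattern from equal-length aligned sequences."""
    if len({len(s) for s in aligned_seqs}) != 1:
        raise ValueError("need at least one sequence, all of the same length")
    parts = []
    for column in zip(*aligned_seqs):
        residues = sorted(set(column))
        if gap in residues:
            parts.append("x")            # gapped column: any residue
        elif len(residues) == 1:
            parts.append(residues[0])    # fully conserved column
        else:
            parts.append("[" + "".join(residues) + "]")
    return "".join(parts)


print(alignment_to_motif(["IKLGNM", "IKLPNM"]))  # IKL[GP]NM
```

Every residue seen in a column is kept, however rare, so a single
outlier sequence widens the class at that position; a frequency cutoff
would be an obvious refinement.
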

Thanks,

Sam