[Bioperl-l] Degenerate primer calculation
skalla at rice.edu
Mon Jan 12 20:09:47 UTC 2009
Samantha Thompson wrote:
> -----Original Message-----
> From: Chris Fields [mailto:cjfields at illinois.edu]
> Sent: 08 December 2008 16:41
> To: Samantha Thompson
> Cc: bioperl-l List
> Subject: Re: [Bioperl-l] Degenerate primer calculation
> On Dec 8, 2008, at 9:59 AM, Samantha Thompson wrote:
>> I also have another similar sequence analysis/primer problem.
>> What I'd like to do is produce degenerate primers from amino acid
>> sequences. What I did initially was take the codon usage table and
>> rewrite it as a hash in Perl in the form of degenerate codons, e.g.
>> Lysine/K would be AAR and its reverse complement would be YTT. So my
>> form then takes an amino acid sequence (derived as a consensus from
>> the multiple alignment of homologous proteins) and converts it into
>> degenerate codons, and then that degenerate primer (actually several
>> primers synthesised with different bases pooled together) is used to
>> search for homologues to the protein in unsequenced organisms.
>> I would like to improve this by being able to take a consensus
>> more in the form of a Prosite motif (I think that's the right one) such
>> as [TS]YW[RKSD] and then develop a degenerate nucleotide sequence
>> corresponding to this.
>> So I'm wondering if BioPerl contains anything like this (both Prosite
>> motif format parsing and degenerate code from multiple alignments or
>> such a motif), or if I need to write this myself (which I want to do if
>> it doesn't exist already).
>> Thanks again,
> Bio::Tools::CodonTable reverse translates, but I don't think it
> accepts patterns. Maybe a pipeline including Bio::Tools::SeqPattern?
> Might be an interesting programming challenge if it isn't already set
> up for that.
> I'm trying to have a go at solving this problem and I'm looking at
> Bio::Tools::SeqPattern. What I would like to be able to obtain from a
> motif is a list of all the sequences that the motif could correspond
> to. E.g. IKL[GP]NM could be IKLGNM or IKLPNM ... so I take both of these
> sequences and turn them into degenerate codons for each amino acid. The
> complicated part (I thought) here is creating a degenerate codon that
> corresponds to either G or P. The way I will do this is by creating a
> new codon one position at a time: each of its 3 degenerate bases is
> looked up separately in a 2D matrix which contains the result of
> 'crossing' each of the nucleotide bases of the degenerate code with
> each other. So when you cross the codon for G (GGN) with the
> codon for P (CCN) you get a codon that contains the degeneracy of both
> (SSN). So then you have a degenerate nucleotide sequence for your
> peptide motif.
> I have written this part already but I am wondering about the expand
> function of Bio::Tools::SeqPattern. I'm not quite sure what it means by
> the expanded sequence (if there is just one?) that it returns. I'm
> trying to get every possible permutation of the motif. Is there any
> function that does this, or will I have to write one to parse it myself?
> This would be great, but what would make things even better would be if
> I could take multiple sequence alignments and produce patterns/motifs
> from them. Is there a part of BioPerl that does something like this?
Correct me if I'm wrong (or if it's not relevant)... If you use the
example above with G (GGN) and P (CCN) and combine to give SSN, wouldn't
you also get everything that had an A (GCN) or an R (CGN) at that residue?
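
One way to check this is to expand the merged codon and translate every
expansion. The sketch below is in Python rather than Perl purely for
brevity, and it is not BioPerl code: the names IUPAC, BASES_TO_CODE, cross
and encoded_residues are made up for illustration, and it assumes the
standard genetic code. Crossing is treated here as a position-by-position
union of base sets, which is my reading of the 2D matrix described above.

```python
from itertools import product

# IUPAC nucleotide codes mapped to the bases each one stands for.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}
# Reverse lookup: a set of bases back to its single-letter code.
BASES_TO_CODE = {frozenset(v): k for k, v in IUPAC.items()}

# Standard genetic code, codons enumerated in T, C, A, G order.
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product("TCAG", repeat=3), AMINO)
}


def cross(codon_a, codon_b):
    """Combine two degenerate codons position by position (set union)."""
    return "".join(
        BASES_TO_CODE[frozenset(IUPAC[a]) | frozenset(IUPAC[b])]
        for a, b in zip(codon_a, codon_b)
    )


def encoded_residues(codon):
    """Return every amino acid encoded by some expansion of the codon."""
    expansions = product(*(IUPAC[base] for base in codon))
    return {CODON_TABLE["".join(c)] for c in expansions}


merged = cross("GGN", "CCN")
print(merged, sorted(encoded_residues(merged)))
```

For GGN crossed with CCN this prints SSN together with the residues A, G,
P and R, so the merged codon does cover Ala and Arg in addition to Gly and
Pro.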