[Bioperl-l] (no subject)

Wed Sep 10 10:35:39 EDT 2003

I have been scripting my own primer design for a while, because it
gives me better control over the heuristics and (importantly) lets me
include BLAST/exonerate matching of a region against its own genome to
find unique-in-genome areas.

I know Primer3 is out there, but in some cases, making sure you design
a primer in a non-duplicated region is more important than getting the
right G/C content etc.

I'd like to propose the following modules:

   Bio::Primer::Feature.pm

      a single primer, SeqFeatureI compliant, with start/end on a
      sequence; reuses seq(), has GC content methods, and has
      has_inversion($size), which gives back the first inverted string
      over size or undef if none.
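
A minimal sketch of the two per-primer checks, written as plain Perl
functions rather than SeqFeatureI methods. It assumes that "inverted
string" means a stretch of the primer whose reverse complement also
occurs in the same primer, and that reporting the first $size-base word
of such a stretch is enough; both are assumptions made for
illustration, and the helper names are hypothetical.

```perl
use strict;
use warnings;

# Fraction of G and C bases in a primer (0 for an empty string).
sub gc_content {
    my ($seq) = @_;
    return 0 unless length($seq);
    my $gc = ($seq =~ tr/GCgc//);    # tr returns the number of matches
    return $gc / length($seq);
}

# Reverse-complement a DNA string (plain ACGT only).
sub reverse_complement {
    my ($seq) = @_;
    my $rc = scalar reverse $seq;
    $rc =~ tr/ACGTacgt/TGCAtgca/;
    return $rc;
}

# Return the first $size-base word whose reverse complement also occurs
# in the primer, or undef if there is none (assumed meaning, see above).
sub has_inversion {
    my ($seq, $size) = @_;
    $seq = uc $seq;
    for my $start (0 .. length($seq) - $size) {
        my $word = substr($seq, $start, $size);
        return $word if index($seq, reverse_complement($word)) >= 0;
    }
    return;    # undef in scalar context
}
```

For example, has_inversion('GGGAAACCC', 3) gives back 'GGG', because
its reverse complement 'CCC' occurs further along the same primer.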

    Bio::Primer::Pair.pm

      a pair of primers, having left and right Bio::Primer::Feature.pm
      objects, with "joint" methods such as diff_gc(), the difference
      in GC content between the two primers

    Bio::Primer::AssessmentI.pm

      an interface which defines the method

         $score = $assor->assess($pair);

   Bio::Primer::Design.pm

        takes a sequence, an optional left hand region (defaults to
        50bp), an optional right hand region (defaults to 50bp), an
        optional primer size (default of 20), an optional prune score
        and a list of Bio::Primer::AssessmentI.pm compliant modules.

        design works the following way:

          generates every left hand and right hand primer of the given
          primer size

          for each left,right pair, applies each Assessment module in
          turn. If the score falls below prune at any point, discards
          this pair immediately

          (so by setting prune to, say, -100 and having an assessment
          module of inversion_greater_than_5 give -200, primer pairs
          with such an inversion are never considered, which keeps the
          list manageable if need be)

          stores the final score for this pair

        provides the final "best pair" or the complete list
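
The design loop could be sketched roughly as follows. This is plain
Perl, not BioPerl: primers and pairs are bare hash refs, the score is
assumed to be a running sum of the assess() results, and the names
design(), candidate_primers() and the hash keys are all hypothetical.
It also assumes the template is at least as long as each region.

```perl
use strict;
use warnings;

# Reverse-complement a DNA string (plain ACGT only).
sub reverse_complement {
    my ($seq) = @_;
    my $rc = scalar reverse $seq;
    $rc =~ tr/ACGTacgt/TGCAtgca/;
    return $rc;
}

# Every window of $size bases inside one region of the template.
# $revcomp is true for right hand primers, which anneal to the other strand.
sub candidate_primers {
    my ($seq, $region_start, $region_len, $size, $revcomp) = @_;
    my @primers;
    for my $offset (0 .. $region_len - $size) {
        my $start = $region_start + $offset;    # 0-based offset
        my $bases = substr($seq, $start, $size);
        $bases = reverse_complement($bases) if $revcomp;
        push @primers, { start => $start, seq => $bases };
    }
    return @primers;
}

# Score every left,right pair; call in list context.
sub design {
    my (%arg) = @_;
    my $seq       = $arg{seq};
    my $left_len  = $arg{left}  // 50;
    my $right_len = $arg{right} // 50;
    my $size      = $arg{size}  // 20;
    my $prune     = $arg{prune};                # undef: never prune
    my @assessors = @{ $arg{assessors} // [] };

    my @left  = candidate_primers($seq, 0, $left_len, $size, 0);
    my @right = candidate_primers($seq, length($seq) - $right_len,
                                  $right_len, $size, 1);

    my @scored;
    for my $l (@left) {
        RIGHT: for my $r (@right) {
            my $pair  = { left => $l, right => $r };
            my $score = 0;
            for my $assessor (@assessors) {
                $score += $assessor->assess($pair);
                # Discard the pair as soon as the running total drops below prune.
                next RIGHT if defined $prune && $score < $prune;
            }
            push @scored, { %$pair, score => $score };
        }
    }
    # Best pair first; the complete list is the whole return value.
    return sort { $b->{score} <=> $a->{score} } @scored;
}
```

With the defaults, each 50bp region yields 31 candidate primers of size
20, so up to 961 pairs are scored; pruning is what keeps that list
short once the regions get larger.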

Assessment modules first up would be:

     Bio::Primer::Assessment::inversion_length.pm

     Bio::Primer::Assessment::GC_content.pm

     Bio::Primer::Assessment::GC_matching.pm (primers should have the
     same melting temperature)

     Bio::Primer::Assessment::product_length.pm (ideal product length
     of around 1 kb)

These would all take a "weight" constructor argument so that they can
be weighted differently.
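
As an illustration of the weight idea, here is a sketch of what a
product length assessment might look like. The package name, the
weight/ideal constructor arguments, the hash-ref pair layout and the
scoring formula (a weighted penalty proportional to the distance from
the ideal length) are all assumptions for the example, not a settled
design.

```perl
use strict;
use warnings;

{
    package Demo::Assessment::ProductLength;    # hypothetical name

    sub new {
        my ($class, %arg) = @_;
        my $self = {
            weight => $arg{weight} // 1,        # relative importance
            ideal  => $arg{ideal}  // 1000,     # ideal product length in bp
        };
        return bless $self, $class;
    }

    # $pair is assumed to be { left => {start, seq}, right => {start, seq} }
    # with 0-based start offsets on the template.
    sub assess {
        my ($self, $pair) = @_;
        my $right_end = $pair->{right}{start} + length($pair->{right}{seq});
        my $length    = $right_end - $pair->{left}{start};
        # 0 at the ideal length, increasingly negative further away.
        return -$self->{weight} * abs($length - $self->{ideal}) / $self->{ideal};
    }
}
```

Doubling the weight doubles the penalty, so one module can be made to
dominate the others without touching its scoring code.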

I'd also build in Bio::SearchIO or SeqFeature based modules which
"ban" certain regions of the sequence from being used.
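
The banning step boils down to an interval overlap test. A minimal
sketch, assuming banned regions have already been reduced to
[start, end] pairs with inclusive coordinates (for instance from
search hits); the function names and the hash keys are hypothetical.

```perl
use strict;
use warnings;

# True if the inclusive range $start..$end touches any banned region.
# $banned is an array ref of [start, end] pairs.
sub overlaps_banned {
    my ($start, $end, $banned) = @_;
    for my $region (@$banned) {
        my ($b_start, $b_end) = @$region;
        return 1 if $start <= $b_end && $end >= $b_start;
    }
    return 0;
}

# Keep only candidate primers (hash refs with start and end keys)
# that avoid every banned region.
sub allowed_primers {
    my ($candidates, $banned) = @_;
    return grep { !overlaps_banned($_->{start}, $_->{end}, $banned) } @$candidates;
}
```

Filtering candidates before pairing them keeps banned primers out of
the left,right combinations altogether.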

I thought about putting the Bio::Primer::Feature.pm in
Bio::SeqFeature::Primer.pm but I thought that keeping all the modules
together made more sense.

This could also go under

Bio::Tools::Primer::*

if people preferred.

Any views?