[Bioperl-l] primer candidates validation by comparing the wgs blast results between fwd and rev.

Thu Feb 18 20:05:21 UTC 2010

Here are a few thoughts:

* Design your primers with primer3 and automate the process with Bio::Tools::Run::Primer3.
* Install the required databases locally and use Bio::Tools::Run::StandAloneBlast or Bio::Tools::Run::StandAloneBlastPlus. A local search will be much faster than remote BLAST and gives you greater control.
* You will need to be very careful with your BLAST parameters, as mapping primers reliably to a genome can be tricky (the defaults are tuned for much longer queries). Reverse e-PCR may be a better choice: http://www.ncbi.nlm.nih.gov/projects/e-pcr/
* Have you had a look at Primer-BLAST? http://www.ncbi.nlm.nih.gov/tools/primer-blast/index.cgi You may be able to send requests by POST, but I don't think BioPerl has methods for calling it (yet).
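
On the BLAST parameters point, here is a minimal sketch of a short-query command line. It is written in Python for brevity; the same argument list could be passed from Perl. It assumes the NCBI BLAST+ blastn binary is installed and a local nucleotide database has been built. The database name "bmori_wgs", the file names, and the helper name are hypothetical. The "blastn-short" task is BLAST+'s preset for queries shorter than about 50 bases. The E-value and word size shown are starting points to tune, not values taken from this thread, and the "stitle" output column may be missing from older BLAST+ releases.

```python
import shlex


def build_blastn_short_command(query_fasta, db_name, out_path,
                               evalue=1000, word_size=7):
    """Return an argv list for a BLAST+ blastn run aimed at primer-length queries.

    All defaults here are assumptions to be tuned against known-good primers.
    """
    return [
        "blastn",
        "-task", "blastn-short",      # preset for very short queries
        "-query", query_fasta,
        "-db", db_name,
        "-evalue", str(evalue),       # permissive: short hits have high E-values
        "-word_size", str(word_size),
        # Tabular output; stitle carries the hit title (e.g. Bm_scaf..._contig...)
        "-outfmt", "6 qseqid sseqid stitle pident length evalue bitscore",
        "-out", out_path,
    ]


if __name__ == "__main__":
    cmd = build_blastn_short_command("primers.fasta", "bmori_wgs",
                                     "primers_vs_wgs.tsv")
    # Print a shell-safe command line instead of running it.
    print(" ".join(shlex.quote(part) for part in cmd))
```

The sketch only builds and prints the command, so it can be inspected before anything is run against the database.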

Hope this helps,

Russell Smithies 

Bioinformatics Applications Developer 
T +64 3 489 9085 
E  russell.smithies at agresearch.co.nz 

Invermay  Research Centre 
Puddle Alley, 
Mosgiel, 
New Zealand 
T  +64 3 489 3809   
F  +64 3 489 9174  
www.agresearch.co.nz 

> -----Original Message-----
> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-
> bounces at lists.open-bio.org] On Behalf Of teetee
> Sent: Thursday, 18 February 2010 3:11 p.m.
> To: Bioperl-l at lists.open-bio.org
> Subject: [Bioperl-l] primer candidates validation by comparing the wgs
> blast results between fwd and rev.
> 
> 
> I am totally new to bioperl.
> 
> I would like to see if anyone could give me a hint or clue for tackling
> the problem I am trying to solve: use a BioPerl/Perl script and CGI to
> create a primer quality-control web interface.
> 
> The steps I would like to automate:
> I design many primer pairs (~500+) flanking intron regions of silkworm wgs
> sequences close to selected cDNA/mRNA/EST/molecular anchor loci. After the
> primers are generated (I wish this step could be automated, but it really
> can't), I have to blast each of them against the wgs database of the same
> organism. To avoid non-target amplification outside the target intron
> region, the blastn hits of the forward and reverse primers should share no
> contig other than the target contig.
> The steps I take to validate the primers are as follows:
> 1. On the NCBI blastn webpage, enter the forward primer sequence in the
> search field, label it ("job title"), choose the wgs database and the
> organism, and click "submit" to start the search.
> 2. Open another browser tab, go to the NCBI blastn webpage, enter the
> reverse primer sequence in the search field, label it ("job title"),
> choose the wgs database and the organism, and click "submit" to start the
> search.
> 3. On the forward primer blastn result page, write down the top 20 wgs
> sequences that come from the build 2 genomic sequencing project (the title
> of each such hit contains a text string formatted like
> "Bm_scaf<number>_contig<number>").
> 4. On the reverse primer blastn result page, write down the top 20 wgs
> sequences that come from the build 2 genomic sequencing project (the title
> of each such hit contains a text string formatted like
> "Bm_scaf<number>_contig<number>").
> 5. Compare the recorded blast hits from step 3 and step 4 and list the
> common hit(s) between the two primer sequences (those with the same
> scaffold and contig number).
> 6. Show a warning if there is more than one common hit, since there should
> be only one target hit.
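
Steps 3 to 6 above can be sketched as follows. This is a rough illustration in Python (the logic is a regular expression plus a set lookup, so it ports directly to Perl). It assumes the hit titles have already been pulled out of the BLAST reports as one string per hit, in rank order. The function names are hypothetical, and the sample titles are shortened versions of the hits in the example further down.

```python
import re

# Build 2 contigs are recognised by the naming pattern given in steps 3 and 4.
CONTIG_PATTERN = re.compile(r"Bm_scaf\d+_contig\d+")


def build2_contigs(hit_titles, limit=20):
    """Return up to `limit` distinct Bm_scaf<n>_contig<n> names, in hit order."""
    seen = []
    for title in hit_titles:
        if len(seen) >= limit:
            break
        match = CONTIG_PATTERN.search(title)
        if match and match.group(0) not in seen:
            seen.append(match.group(0))
    return seen


def shared_contigs(fwd_titles, rev_titles, limit=20):
    """Return contigs hit by both primers, in forward-hit order (step 5)."""
    rev = set(build2_contigs(rev_titles, limit))
    return [name for name in build2_contigs(fwd_titles, limit) if name in rev]


def check_pair(fwd_titles, rev_titles, limit=20):
    """Return (common_contigs, warn); warn is True for more than one (step 6)."""
    common = shared_contigs(fwd_titles, rev_titles, limit)
    return common, len(common) > 1


if __name__ == "__main__":
    fwd = [
        "Bombyx mori DNA, contig: Bm_scaf21_contig15134, build 2",
        "Bombyx mori DNA, contig: Bm_scaf121_contig38273, build 2",
        "Bombyx mori strain Dazao Ctg021213",
        "Bombyx mori DNA, contig: Bm_scaf8_contig7204, build 2",
    ]
    rev = [
        "Bombyx mori DNA, contig: Bm_scaf21_contig15134, build 2",
        "Bombyx mori strain Dazao Ctg021213",
        "Bombyx mori DNA, contig: Bm_scaf56_contig28024, build 2",
    ]
    common, warn = check_pair(fwd, rev)
    print(common, warn)  # prints ['Bm_scaf21_contig15134'] False
```

Hits from other assemblies (the Ctg... and contig... titles) are ignored because they do not match the pattern, which mirrors the manual procedure.
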
> 
> Example:
> I have these two primer sequences:
> GCATCGGTGAACGAGCTA
> CGCCTGCAAACGAGAATA
> 
> First I blast each of the above primer sequences against the wgs database
> for Bombyx mori (organismid:7091) on the blast website
> http://blast.ncbi.nlm.nih.gov/Blast.cgi?PROGRAM=blastn&BLAST_PROGRAMS=megaBlast&PAGE_TYPE=BlastSearch&SHOW_DEFAULTS=on&LINK_LOC=blasthome
> After I get the results in two different browser tabs, I record the
> results. For example, from the forward primer result page (only the first
> several hits are listed):
> =================== first few hits from forward primer blast result
> ==================
> BABH01015134.1
> Bombyx mori DNA, contig: Bm_scaf21_contig15134,
> strain: p50T/Dazao, build 2, whole genome shotgun
> sequence
> 34.2 34.2 94% 0.12 100%
> 
> BABH01038273.1
> Bombyx mori DNA, contig: Bm_scaf121_contig38273,
> strain: p50T/Dazao, build 2, whole genome shotgun
> sequence
> 34.2 34.2 94% 0.12 100%
> 
> AADK01021213.1
> Bombyx mori strain Dazao Ctg021213, whole genome
> shotgun sequence
> 34.2 34.2 94% 0.12 100%
> 
> BAAB01106839.1
> Bombyx mori DNA, contig477862, whole genome shotgun
> sequence
> 34.2 34.2 94% 0.12 100%
> 
> BAAB01154920.1
> Bombyx mori DNA, contig585939, whole genome shotgun
> sequence
> 34.2 34.2 94% 0.12 100%
> 
> BABH01007204.1
> Bombyx mori DNA, contig: Bm_scaf8_contig7204,
> strain: p50T/Dazao, build 2, whole genome shotgun
> sequence
> 32.2 32.2 88% 0.48 100%
> 
> BABH01020379.1
> Bombyx mori DNA, contig: Bm_scaf33_contig20379,
> strain: p50T/Dazao, build 2, whole genome shotgun
> sequence
> 32.2 32.2 88% 0.48 100%
> ================================ end of the forward blast result
>
> =================== first few hits from reverse primer blast result
> ==================
> BABH01015134.1
> Bombyx mori DNA, contig:
> Bm_scaf21_contig15134, strain: p50T/Dazao,
> build 2, whole genome shotgun sequence
> 36.2 36.2 100% 0.031 100%
> 
> AADK01021213.1
> Bombyx mori strain Dazao Ctg021213, whole
> genome shotgun sequence
> 36.2 36.2 100% 0.031 100%
> 
> AADK01032592.1
> Bombyx mori strain Dazao Ctg032592, whole
> genome shotgun sequence
> 36.2 36.2 100% 0.031 100%
> 
> BAAB01106839.1
> Bombyx mori DNA, contig477862, whole genome
> shotgun sequence
> 36.2 36.2 100% 0.031 100%
> 
> BABH01028024.1
> Bombyx mori DNA, contig:
> Bm_scaf56_contig28024, strain: p50T/Dazao,
> build 2, whole genome shotgun sequence
> 30.2 30.2 83% 1.9 100%
> 
> AADK01039561.1
> Bombyx mori strain Dazao Ctg039561, whole
> genome shotgun sequence
> 30.2 30.2 83% 1.9 100%
> 
> AADK01056852.1
> Bombyx mori strain Dazao Ctg063892, whole
> genome shotgun sequence
> 30.2 30.2 83% 1.9 100%
> 
> BABH01001710.1
> Bombyx mori DNA, contig: Bm_scaf2_contig1710,
> strain: p50T/Dazao, build 2, whole genome
> shotgun sequence
> ===============================end of the reverse result
> 
> From the list, I record the hits with "Bm_scaf#_contig#" (e.g.
> Bm_scaf21_contig15134), since that's the string pattern I would like to
> compare with the hits from the reverse primer blast results.
> After I record the first 20 qualifying blast hits (hopefully with a blast
> parser program I can use more than 20), I compare them with the ones from
> the reverse primer search result and see if there is any common result
> other than the target.
> 
> I am OK going through this validation process manually if there are only
> tens of primers to design. However, with 500 and maybe more primers to
> come, I believe there is an easier way.
> 
> 
> I imagine the code will have the following functions:
> input: the user's primer pairs, with multiple-entry capability
> output: compare the blastn results between the fwd and reverse primers and
> generate a list of common blastn hits (with the same scaf# and contig#
> from build 2 wgs sequences; naming convention: Bm_scaf#_contig#) on the
> wgs (accept the blast search parameters through the web and pass them to
> the blast command)
> background record-keeping mechanism: create a record of the blast report
> for each primer vs wgs blastn result and name the files properly.
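
The input and record-keeping parts above could look something like this minimal Python sketch. Everything about the format is an assumption made for illustration: one pair per line as "name forward reverse", primers limited to A/C/G/T, and report files named "<pair>.<fwd|rev>.blastn.txt". None of these conventions come from the original post.

```python
import re


def parse_primer_pairs(text):
    """Parse lines of 'name forward reverse' into (name, fwd, rev) tuples.

    Blank lines and lines starting with '#' are skipped. The three-column
    layout is a hypothetical input format, not one defined in this thread.
    """
    pairs = []
    for line_no, line in enumerate(text.splitlines(), start=1):
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        fields = line.split()
        if len(fields) != 3:
            raise ValueError(f"line {line_no}: expected 'name forward reverse'")
        name, fwd, rev = fields[0], fields[1].upper(), fields[2].upper()
        for seq in (fwd, rev):
            if not re.fullmatch(r"[ACGT]+", seq):
                raise ValueError(f"line {line_no}: not a plain DNA sequence: {seq}")
        pairs.append((name, fwd, rev))
    return pairs


def report_filename(pair_name, direction):
    """Build a report file name, replacing characters unsafe in file names.

    Sanitising matters when the name arrives from a web form.
    """
    if direction not in ("fwd", "rev"):
        raise ValueError("direction must be 'fwd' or 'rev'")
    safe = re.sub(r"[^A-Za-z0-9_.-]", "_", pair_name)
    return f"{safe}.{direction}.blastn.txt"


if __name__ == "__main__":
    form_text = """
    # name   forward             reverse
    pair1    GCATCGGTGAACGAGCTA  CGCCTGCAAACGAGAATA
    """
    for name, fwd, rev in parse_primer_pairs(form_text):
        print(name, fwd, rev, report_filename(name, "fwd"),
              report_filename(name, "rev"))
```

Rejecting anything that is not a plain DNA string before it reaches the blast command is a cheap safeguard for a CGI front end.
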
> 
> I guess my question is:
> What would be the most straightforward approach, and where should I
> start? (I know you probably think I already know the method since I
> posted the question here, but the more suggestions the better.)
> 
> My background:
> 1. I've written code for retrieving a PDB file from a user's 4-letter PDB
> protein ID and for atom-atom distance measurement, with the same setup
> (Perl script + web CGI interface).
> 2. I've used command-line megablast to batch blast multiple sequences;
> however, my impression is that it's not intended for short-sequence
> blast (primers are usually around 20-24 bp long).
> ref.:
> http://blast.ncbi.nlm.nih.gov/Blast.cgi?CMD=Web&PAGE_TYPE=BlastDocs&DOC_TYPE=Download
> 3. I can modify simple Perl code and do some text string manipulation in
> Perl.
> 4. I have my own Linux box and have Apache/Perl/BioPerl/CGI ready.
> 