<div dir="rtl"><div dir="ltr">Dear Biopython list users,</div><div dir="ltr"><p style="margin:0px 0px 1em;padding:0px;border:0px;font-size:14px;vertical-align:baseline;clear:both;color:rgb(0,0,0);font-family:Arial,'Liberation Sans','DejaVu Sans',sans-serif;line-height:17.804800033569336px;background-image:initial;background-repeat:initial">
I'm using Biopython for the first time. I have sequence data from unknown organisms, and I'm trying to use BLAST to determine which organism each sequence most likely came from. I wrote the following function to do that:</p>
<pre class="" style="margin-top:0px;margin-bottom:10px;padding:5px;border:0px;font-size:14px;vertical-align:baseline;font-family:Consolas,Menlo,Monaco,'Lucida Console','Liberation Mono','DejaVu Sans Mono','Bitstream Vera Sans Mono','Courier New',monospace,serif;overflow:auto;width:auto;max-height:600px;word-wrap:normal;color:rgb(0,0,0);line-height:17.804800033569336px;background:rgb(238,238,238)">
<code style="margin:0px;padding:0px;border:0px;font-size:14px;vertical-align:baseline;font-family:Consolas,Menlo,Monaco,'Lucida Console','Liberation Mono','DejaVu Sans Mono','Bitstream Vera Sans Mono','Courier New',monospace,serif;white-space:inherit;background-image:initial;background-repeat:initial">from Bio import Entrez, SeqIO
from Bio.Blast import NCBIWWW, NCBIXML

Entrez.email = "you@example.com"  # NCBI asks Entrez users for a contact address; use your own

def find_organism(file):
    """
    Receives a FASTA file with a single sequence and uses BLAST to find
    which organism it was taken from.
    """
    # get seq from FASTA file
    seqRecord = SeqIO.read(file, "fasta")
    # run BLAST remotely at NCBI (usually the slowest step)
    blastResult = NCBIWWW.qblast("blastn", "nt", seqRecord.seq)
    # get first hit
    blastRecord = NCBIXML.read(blastResult)
    blastResult.close()
    firstHit = blastRecord.alignments[0]  # raises IndexError if BLAST found no hits
    # get the hit's accession (more robust than splitting the title on "|")
    accession = firstHit.accession
    # fetch the matching GenBank record from NCBI
    ncbiResult = Entrez.efetch(db="nucleotide", id=accession,
                               rettype="gb", retmode="text")
    ncbiResultSeqRec = SeqIO.read(ncbiResult, "gb")
    ncbiResult.close()
    # get organism
    return ncbiResultSeqRec.annotations["organism"]</code></pre>
<p style="margin:0px 0px 1em;padding:0px;border:0px;font-size:14px;vertical-align:baseline;clear:both;color:rgb(0,0,0);font-family:Arial,'Liberation Sans','DejaVu Sans',sans-serif;line-height:17.804800033569336px;background-image:initial;background-repeat:initial">
It works, but it takes about 2 minutes to retrieve the organism for each sequence, which seems very slow to me. I'm wondering whether I could do better. I know I could set up a local copy of the NCBI databases to improve performance, and I might do that. However, I suspect that querying BLAST first and then taking the ID to query Entrez is not the way to go. Do you have any other suggestions for improvements?<br>
Thanks!</p></div></div>
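<p>One possible direction, sketched under assumptions: the remote BLAST search is usually the slow part, so instead of one <code>NCBIWWW.qblast</code> call and one <code>Entrez.efetch</code> call per sequence, all sequences can be sent in a single multi-FASTA query and all top-hit GenBank records fetched in a single comma-separated <code>efetch</code> request. This is a minimal, untested sketch. It assumes the input FASTA file holds several sequences, that <code>megablast=True</code> and <code>hitlist_size=1</code> are acceptable (MegaBLAST is tuned for highly similar sequences and may miss distant matches), and that matching BLAST hits to GenBank records by unversioned accession is good enough. The helper names <code>first_hit_accessions</code>, <code>unique_ids</code> and <code>find_organisms</code> are hypothetical.</p>

```python
from typing import Dict, Iterable, List, Optional


def first_hit_accessions(blast_records: Iterable) -> Dict[str, Optional[str]]:
    """Map each query to the unversioned accession of its top hit (None if no hit)."""
    hits: Dict[str, Optional[str]] = {}
    for record in blast_records:
        if record.alignments:
            # NCBIXML lists alignments best-first; drop any ".version" suffix
            hits[record.query] = record.alignments[0].accession.split(".")[0]
        else:
            hits[record.query] = None
    return hits


def unique_ids(accessions: Iterable[Optional[str]]) -> List[str]:
    """Drop missing accessions and duplicates, keeping first-seen order."""
    return list(dict.fromkeys(acc for acc in accessions if acc))


def find_organisms(fasta_path: str, email: str) -> Dict[str, Optional[str]]:
    """Sketch: one remote BLAST call and one Entrez call for every sequence in the file."""
    # Imported here so the two helpers above do not require Biopython
    from Bio import Entrez, SeqIO
    from Bio.Blast import NCBIWWW, NCBIXML

    Entrez.email = email  # NCBI asks Entrez users for a contact address
    fasta_text = "".join(rec.format("fasta") for rec in SeqIO.parse(fasta_path, "fasta"))

    # Assumption: MegaBLAST and a single hit per query are acceptable here
    with NCBIWWW.qblast("blastn", "nt", fasta_text, megablast=True, hitlist_size=1) as handle:
        hits = first_hit_accessions(NCBIXML.parse(handle))

    organism_of: Dict[str, Optional[str]] = {}
    ids = unique_ids(hits.values())
    if ids:
        # efetch accepts a comma-separated list of IDs, so one request covers all hits
        with Entrez.efetch(db="nucleotide", id=",".join(ids), rettype="gb", retmode="text") as handle:
            for gb in SeqIO.parse(handle, "gb"):
                # The first entry of "accessions" is the unversioned primary accession
                organism_of[gb.annotations["accessions"][0]] = gb.annotations.get("organism")

    # Queries without a hit (or without a fetched record) map to None
    return {query: organism_of.get(acc) for query, acc in hits.items()}
```

<p>A call would look like <code>find_organisms("unknown.fasta", "you@example.com")</code>, returning a dict keyed by the query label that BLAST reports (<code>record.query</code>). Fetching full GenBank records can itself be slow when a top hit is a large complete genome; in that case <code>Entrez.esummary</code>, which returns a <code>TaxId</code> without the sequence, may be a lighter option.</p>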