[Biopython] NcbiblastnCommandline vs subprocess blast

Tanya Golubchik golubchi at stats.ox.ac.uk
Wed Aug 21 08:41:48 UTC 2013


Hello,

The following refers to Biopython 1.61. Does anyone know if there are 
any hidden or hard-coded defaults for any parameters in 
NcbiblastnCommandline? Or any known bugs that could cause hits not to be 
reported?

I've encountered an extremely frustrating issue that I've never seen 
before. The upshot is that the blastn result obtained through 
NcbiblastnCommandline occasionally reports "no hits" in a way that seems 
dependent on the query file. This strange output is different from that 
obtained via a subprocess call (or outside python entirely) -- both of 
which recover all hits consistently. I am using *exactly* the same 
parameters, inputs, and exactly the same path to the blastn executable.

What actually happens is that for my large fasta file of 3000-odd 
nucleotide queries, several give no hits with the NcbiblastnCommandline 
call (these are not at the end of the query file, just throughout the 
file). On the other hand, cutting the file down to about 2000 queries, 
but not changing it in any other way, does give hits for these missing 
queries. Note that *nothing* else is changed; the parameters and call to 
blastn remain identical. This only happens for some, not all, of the 
blast databases I'm searching, making it look like there are variable 
deletions between the samples.

I can provide (large) test data files if anyone thinks they can help. I 
have the query file that produces the wrong 'patchy' output, another 
that produces the correct output, and a sample blast database for which 
this happens.

The actual calls are (paths substituted):

NcbiblastnCommandline(cmd='/path/to/blastn', outfmt=5, 
query='untrimmed.fasta', db='/path/to/db/C00006635', gapopen=5, 
gapextend=2, culling_limit=2)

The above gives 'no hits' for about 25 queries out of the 3000+ in the file.

stdout, stderr = sp.Popen("/path/to/blastn -db /path/to/db/C00006635 
-query untrimmed.fasta -outfmt 5 -word_size 17 -gapopen 5 -gapextend 2 
-culling_limit 2".split(), stdout=sp.PIPE, stderr=sp.PIPE).communicate()

The above call to subprocess returns all hits, correctly.

Thanks
Tanya



More information about the Biopython mailing list