[BioPython] Zerg BLAST parser - timings
Leighton Pritchard
L.Pritchard at scri.sari.ac.uk
Mon Sep 1 11:37:57 EDT 2003
Hi,
While I don't imagine Zerg will be the best option in all circumstances,
timing the script below (which extracts all query sequence names from
NCBI standalone BLAST output) on our Linux server gives the following
results:
361 kB output file, 3 runs:
----------------------------------------------
Module                      Mean       StDev
NCBIStandalone.BlastParser  0.986 s    0.003 s
Zerg                        0.032 s    0.005 s
367 MB output file, 3 runs:
----------------------------------------------
Module                      Mean       StDev
NCBIStandalone.BlastParser  1461.08 s  20.95 s
Zerg                        58.29 s    4.38 s
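Each row summarises three runs. The post does not show how the Mean and StDev columns were computed, or whether StDev is the sample or population figure. A minimal sketch, assuming the sample standard deviation (n - 1 denominator) and a hypothetical `time_runs` helper that is not part of Biopython or Zerg:

```python
import math
import time


def mean_stdev(samples):
    """Return the mean and sample standard deviation (n - 1 denominator)."""
    n = len(samples)
    mean = sum(samples) / float(n)
    variance = sum((x - mean) ** 2 for x in samples) / float(n - 1)
    return mean, math.sqrt(variance)


def time_runs(func, runs=3):
    """Call func() `runs` times and return the wall-clock time of each call.

    Hypothetical helper for illustration only.
    """
    timings = []
    for _ in range(runs):
        start = time.time()
        func()
        timings.append(time.time() - start)
    return timings


if __name__ == "__main__":
    # Stand-in workload; in practice this would parse the BLAST output file.
    timings = time_runs(lambda: sum(range(100000)))
    mean, sd = mean_stdev(timings)
    print("Mean %.3fs StDev %.3fs" % (mean, sd))
```

Wrapping each parser's extraction loop in a function and passing it to `time_runs` would give one Mean/StDev row per parser.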
There are almost certainly faster ways to use the Biopython parser code
(a specialist consumer, perhaps?) than the one in the test script below, but
I couldn't think of them nearly as quickly as I could implement a quick
run-through with the Zerg parser.
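For comparison, query names can also be pulled out with a plain line scan that uses neither parser. This is not the Biopython consumer approach mentioned above, just a minimal sketch written for a current Python 3 interpreter. It assumes each query in the text report starts with a `Query= ` line, and it keeps only the text on that line, so long query definitions that wrap onto following lines are truncated. `query_names` is a hypothetical helper.

```python
import io


def query_names(handle):
    """Yield the query name from each 'Query=' line of a BLAST text report.

    Hypothetical helper: only the text on the 'Query=' line itself is kept,
    so definitions that wrap onto later lines are truncated.
    """
    for line in handle:
        if line.startswith("Query="):
            yield line[len("Query="):].strip()


if __name__ == "__main__":
    # Toy input for illustration, not real BLAST output.
    report = io.StringIO(
        "Query= seq1 first test sequence\n"
        "         (120 letters)\n"
        "\n"
        "Query= seq2\n"
        "         (87 letters)\n"
    )
    print(list(query_names(report)))
```

With a real report, `with open('./testBLAST2.out') as fh: names = list(query_names(fh))` would read the file line by line without building full records.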
In this simple case, the Zerg code turned out to be roughly 25 to 31 times
faster than the Biopython code in the script below. The Zerg wrapper and
build instructions can be obtained from
http://bioinf.scri.sari.ac.uk/lp/pyzerg.shtml
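The speed-up figures follow directly from the mean times in the two tables: divide the Biopython mean by the Zerg mean for each file.

```python
# Mean wall-clock times (seconds) taken from the two timing tables.
timings = {
    "361 kB file": {"biopython": 0.986, "zerg": 0.032},
    "367 MB file": {"biopython": 1461.08, "zerg": 58.29},
}

# Speed-up = Biopython mean time / Zerg mean time.
speedups = {}
for label, t in timings.items():
    speedups[label] = t["biopython"] / t["zerg"]
    print("%s: Zerg is %.1f times faster" % (label, speedups[label]))
```

This gives about 30.8 for the small file and about 25.1 for the large one.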
############
import time

testfile = './testBLAST2.out'

print("Begin Biopython test")
# BIOPYTHON: parse complete records, keeping only each query name
from Bio.Blast import NCBIStandalone
fhandle = open(testfile, 'r')
parser = NCBIStandalone.BlastParser()
iterator = NCBIStandalone.Iterator(fhandle, parser)
# this timer starts after the file is opened; the Zerg timer includes open_file()
biotime0 = time.time()
bioquerylist = []
while True:
    record = iterator.next()    # None once the file is exhausted
    if record is None:
        break
    bioquerylist.append(record.query)
biotime = time.time() - biotime0
fhandle.close()
print("End Biopython test")

print("Begin Zerg test")
# ZERG: walk the (code, value) token stream; code 2 carries a query name
import zerg
zergtime0 = time.time()
zergquerylist = []
zerg.open_file(testfile)
code, value = zerg.get_token()
while code:                     # stop when get_token() returns a false code
    if code == 2:
        zergquerylist.append(value)
    code, value = zerg.get_token()
zergtime = time.time() - zergtime0
print("End Zerg test")

# Both parsers should report the same number of queries
print("Bio: %s; Zerg: %s" % (len(bioquerylist), len(zergquerylist)))
print("Bio: %s; Zerg: %s" % (biotime, zergtime))
######
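The Zerg loop in the script reads (code, value) pairs until `get_token()` returns a false code. One way to make that pattern reusable is to wrap it in a generator. This is only a sketch: `tokens` and `query_names_from_tokens` are hypothetical helpers, and the value 2 for a query-name token is taken from the script above rather than from Zerg's documentation. The token source is passed in as a function, so the sketch can be exercised without Zerg installed.

```python
QUERY_NAME = 2  # token code the script above treats as a query name


def tokens(get_token):
    """Yield (code, value) pairs from get_token() until it returns a false code."""
    code, value = get_token()
    while code:
        yield code, value
        code, value = get_token()


def query_names_from_tokens(get_token):
    """Collect the values of all query-name tokens."""
    return [value for code, value in tokens(get_token) if code == QUERY_NAME]


if __name__ == "__main__":
    # Fake token source standing in for zerg.get_token; codes 1 and 5 are arbitrary.
    fake_stream = iter([(1, "BLASTN"), (2, "seq1"), (5, "hit"), (2, "seq2"), (0, "")])
    print(query_names_from_tokens(lambda: next(fake_stream)))
```

With Zerg itself, the equivalent would be `zerg.open_file(testfile)` followed by `query_names_from_tokens(zerg.get_token)`.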
Dr Leighton Pritchard AMRSC
PPI, Scottish Crop Research Institute
Invergowrie, Dundee, DD2 5DA, Scotland, UK
L.Pritchard at scri.sari.ac.uk
PGP key 47B4A485: http://www.keyserver.net http://pgp.mit.edu