[Biopython-dev] [Bug 2738] Speed up GenBank parsing, in particular location parsing

Fri Jan 30 11:29:07 UTC 2009

http://bugzilla.open-bio.org/show_bug.cgi?id=2738

------- Comment #9 from biopython-bugzilla at maubp.freeserve.co.uk  2009-01-30 06:29 EST -------
I've run my test script (attachment 1209) on a Linux machine with Python 2.5,
using the following files:

 5.5K Jan 30 10:29 CY029873.gbk
  67M Jan 22 17:53 dr_ref_chr16.gbk
  42M Jan 22 17:53 NC_003075.gbk
  14M Jan 22 18:43 NC_003272.gbk
  25M Jan 22 17:52 NC_003279.gbk
 4.8M Jan 22 18:44 NC_004350.gbk
  20M Jan 22 18:42 NC_008095.gbk
  14M Jan 22 18:44 NC_009925.gbk
  18M Jan 22 18:43 NC_010628.gbk
 296M Jan 22 17:52 ptr_ref_chr1.gbk
  86M Jan 30 10:55 wgs.AAAB.1.gnp.gbk
 297M Jan 30 10:55 wgs.AABR.10.gbff.gbk

The last two files are WGS data for protein and nucleotide sequences,
downloaded from ftp://ftp.ncbi.nih.gov/genbank/wgs/ then unzipped and a gbk
extension added so my script parses them.

With and without the patch the test script gives identical output - which
appears to confirm the location parsing is not functionally altered.  The
timings were just over 2 min with the patch and just over 8 min without it
(roughly a four-fold speed-up on this dataset).
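
For anyone wanting to repeat this kind of comparison, here is a rough sketch
of a timing harness. It is not the attached test script: the function names
and the per-file summary (record and feature counts) are made up for
illustration, and it is written for a current Python rather than 2.5. The only
Biopython call it assumes is SeqIO.parse(handle, "genbank").

```python
import glob
import os
import time


def time_parser(filenames, parse_file):
    """Call parse_file on each filename in sorted order.

    Returns (summaries, elapsed_seconds), where summaries is a list of
    (basename, result) pairs that can be compared between two runs.
    """
    summaries = []
    start = time.perf_counter()
    for filename in sorted(filenames):
        summaries.append((os.path.basename(filename), parse_file(filename)))
    return summaries, time.perf_counter() - start


def summarise_with_biopython(filename):
    """Hypothetical summary of one file: (record count, feature count)."""
    from Bio import SeqIO  # imported lazily so the harness itself is stdlib-only

    records = 0
    features = 0
    with open(filename) as handle:
        for record in SeqIO.parse(handle, "genbank"):
            records += 1
            features += len(record.features)
    return records, features


if __name__ == "__main__":
    summaries, elapsed = time_parser(glob.glob("*.gbk"), summarise_with_biopython)
    for name, summary in summaries:
        print(name, summary)
    print("Total: %.1fs" % elapsed)
```

Running this once against a patched checkout and once against an unpatched
one, then diffing the printed summaries, gives the same sort of check as
above: matching output plus two wall-clock totals to compare. A summary this
coarse would not catch every change in location parsing, so a real check
should print the feature locations as well.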
