[BioRuby] EMBL parsing

Naohisa GOTO ngoto at gen-info.osaka-u.ac.jp
Sat May 5 06:57:28 UTC 2007


Hi,

On Thu, 3 May 2007 12:48:03 +0100
Anthony Underwood <email2ants at gmail.com> wrote:

> Hi Mitsiteru,
> 
> Any of the embl files downloaded from the ebi site have this problem.
> 
> for example http://www.ebi.ac.uk/cgi-bin/dbfetch?db=embl&style=raw&id=CP000360
> 
> Ruby takes all of the cpu power :(

It seems to be caused by thousands of iterations of str1 += str2,
which creates a new String object every time.
A patch is attached (Ruby 1.8.0 or newer is required).

--- lib/bio/db.rb       5 Apr 2007 23:35:39 -0000       0.37
+++ lib/bio/db.rb       5 May 2007 06:08:39 -0000
@@ -313,12 +313,12 @@

   # Returns the contents of the entry as a Hash.
   def entry2hash(entry)
-    hash = Hash.new('')
+    hash = Hash.new { |h, k| h[k] = '' }
     entry.each_line do |line|
       tag = tag_get(line)
       next if tag == 'XX'
       tag = 'R' if tag =~ /^R./        # Reference lines
-      hash[tag] += line
+      hash[tag].concat line
     end
     return hash
   end
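
To illustrate why both lines of the patch change together, here is a
minimal standalone sketch. It is not the BioRuby code itself: sketch_tag
is a hypothetical stand-in for tag_get that simply assumes the tag is the
first two characters of each line, and entry2hash_sketch and
shared_default_demo are names made up for this example. With
Hash.new(''), every missing key returns the same default String and
nothing is stored in the hash, so appending in place would write into
that shared object. The block form stores a separate String per tag,
which concat can then grow without creating a new object for each line.

```ruby
# Hypothetical stand-in for BioRuby's tag_get: assumes the tag is the
# first two characters of the line (an assumption for this sketch only).
def sketch_tag(line)
  line[0, 2]
end

# Same shape as the patched entry2hash: one String per tag, appended in place.
def entry2hash_sketch(entry)
  # ''.dup keeps the String mutable even when string literals are frozen.
  hash = Hash.new { |h, k| h[k] = ''.dup }
  entry.each_line do |line|
    tag = sketch_tag(line)
    next if tag == 'XX'                 # skip spacer lines
    tag = 'R' if tag =~ /^R./           # group reference lines under 'R'
    hash[tag].concat(line)              # appends to the stored String
  end
  hash
end

# Shows what would happen if only the += line were changed:
# the default object is shared by all missing keys and never stored.
def shared_default_demo
  hash = Hash.new(''.dup)
  hash['ID'].concat("ID   X\n")         # mutates the shared default
  [hash.keys, hash['DE']]               # no keys stored; 'DE' sees the 'ID' text
end

entry = "ID   CP000360\nXX\nRN   [1]\nRA   Author A.;\nDE   Example\n"
p entry2hash_sketch(entry)
p shared_default_demo
```

In this sketch the 'R' entry collects both the RN and RA lines, the XX
line is dropped, and shared_default_demo returns an empty key list
together with the text leaked into the shared default.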


Naohisa Goto
ng at bioruby.org


