[BioRuby] Bio-gff3 plugin 0.8.6

Pjotr Prins pjotr.public14 at thebird.nl
Mon Jan 17 10:08:12 UTC 2011


I have released version 0.8.6 of the bio-gff3 parser plugin on RubyGems.
It can also be used from the command line, e.g.

  gem install bio-gff3
  gff3-fetch --help

This release introduces an LRU cache, replaces the BioRuby GFF line
parser with a new line parser, and adds lazy parsing. Each change gives
a significant speedup over the original (no cache, BioRuby parser,
non-lazy).
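
To illustrate what lazy parsing means here, a minimal sketch follows. It
is not the bio-gff3 implementation; the class name LazyGffRecord and its
methods are hypothetical. The raw line is stored as-is, the nine
tab-separated GFF3 columns are split only on first access, and the
attributes column is parsed only when it is asked for. URL-decoding of
attribute values is left out.

```ruby
# Hypothetical sketch of lazy GFF3 record parsing; not the bio-gff3 API.
class LazyGffRecord
  # Column index of the plain string fields in a GFF3 line.
  STRING_FIELDS = { seqid: 0, source: 1, type: 2,
                    score: 5, strand: 6, phase: 7 }.freeze

  def initialize(line)
    @line = line.chomp # keep the raw line; no parsing work yet
  end

  # Split the nine tab-separated columns on first access only.
  def columns
    @columns ||= @line.split("\t", 9)
  end

  STRING_FIELDS.each do |name, index|
    define_method(name) { columns[index] }
  end

  def start
    Integer(columns[3])
  end

  def stop
    Integer(columns[4])
  end

  # The attributes column (key=value pairs separated by ';') is parsed
  # on demand and memoized. Values may be comma-separated lists.
  def attributes
    @attributes ||= columns[8].to_s.split(";").each_with_object({}) do |pair, h|
      key, value = pair.split("=", 2)
      next if key.nil? || value.nil?
      h[key] = value.split(",")
    end
  end

  def id
    attributes["ID"]&.first
  end
end
```

A record that is only counted or skipped never pays for the split or for
the attribute parsing, which is where a lazy parser saves time.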

The LRU version keeps RAM use bounded regardless of data size (730MB in
the benchmark below), and currently runs about 6 times slower than the
full-memory version (75m versus 11m51).
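
For readers unfamiliar with the technique, here is a minimal sketch of a
least-recently-used cache in Ruby. It is not the cache class used inside
bio-gff3; the name LruCache and its interface are assumptions made for
illustration. It relies on Ruby (1.9 and later) hashes keeping insertion
order, so the first entry is always the least recently used one.

```ruby
# Hypothetical minimal LRU cache; not the class used inside bio-gff3.
class LruCache
  def initialize(max_size)
    raise ArgumentError, "max_size must be positive" unless max_size > 0
    @max_size = max_size
    @store = {} # Ruby (1.9+) hashes iterate in insertion order
  end

  # Return the cached value, or nil on a miss. A hit re-inserts the key
  # so that it becomes the most recently used entry.
  def [](key)
    return nil unless @store.key?(key)
    value = @store.delete(key)
    @store[key] = value
  end

  # Store a value and evict the least recently used entry when the
  # cache grows beyond max_size.
  def []=(key, value)
    @store.delete(key)
    @store[key] = value
    @store.shift if @store.size > @max_size
  end

  def size
    @store.size
  end
end

cache = LruCache.new(2)
cache[:a] = "record a"
cache[:b] = "record b"
cache[:a]              # touch :a, so :b is now the oldest entry
cache[:c] = "record c" # evicts :b
```

In a parser the key could be a record ID and the value the parsed
record, with a miss handled by re-reading the record from the file. That
trade of extra reads for bounded memory fits the timings below, where a
larger cache gives a shorter run time.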

  Digesting parser:

  Cache              real     user     sys  version  size      RAM
  ----------------------------------------------------------------
  full,bioruby       12m41    12m28    0m09 (0.8.0)
  full,line          12m13    12m06    0m07 (0.8.5)
  full,line,lazy     11m51    11m43    0m07 (0.8.6)         6,600M

  none,bioruby      504m     477m     26m50 (0.8.0)
  none,line         297m     267m     28m36 (0.8.5)
  none,line,lazy    132m     106m     26m01 (0.8.6)           650M

  lru,bioruby       533m     510m     22m47 (0.8.5)
  lru,line          353m     326m     26m44 (0.8.5)    1K
  lru,line          305m     281m     22m30 (0.8.5)   10K
  lru,line,lazy     182m     161m     21m10 (0.8.6)   10K
  lru,line,lazy      75m      75m      0m17 (0.8.6)   50K     730M
  ----------------------------------------------------------------

where the input files are

   52M  m_hapla.WS217.dna.fa
  456M  m_hapla.WS217.gff3

ruby 1.9.2p136 (2010-12-25 revision 30365) [x86_64-linux]
on a 64-bit 2.6 GHz CPU (6MB cache) machine with 16 GB RAM.
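
The relative speeds can be checked from the wall-clock ('real') column
of the table. The snippet below is only a convenience for that
arithmetic; the helper name minutes is made up for this example.

```ruby
# Convert a table entry such as "12m41" or "504m" to minutes.
def minutes(entry)
  mins, secs = entry.split("m")
  mins.to_f + secs.to_f / 60 # a missing seconds part counts as 0
end

full_lazy    = minutes("11m51") # full,line,lazy (0.8.6)
lru_lazy_50k = minutes("75m")   # lru,line,lazy, 50K (0.8.6)
none_bioruby = minutes("504m")  # none,bioruby (0.8.0)
none_lazy    = minutes("132m")  # none,line,lazy (0.8.6)

# LRU slowdown relative to the full-memory version (about 6x)
lru_slowdown = lru_lazy_50k / full_lazy

# Uncached speedup from the new line parser plus lazy parsing (almost 4x)
nocache_speedup = none_bioruby / none_lazy

puts format("lru vs full: %.1fx slower", lru_slowdown)
puts format("none, 0.8.0 vs 0.8.6: %.1fx faster", nocache_speedup)
```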

Note that bio-gff3 0.8.6 is a fully digesting parser, with scope for
full validation of the GFF3 relations. The next step, a limited
'optimistic' digestion, will speed things up.

Note also that bio-gff3 uses the bio-logger plugin, and is a good
example of how to use it.

Pj.


