[BioRuby] Parsing line-based formats with Ragel

Pjotr Prins pjotr.public14 at thebird.nl
Sat Jun 2 16:24:55 UTC 2012


On Sat, Jun 02, 2012 at 04:18:40PM +0200, Marjan Povolni wrote:
> Cool, definitely something worth checking out for GFF3.

One reason the state machine is fast is that it does not create
objects in memory (avoiding so-called death by object creation ;).
Data will be in the CPU cache, rather than main memory. It would be
interesting to see if Artem can run parsers on multi-core.

With GFF3 line parsing, a really simple format, we immediately create
a range of objects. Of course, this can happen on the stack too, so
the speed advantage may not be that important.

Still, I think this could be interesting for GFF3, especially for
escape characters and character encodings, because those are the
most complicated parts to get right.
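
To make the escaping problem concrete, here is a minimal plain-Ruby
sketch (not Ragel, and not the BioRuby API) of decoding the attributes
column. The names gff3_unescape and parse_gff3_attributes are made up
for illustration, and it assumes plain ASCII input with %XX escapes
only:

```ruby
# Hypothetical helpers for illustration; not part of BioRuby.

# Decode %XX escapes (e.g. %3B -> ";", %09 -> tab) in one GFF3 field.
# Assumes the escaped bytes are in the ASCII range.
def gff3_unescape(text)
  text.gsub(/%([0-9A-Fa-f]{2})/) { Regexp.last_match(1).hex.chr }
end

# Parse the attributes column (column 9) into a Hash of tag => [values].
# Split on the structural characters first and unescape afterwards, so
# an escaped ";" or "," inside a value is not treated as a separator.
def parse_gff3_attributes(column9)
  column9.split(';').each_with_object({}) do |pair, attrs|
    tag, value = pair.split('=', 2)
    next if tag.nil? || value.nil? # skip malformed pairs without "="
    attrs[gff3_unescape(tag)] = value.split(',').map { |v| gff3_unescape(v) }
  end
end

attrs = parse_gff3_attributes('ID=gene1;Note=kinase%3B putative%2C partial;Alias=a,b')
```

The order matters: unescaping before splitting would turn %3B back
into a separator and break the Note value in two. Note also how many
small strings and arrays this creates per line, which is exactly the
cost a state machine could avoid.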

For now, we choose to assume GFF3 is plain ASCII. So, I guess we file
this under 'enhancements'. Right?

Pj.


