[BioRuby] Parsing line-based formats with Ragel

Artem Tarasov lomereiter at googlemail.com
Sat Jun 2 13:06:12 UTC 2012


Hi guys,

I've recently discovered an absolutely cool thing called Ragel (
http://www.complang.org/ragel/). It is a finite state machine compiler; its
applications include parsing Cucumber features in Gherkin, parsing HTTP
requests in Mongrel, and implementing pack/unpack functions in Rubinius.

It can be used to create a parser for any regular language, which includes
nearly every line-based format. It generates code for C, C++, Objective-C,
D(!), Java, and Go. The speed of the generated code is incredible.
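
To make the "regular language" point concrete, here is a minimal Python
sketch (plain `re`, not Ragel) that describes one simplified GFF3 feature
line as a single regular expression. The nine tab-separated columns follow
the GFF3 layout; the names GFF3_LINE and parse_gff3_line and the loose
per-column patterns are assumptions made for illustration, not a validated
grammar.

```python
import re

# Simplified, illustrative pattern for one GFF3 feature line:
# nine tab-separated columns. Real GFF3 is stricter about each column.
GFF3_LINE = re.compile(
    r"(?P<seqid>[^\t]+)\t"
    r"(?P<source>[^\t]+)\t"
    r"(?P<type>[^\t]+)\t"
    r"(?P<start>\d+)\t"
    r"(?P<end>\d+)\t"
    r"(?P<score>[^\t]+)\t"
    r"(?P<strand>[-+.?])\t"
    r"(?P<phase>[012.])\t"
    r"(?P<attributes>[^\t]*)"
)


def parse_gff3_line(line):
    """Return a dict of column name -> text, or None if the line does not match."""
    match = GFF3_LINE.fullmatch(line.rstrip("\n"))
    return match.groupdict() if match else None
```

Since the whole line is matched by one regular expression, the same
description could in principle be handed to a state machine generator
instead of a backtracking regex engine.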

I wrote a bit more about it on my blog:
http://lomereiter.wordpress.com/2012/06/02/ragel-and-bioinformatics/

Basically, you write a formal grammar, define which snippets of code to
execute on state transitions, and everything just works. As for me, I'm
going to implement a SAM parser with this tool.
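
To give a feel for what "code executed on state transitions" means, here
is a small hand-written state machine in Python. It is only a sketch of
the idea, not Ragel output and not Ragel syntax: the two-column line
format ("name<TAB>position"), the state names, and the function
parse_line are all made up for this example.

```python
# States of a tiny hand-written machine for lines like "chr1\t12345".
NAME, NUMBER = range(2)


def parse_line(line):
    """Run the machine over one line; return (name, position) or None."""
    state = NAME
    mark = 0        # index where the current field started
    name = None
    position = 0

    for i, ch in enumerate(line):
        if state == NAME:
            if ch == "\t":
                # Transition NAME -> NUMBER; action: capture the name.
                if i == mark:
                    return None  # empty name, no valid transition
                name = line[mark:i]
                mark = i + 1
                state = NUMBER
            # Any other character: stay in NAME, no action needed.
        else:  # state == NUMBER
            if "0" <= ch <= "9":
                # Self-transition on a digit; action: accumulate the value.
                position = position * 10 + int(ch)
            else:
                return None  # no transition defined for this character

    # Accept only if we ended in NUMBER and saw at least one digit.
    if state == NUMBER and mark < len(line):
        return name, position
    return None
```

With a generator, only the grammar and the two actions (capture the name,
accumulate a digit) would be written by hand; the loop and the state
bookkeeping are the part that gets generated.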

It can also be useful for Marjan. I wrote a GFF3 grammar, but it might be
incorrect in some places. Here's a basic example of usage:
https://github.com/lomereiter/bioragel/blob/master/examples/d/gff3.rl



--
Artem


