[BioRuby] Parsing line-based formats with Ragel

Fields, Christopher J cjfields at illinois.edu
Mon Jun 4 12:56:39 UTC 2012


On Jun 4, 2012, at 12:17 AM, Pjotr Prins wrote:

> On Mon, Jun 04, 2012 at 12:56:18AM +0000, Fields, Christopher J wrote:
>> Have to agree, and in cases where a Bio* might run into problems
>> with Ragel (Perl or Python) we can at least look at the grammar and
>> use something for those languages that is similar in concept (e.g.
>> Marpa for Perl), or go a little more roundabout and bind to
>> C-generated ones from Ragel.
> 
> Also agree. Parsing is a common theme in Bio*. A state engine would
> be a great abstraction, targeting C or D, and even the interpreted
> languages. The SAM parser would be a great proof-of-concept. I am
> also very interested to see how it will perform against samtools.
> 
> The spanner in the works may be that we tend to be very sloppy about
> standards. So relaxed parsers may also be needed.

Either that, or use the grammar as a source of validation (e.g. if the parse fails, the data is not formatted correctly). That's basically the tack I plan to take with Perl 6 grammars.
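
To make the strict-versus-relaxed distinction concrete, here is a rough sketch. It is neither Ragel nor a Perl 6 grammar: a Python regular expression stands in for the grammar, and the per-column patterns are a simplified guess at the eleven mandatory SAM columns, looser than the real spec. The names `is_valid_sam_line` and `is_loosely_sam_line` are made up for this illustration.

```python
import re

# Stand-in "grammar": one simplified pattern per mandatory SAM column.
# These patterns are assumptions for illustration, not the SAM spec itself.
FIELDS = [
    r"[!-?A-~]{1,254}",         # QNAME
    r"\d+",                     # FLAG
    r"\*|[!-~]+",               # RNAME
    r"\d+",                     # POS
    r"\d+",                     # MAPQ
    r"\*|(?:\d+[MIDNSHP=X])+",  # CIGAR
    r"\*|=|[!-~]+",             # RNEXT
    r"\d+",                     # PNEXT
    r"-?\d+",                   # TLEN
    r"\*|[A-Za-z=.]+",          # SEQ
    r"[!-~]+",                  # QUAL
]

# Columns are tab-separated; anything after column 11 (optional tags) is accepted as-is.
STRICT = re.compile("\t".join(f"({p})" for p in FIELDS) + r"(?:\t.*)?")


def is_valid_sam_line(line):
    """Strict mode: a failed match means the line is not well formed."""
    return STRICT.fullmatch(line.rstrip("\n")) is not None


def is_loosely_sam_line(line):
    """Relaxed mode: only require at least eleven tab-separated columns."""
    return len(line.rstrip("\n").split("\t")) >= 11


good = "r001\t99\tref\t7\t30\t8M2I4M1D3M\t=\t37\t39\tTTAGATAAAGGATACTG\t*"
bad = "r001\t99\tref\tseven\t30\t8M\t=\t37\t39\tTTAG\t*"  # POS is not a number

print(is_valid_sam_line(good), is_valid_sam_line(bad), is_loosely_sam_line(bad))
```

The strict check rejects the second line because POS is not numeric, while the relaxed check lets it through, which is roughly the trade-off between validating with the grammar and tolerating sloppy files.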

chris

> Pj.





