This library implements a PEG parser generator.
PEG can be thought of as an advance over regex. It can match more languages (for example balanced brackets) and can be paired with semantic actions to produce structured results from a parse.
The PEG language is implemented as a system of macros that compiles parser descriptions (rules) into Scheme code. A custom syntax is also available via "#lang peg".
The generated code parses text by interacting with the "PEG VM", which is a set of registers holding the input text, the input position, a control stack for backtracking, and error-reporting notes.
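The register set described above can be pictured with a small sketch. This is a language-neutral illustration in Python, not the library's implementation: the class name VM, its field names, and the helper functions are all hypothetical. It shows how a saved input position on a control stack lets a prioritized choice rewind after a failed alternative, and how a failure leaves a note behind for error reporting.

```python
from dataclasses import dataclass, field

@dataclass
class VM:
    """Hypothetical register set; names are illustrative, not the library's."""
    text: str
    pos: int = 0
    stack: list = field(default_factory=list)   # saved positions for backtracking
    notes: list = field(default_factory=list)   # error-reporting notes

def match_string(vm, s):
    """Consume s at the current position, or record a note and fail."""
    if vm.text.startswith(s, vm.pos):
        vm.pos += len(s)
        return True
    vm.notes.append((vm.pos, f"expected {s!r}"))
    return False

def choice(vm, *alternatives):
    """Prioritized choice: try alternatives in order, rewinding on failure."""
    for alt in alternatives:
        vm.stack.append(vm.pos)      # remember where to backtrack to
        if alt(vm):
            vm.stack.pop()           # success: discard the saved position
            return True
        vm.pos = vm.stack.pop()      # failure: restore the input position
    return False

vm = VM("foobar")
ok = choice(
    vm,
    lambda v: match_string(v, "foo") and match_string(v, "baz"),
    lambda v: match_string(v, "foo") and match_string(v, "bar"),
)
```

In this run the first alternative consumes "foo" and then fails on "baz", so the position is rewound to 0 before the second alternative is tried. The note recorded at position 3 is the kind of information an error message could later be built from.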
syntax
(define-peg name rule)
<rule> = (epsilon)             ; always succeeds
       | (char c)              ; matches the character c
       | (any-char)            ; matches any character
       | (range c1 c2)         ; matches any char between c1 and c2
       | (string str)          ; matches string str
       | (and <rule> ...)      ; sequence
       | (or <rule> ...)       ; prioritized choice
       | (* <rule> ...)        ; zero or more
       | (+ <rule> ...)        ; one or more
       | (? <rule> ...)        ; zero or one
       | (call name)
       | (capture name <rule>)
       | (! <rule> ...)        ; negative lookahead
       | (& <rule>)            ; positive lookahead
       | (drop <rule> ...)     ; discard the semantic result on matching
syntax
(define-peg name rule action)
We also provide shorthands for some common semantic actions:
syntax
(define-peg/drop name rule)
Makes the parser produce no result.
syntax
(define-peg/bake name rule)
Transforms the peg-result into a Scheme object.
syntax
(define-peg/tag name rule)
Tags the result with the peg rule name. Useful for parsers that create an AST.
syntax
(peg rule input-text)
For a simple example, let's try splitting a sentence into words. We can describe a word as one or more non-space characters, optionally followed by a space:
(drop (? #\space))))
"the"
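To make the semantics of the word rule concrete, here is a minimal sketch of the operators it relies on (sequence, one or more, zero or one, negative lookahead), written as plain Python functions. This is not the library's API. Every name below is an assumption made for illustration, and a parser is modelled simply as a function from (text, pos) to a new position, or None on failure.

```python
def char(c):
    """Match the single character c."""
    return lambda text, pos: pos + 1 if text.startswith(c, pos) else None

def any_char():
    """Match any one character."""
    return lambda text, pos: pos + 1 if pos < len(text) else None

def seq(*parsers):
    """Sequence: every parser must match, one after another."""
    def run(text, pos):
        for p in parsers:
            pos = p(text, pos)
            if pos is None:
                return None
        return pos
    return run

def not_(p):
    """Negative lookahead: succeed without consuming iff p fails."""
    return lambda text, pos: pos if p(text, pos) is None else None

def plus(p):
    """One or more repetitions of p."""
    def run(text, pos):
        pos = p(text, pos)
        if pos is None:
            return None
        while True:
            end = p(text, pos)
            if end is None:
                return pos
            pos = end
    return run

def optional(p):
    """Zero or one: never fails, consumes only if p matches."""
    def run(text, pos):
        end = p(text, pos)
        return pos if end is None else end
    return run

non_space = seq(not_(char(" ")), any_char())
word_chars = plus(non_space)
trailing_space = optional(char(" "))

def words(text):
    """Apply the word rule repeatedly, keeping the text and dropping the space."""
    out, pos = [], 0
    while True:
        end = word_chars(text, pos)
        if end is None:
            return out
        out.append(text[pos:end])
        pos = trailing_space(text, end)

result = words("the quick brown fox")
```

The trailing space is consumed but never added to the output, which mirrors the role of drop in the rule. The lookahead inside non_space only inspects the input, so the character it tests is still available for any_char to consume.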
#lang peg
(define sentence "the quick brown fox jumps over the lazy dog"); //yes, we can use
//one-line comments and any sequence of s-exps BEFORE the grammar definition
non-space <- (! ' ') . ; //the dot is "any-char" in peg
word <- c:(non-space+ ~(' ' ?)) -> c ; //the ~ is drop
//we can use ident:peg to act as (name ident peg).
//and rule <- exp -> action is equal to (define-peg rule exp action)
//with this in a file, we can use the DrRacket REPL to run the same
//peg calls shown above.
Here is a simple calculator example that demonstrates semantic actions and recursive rules.
(define-peg sum
(define-peg prod
This grammar is equivalent to the following in peg lang:
#lang peg
number <- res:[0-9]+ -> (string->number res);
sum <- v1:prod ('+' v2:sum)? -> (if v2 (+ v1 v2) v1);
prod <- v1:number ('*' v2:prod)? -> (if v2 (* v1 v2) v1);
Usage:
141026
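For readers who want to trace how the recursive rules and their semantic actions combine, the following is a hand-written Python stand-in for the number, prod, and sum rules of the peg lang grammar. It is a sketch under assumptions, not code generated by this library: the function names (including sum_, renamed to avoid shadowing Python's built-in) and the (value, position) return convention are invented here, and the sample inputs are chosen for illustration only.

```python
def number(text, pos):
    """number <- [0-9]+, with the action converting the match to an int."""
    end = pos
    while end < len(text) and "0" <= text[end] <= "9":
        end += 1
    if end == pos:
        return None
    return int(text[pos:end]), end

def prod(text, pos):
    """prod <- v1:number ('*' v2:prod)?  ->  v1 * v2 if v2 matched, else v1."""
    first = number(text, pos)
    if first is None:
        return None
    v1, pos = first
    if pos < len(text) and text[pos] == "*":
        rest = prod(text, pos + 1)
        if rest is not None:
            v2, end = rest
            return v1 * v2, end
    # The optional part did not match: succeed with v1 alone.
    return v1, pos

def sum_(text, pos):
    """sum <- v1:prod ('+' v2:sum)?  ->  v1 + v2 if v2 matched, else v1."""
    first = prod(text, pos)
    if first is None:
        return None
    v1, pos = first
    if pos < len(text) and text[pos] == "+":
        rest = sum_(text, pos + 1)
        if rest is not None:
            v2, end = rest
            return v1 + v2, end
    return v1, pos

value, consumed = sum_("2+3*4", 0)
```

Because sum refers to prod and prod refers to number, multiplication binds tighter than addition without any explicit precedence table. Each rule returns the value computed by its action together with the position where matching stopped.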
Here is an example of parsing balanced parentheses. It demonstrates a common technique of using _ for skipping whitespace, and using "define-peg/bake" to produce a list rather than a sequence from a *.
(define-peg symbol
(define-peg/bake sexp
or in PEG syntax:
#lang peg
_ < [ \n]*;
symbol <- res:(![() \n] .)+ _ -> (string->symbol res);
sexp <- s:symbol / ~'(' s:sexp* ~')' _ -> s;
//had to use s: -> s because there is no way to express bake from the PEG language
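The whitespace-skipping technique and the list-building behaviour can be sketched outside the library as well. The Python below is an illustrative approximation of the grammar above, not its actual expansion: skip_ws plays the role of _, symbols are returned as plain strings, and all function names are assumptions.

```python
def skip_ws(text, pos):
    """The _ rule: skip spaces and newlines, producing no result."""
    while pos < len(text) and text[pos] in " \n":
        pos += 1
    return pos

def symbol(text, pos):
    """One or more characters that are not parens or whitespace, then _."""
    end = pos
    while end < len(text) and text[end] not in "() \n":
        end += 1
    if end == pos:
        return None
    return text[pos:end], skip_ws(text, end)

def sexp(text, pos):
    """sexp <- symbol / '(' sexp* ')' _, returning nested Python lists."""
    atom = symbol(text, pos)
    if atom is not None:
        return atom
    if pos < len(text) and text[pos] == "(":
        items, cur = [], pos + 1
        while True:
            inner = sexp(text, cur)
            if inner is None:
                break
            item, cur = inner
            items.append(item)
        if cur < len(text) and text[cur] == ")":
            return items, skip_ws(text, cur + 1)
    return None

tree, end = sexp("(foob (ar baz)quux)", 0)
```

Placing the whitespace skip after each token, rather than before, means every rule can assume it starts on a significant character. As in the grammar, no whitespace is skipped directly after an opening parenthesis.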
Usage:
This package also provides a "#lang peg" alternative, to allow you to make parsers in a more standard PEG syntax.
The best way to understand the PEG syntax is by reference to examples. There are many simple examples in the racket-peg repo, and the grammar below is the one racket-peg itself uses to implement the peg lang.
Note: when you match the empty string in peg lang, the result object is the empty sequence, not the empty string, so take care when handling results.
#lang peg
(require "s-exp.rkt");
_ < ([ \t\n] / '//' [^\n]*)*;
SLASH < '/' _;
name <-- [a-zA-Z_] [a-zA-Z0-9\-_.]* _;
rule <-- name ('<--' / '<-' / '<') _ pattern ('->' _ s-exp _)? ';' _;
pattern <-- alternative (SLASH alternative)*;
alternative <-- expression+;
expression <-- (name ~':' _)? ([!&~] _)? primary ([*+?] _)?;
primary <-- '(' _ pattern ')' _ / '.' _ / literal / charclass / name;
literal <-- ~['] (~[\\] ['\\] / !['\\] .)* ~['] _;
charclass <-- ~'[' '^'? (cc-range / cc-escape / cc-single)+ ~']' _;
cc-range <-- cc-char ~'-' cc-char;
cc-escape <-- ~[\\] .;
cc-single <-- cc-char;
cc-char <- !cc-escape-char . / 'n' / 't';
cc-escape-char <- '[' / ']' / '-' / '^' / '\\' / 'n' / 't';
peg <-- _ import* rule+;
import <-- s-exp _ ';' _;