PEG can be thought of as an advance over regex. It can match more languages (for example balanced brackets) and can be paired with semantic actions to produce structured results from a parse.

The PEG language is implemented as a system of macros that compiles parser descriptions (rules) into scheme code. It is also provided with a custom syntax via "#lang peg".

The generated code parses text by interacting with the "PEG VM", which is a set of registers holding the input text, input position, control stack for backtracking and error reporting notes.
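To make the register description concrete, here is a minimal sketch of how such a machine could support backtracking. It is only a model: the struct `vm`, its field names, and the helpers `match-char!` and `choice!` are hypothetical names chosen for illustration and are not the library's actual identifiers or generated code.

```racket
#lang racket
;; Hypothetical model of the registers described above: input text,
;; input position, a control stack for backtracking, and error notes.
(struct vm (text [pos #:mutable] [stack #:mutable] [notes #:mutable]))

;; Match one literal character, advancing the position on success and
;; recording an error note on failure.
(define (match-char! m c)
  (define p (vm-pos m))
  (cond
    [(and (< p (string-length (vm-text m)))
          (char=? (string-ref (vm-text m) p) c))
     (set-vm-pos! m (add1 p))
     #t]
    [else
     (set-vm-notes! m (cons (format "expected ~a at ~a" c p) (vm-notes m)))
     #f]))

;; Prioritized choice: save the current position on the control stack,
;; try each alternative in order, and rewind after a failed alternative.
(define (choice! m . alternatives)
  (set-vm-stack! m (cons (vm-pos m) (vm-stack m)))
  (define result
    (for/or ([alt (in-list alternatives)])
      (or (alt m)
          (begin (set-vm-pos! m (car (vm-stack m))) #f))))
  (set-vm-stack! m (cdr (vm-stack m)))
  result)

;; Two sequences that share the prefix "a".
(define (seq-ab m) (and (match-char! m #\a) (match-char! m #\b)))
(define (seq-ac m) (and (match-char! m #\a) (match-char! m #\c)))

(define m (vm "ac" 0 '() '()))
(choice! m seq-ab seq-ac) ; seq-ab consumes "a", fails on "b", and is rewound
(vm-pos m)                ; position after the successful alternative
```

In this model the first alternative leaves one error note behind even though the overall choice succeeds, which is why notes are kept separately from the position.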

2 Syntax Reference

2.1 define-peg

syntax
(define-peg name rule)

<rule> = (epsilon)              ; always succeeds
       | (char c)               ; matches the character c
       | (any-char)             ; matches any character
       | (range c1 c2)          ; matches any char between c1 and c2
       | (string str)           ; matches string str
       | (and <rule> ...)       ; sequence
       | (or <rule> ...)        ; prioritized choice
       | (* <rule> ...)         ; zero or more
       | (+ <rule> ...)         ; one or more
       | (? <rule> ...)         ; zero or one
       | (call name)
       | (capture name <rule>)
       | (! <rule> ...)         ; negative lookahead
       | (& <rule>)             ; positive lookahead
       | (drop <rule> ...)      ; discard the semantic result on matching

Defines a new scheme function named peg-rule:name by compiling the peg rule into scheme code that interacts with the PEG VM.

syntax
(define-peg name rule action)

Same as above, but also performs a semantic action to produce its result. Semantic actions are regular scheme expressions; they can refer to variables named by a capture.

We also provide shorthands for some common semantic actions:

syntax
(define-peg/drop name rule)

= (define-peg rule-name (drop rule))

Makes the parser produce no result.

syntax
(define-peg/bake name rule)

= (define-peg rule-name (name res rule) res)

Transforms the peg-result into a scheme object.

syntax
(define-peg/tag name rule)

= (define-peg rule-name (name res exp) `(rule-name . ,res))

Tags the result with the peg rule name. Useful for parsers that create an AST.

2.2 peg

syntax
(peg rule input-text)

Runs a PEG parser: attempts to parse the input-text string using the given rule. This sets up the PEG VM registers in an initial state and then calls into the parser for rule.
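As a small illustration of defining a rule and running it, the following sketch recognizes balanced parentheses, the kind of language the introduction mentions as being beyond regex. It uses only the rule forms listed in the reference above. The rule names `balanced` and `balanced-only` are made up for this example, and the code assumes the peg package is installed. Because Example 1 below shows peg succeeding on a prefix of the input, the second rule adds a negative lookahead to require that nothing is left over.

```racket
#lang racket
(require peg)

;; Zero or more groups of "(" balanced ")".
;; The rule refers to itself, which is what makes nesting possible.
(define-peg balanced
  (* (and #\( balanced #\))))

;; Same rule, but anchored: (! (any-char)) succeeds only at end of input.
(define-peg balanced-only
  (and balanced (! (any-char))))

;; Run the parser on a well-nested string.
(peg balanced-only "(()())")
```

The recursion terminates because each iteration of the star consumes at least the opening parenthesis before calling balanced again.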

3 Examples

3.1 Example 1

For a simple example, let's try splitting a sentence into words. We can describe a word as one or more non-space characters, optionally followed by a space:

> (require peg)
> (define sentence "the quick brown fox jumps over the lazy dog")
> (define-peg non-space
    (and (! #\space) (any-char)))
> (define-peg/bake word
    (and (+ non-space)
         (drop (? #\space))))
> (peg word sentence)
"the"
> (peg (+ word) sentence)
'("the" "quick" "brown" "fox" "jumps" "over" "the" "lazy" "dog")

Using the peg lang, the example above is equivalent to:

#lang peg

(define sentence "the quick brown fox jumps over the lazy dog"); //yes, we can use

//one-line comments and any sequence of s-exps BEFORE the grammar definition

non-space <- (! ' ') . ; //the dot is "any-char" in peg

word <- c:(non-space+ ~(' ' ?)) -> c ; //the ~ is drop

//we can use ident:peg to act as (name ident peg).

//and rule <- exp -> action is equal to (define-peg rule exp action)

//with this in a file, we can use the repl of drracket to do exactly the

//uses of peg above.

3.2 Example 2

Here is a simple calculator example that demonstrates semantic actions and recursive rules.

(define-peg number (name res (+ (range #\0 #\9)))
  (string->number res))
(define-peg sum
  (and (name v1 prod) (? (and #\+ (name v2 sum))))
  (if v2 (+ v1 v2) v1))
(define-peg prod
  (and (name v1 number) (? (and #\* (name v2 prod))))
  (if v2 (* v1 v2) v1))

This grammar in peg lang is equivalent to:

#lang peg
number <- res:[0-9]+ -> (string->number res);
sum <- v1:prod ('+' v2:sum)? -> (if v2 (+ v1 v2) v1);
prod <- v1:number ('*' v2:prod)? -> (if v2 (* v1 v2) v1);

Usage:

> (peg sum "2+3*4")
14
> (peg sum "2*3+4")
10
> (peg sum "7*2+3*4")
26
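To see why these results respect operator precedence, it can help to write the same three rules by hand. The sketch below is plain Racket and does not use the peg package; it is not the code that define-peg generates. The helper names (`digit?`, `parse-number`, `parse-binary`, `parse-prod`, `parse-sum`, `calc`) are hypothetical. Each parser takes a string and a position and returns a pair of value and next position, or #f on failure.

```racket
#lang racket
;; number <- [0-9]+
(define (digit? c) (char<=? #\0 c #\9))

(define (parse-number s pos)
  (let loop ([i pos])
    (if (and (< i (string-length s)) (digit? (string-ref s i)))
        (loop (add1 i))
        (and (> i pos)
             (cons (string->number (substring s pos i)) i)))))

;; Shared shape of sum and prod: one operand, then optionally the
;; operator character followed by a recursive call to the same rule.
;; If the optional part fails, fall back to the position after the operand.
(define (parse-binary s pos operand op-char combine self)
  (match (operand s pos)
    [#f #f]
    [(cons v1 p1)
     (let ([more (and (< p1 (string-length s))
                      (char=? (string-ref s p1) op-char)
                      (self s (add1 p1)))])
       (if more
           (cons (combine v1 (car more)) (cdr more))
           (cons v1 p1)))]))

;; prod <- number ('*' prod)?
(define (parse-prod s pos)
  (parse-binary s pos parse-number #\* * parse-prod))

;; sum <- prod ('+' sum)?
(define (parse-sum s pos)
  (parse-binary s pos parse-prod #\+ + parse-sum))

;; Return just the computed value, or #f if nothing parses.
(define (calc s)
  (match (parse-sum s 0)
    [#f #f]
    [(cons v _) v]))

(calc "2+3*4")   ; 14
(calc "2*3+4")   ; 10
(calc "7*2+3*4") ; 26
```

Because sum is defined in terms of prod, every product is fully evaluated before an addition sees it: in "2+3*4" the inner call to the prod rule returns 12, and only then is 2 added.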

3.3 Example 3

Here is an example of parsing balanced parentheses. It demonstrates a common technique of using _ for skipping whitespace, and using "define-peg/bake" to produce a list rather than a sequence from a *.

#lang racket
(require peg)

(define-peg/drop _ (* (or #\space #\newline)))

(define-peg symbol
  (and (name res (+ (and (! #\( #\) #\space #\newline) (any-char)))) _)
  (string->symbol res))

(define-peg/bake sexp
  (or symbol
      (and (drop #\() (* sexp) (drop #\) _))))

or in PEG syntax:

#lang peg
_ < [ \n]*;
symbol <- res:(![() \n] .)+ _ -> (string->symbol res);
sexp <- s:symbol / ~'(' s:sexp* ~')' _ -> s;
// had to use s: -> s because there is no way to express bake from the PEG language

Usage:

> (peg sexp "(foob(arbaz)quux)")
'(foob (arbaz) quux)
> (peg sexp "((())(()(())))")
'((()) (() (())))
> (peg sexp "(lambda (x) (list x (list (quote quote) x)))")
'(lambda (x) (list x (list 'quote x)))

4 PEG Syntax

This package also provides a "#lang peg" alternative, to allow you to make parsers in a more standard PEG syntax.

4.1 PEG Syntax Reference

The best way to understand the PEG syntax is by reference to examples; there are many simple examples in the racket peg repo. The following is the actual grammar used by racket-peg to implement the peg lang:

Note: When you match the empty string in peg lang, the result object is the empty sequence, not the empty string, be careful.

#lang peg

(require "s-exp.rkt");

_ < ([ \t\n] / '//' [^\n]*)*;

SLASH < '/' _;

name <-- [a-zA-Z_] [a-zA-Z0-9\-_.]* _;

rule <-- name ('<--' / '<-' / '<') _ pattern ('->' _ s-exp _)? ';' _;

pattern <-- alternative (SLASH alternative)*;

alternative <-- expression+;

expression <-- (name ~':' _)? ([!&~] _)? primary ([*+?] _)?;

primary <-- '(' _ pattern ')' _ / '.' _ / literal / charclass / name;

literal <-- ~['] (~[\\] ['\\] / !['\\] .)* ~['] _;

charclass <-- ~'[' '^'? (cc-range / cc-escape / cc-single)+ ~']' _;

cc-range <-- cc-char ~'-' cc-char;

cc-escape <-- ~[\\] .;

cc-single <-- cc-char;

cc-char <- !cc-escape-char . / 'n' / 't';

cc-escape-char <- '[' / ']' / '-' / '^' / '\\' / 'n' / 't';

peg <-- _ import* rule+;

import <-- s-exp _ ';' _;
