This guide demonstrates use of syntax-spec via the case study of constructing a DSL for structuring code as state machines.
We will:
Define the syntax (Grammar)
Add binding rules (Binding)
Integrate Racket subexpressions (Integrating Racket Subexpressions)
Compile to Racket code (Compilation)
Allow macros to extend the language (Macros)
Here’s what using the DSL to define a controller for a vending machine looks like:
(machine#:initialidle(stateidle(on(dollar)(gotopaid))(on(select-itemitem)(gotoidle)))(statepaid(on(select-itemitem)(gotoidle)))))
The vending machine has two states: idle and paid. It reacts to two kinds of external events: a dollar being inserted, and an item being selected for purchase.
The machine declaration acts as a class. Racket code interacts with the machine by constructing an instance and calling methods corresponding to machine transitions such as dollar. The get-state method returns a symbol representing the current state.
Within the machine, Racket code can be run when certain states are entered or certain transitions occur. Within transitions, these actions can reference arguments to the transition event, such as item in the select-item event.
The essential parts of a DSL implementation in syntax-spec are a specification of the DSL’s syntax and a compiler that transforms DSL syntax to Racket. The syntax is specified in terms of nonterminals with associated binding rules. We’ll introduce binding rules later in the tutorial. Host interface macros tie together the specification and the DSL compiler producing a Racket macro that forms the entry point to the language implementation.
Our initial specification with syntax-spec supplies the grammar:
(host-interface/expression(nonterminalstate-spec(nonterminaltransition-specaction:action-spec(nonterminalaction-spec
The syntax-spec form is the entry-point into the metalanguage. It must be used at the top-level of a module. In the example above, the language definition contains two kinds of definitions, for host interface macros and nonterminals.
The host-interface/expression form is used to define host interface macros that extend the language of Racket expressions. Here, it defines the machine syntax for creating a state machine implemented as a Racket class.
The first part of the host interface definition specifies the syntax of the host interface macro, beginning with the name of the form: machine. The remainder of the machine form’s syntax specification describes the literal elements of the syntax and its subexpression positions. Literal elements include keywords like #:initial. A colon-separated name like s:state-spec indicates a subexpression position, where the first portion is the spec variable used to name the position and the latter portion is a reference to a nonterminal or binding class indicating the type of syntax that may appear in the subexpression.
The remainder of the host interface declaration is compile-time Racket code. Once the DSL syntax is checked and macro-expanded according to the syntax specification, this compile-time code is responsible for compiling from the DSL to Racket. For now it’s a stub.
Consider this program:
(machine#:initialred(statered(on(eventx)(gotogreen))(on(eventx)(gotored))))
Our first transition is to green, but there is no green state. This should result in an unbound variable error.
However, let’s say our compiler translates (gotogreen) to (set! state'green) and doesn’t produce any identifiers for green. Would we get an unbound reference error for green? No! We’d just have strange behavior at runtime, or maybe a runtime error, depending on the compiler.
We could adjust our compiler to check for unbound state references, but syntax-spec can do it for us. syntax-spec allows us to declare the binding and scoping rules for our language, and bindings and references will be checked before your compiler is even invoked, so your compiler can assume the program is not only grammatically correct, but also well-bound.
There are also several other benefits that we get by providing binding rules. We can use symbol tables to associate information with identifiers, we can allow our languages to have hygienic macros, we can compute the free identifiers of an expression, and many other identifier-related operations. We’ll get more into these details later, but the point is you get a lot for free by declaring binding rules. This is why you should be excited!
First, let’s declare that the arguments to an action are in scope in the guard expression:
(binding-classevent-var)(nonterminaltransition-specaction:action-spec#:binding(scope(bindarg)body)(nonterminalaction-spec
We added a binding class, event-var, for an event’s argument names. We also added a #:binding declaration to transition actions to declare that the args are bound in the action expressions and this binding introduces a new scope.
These simple binding rules behave like let :
(binding-classmy-var)(nonterminalmy-expr#:binding[e(scope(bindx)body)]x:my-varn:number))
We could’ve just written (scope(bindx)body). syntax-spec will automatically treat e as a reference position outside of the new scope. That’s why we don’t have to mention event-name in the binding rules for transitions. Additionally, for action-spec expressions, there is an implicit #:binding rule generated that treats x as a reference position.
Now let’s add binding rules for state names. We can’t just use scope and bind since the binding of the state name comes from the state-spec nonterminal, and those bindings need to be in scope throughout the entire machine form. To use bind, we need to be able to refer to the name being bound directly. For this kind of binding structure, we use export to export bindings from the state-spec nonterminal and import to import those bindings into a scope in the machine host interface:
(binding-classstate-name)
(host-interface/expression(nonterminal/exportingstate-spec(nonterminaltransition-specaction:action-spec
We use an exporting nonterminal for state-spec, which allows us to use the export binding rule. This binds name in transition and the other state-spec forms in the body of the machine, like define in a class body or a block form.
Similar to bind for a variable, we use import to declare that an exporting nonterminal’s bindings should be in scope for the initial-state in the machine.
There is another type of binding rule that doesn’t fit into our state machine language, but you might need it when creating a different language. This is nested binding and behaves like let* , where you have a sequence of variables being defined and each one is in scope for the subsequent definitions (but not previous ones). Here is an example:
(binding-classmy-var)(nonterminalmy-expr#:binding(nestbbody)n:numberx:my-var)(nonterminal/nestingbinding-pair(nested)[x:my-vare:my-expr]#:binding(scope(bindx)nested)))
We create a nesting nonterminal for a binding pair, which has nested, which is like an argument for the nonterminal’s binding rules. This represents the scope tree of the rest of the binding rules. In this case, the scope tree gets built up sort of like foldr on a list.
The scope tree is a first-class representation of the binding structure of the program. It’s not something that you explicitly work with, but it’s useful to know about. Conceptually, syntax-spec uses your language’s binding rules to construct this scope tree during expansion.
From the simple nonterminal my-expr, we put the binding-pair’s bindings in scope using nest, providing body as the intial value of nested, like the base case value of foldr .
In our state machine language, action expressions are very limited. Let’s remind ourselves what the grammar for an action expression looks like:
(nonterminalaction-spec
An action expression can only displayln the value of a variable. What if we want something fancier, like using format inside the displayln ? Really, it’d be ideal to be able to allow arbitrary racket expressions for the action. We can actually do that!
(nonterminal/exportingstate-spec(nonterminaltransition-specbody:racket-expr#:binding(scope(bindarg)body))... )
Instead of using action-spec and defining our own nonterminal for action expressions, we can just use racket-expr , which allows arbitrary racket expressions. And our event-var identifiers will be in scope in the racket expression! We can control how references to our DSL-bound variables behave in Racket expressions and whether they’re allowed at all using reference compilers, which we’ll discuss in the Compilation section.
In addition to racket-expr , syntax-spec provides racket-var for allowing references to Racket-defined variables in DSL expressions, and racket-macro for allowing the language to be extended by arbitrary Racket macros. We’ll talk more about macros in the Macros section.
Now that we have our grammar and binding rules defined, we must write a compiler to translate a state machine program to Racket. We already have a host interface macro defined, which is the entry point to our DSL:
(host-interface/expression... )
However, our compiler, which performs the actual translation, is not defined. The compiler is a macro that translates our state machine language to Racket code. In our compiler, we’ll translate the state machine to Racket classes using the state machine pattern.
For example, let’s imagine how we’d translate the example state machine:
(machine#:initialidle(stateidle(on(dollar)(gotopaid))(on(select-itemitem)(gotoidle)))(statepaid(on(select-itemitem)(gotoidle)))))
We’ll create a class for the state machine, which acts as a context class, and a class for each state:
'idle)'idle)
The machine% class stores the current state instance and delegates to it. Each state class has methods for each defined transition. Transition actions go in the transition’s method and on-enter actions go in the class body. When a state is entered, the machine% class creates a fresh instance of it, which runs the class body, and sets the current state to that instance. Finally, we return an instance of the machine class.
Now Let’s start to write the compiler:
(host-interface/expression... )#:datum-literals(machinestateon-enter)(statestate-name... )'state-name)action...
We defined a macro, compile-machine, which expands to something similar to what we wrote by hand above. One thing we have to do with syntax-spec is wrap the generated code in a with-reference-compilers form. This allows us to control whether and how DSL identifiers behave in Racket expressions like actions. In our case, we use mutable-reference-compiler , which allows event arguments to be referenced and mutated. We don’t specify a reference compiler for state names, so they cannot be referenced in Racket expressions. Only goto.
We have helpers to define the proxy methods in the machine% class and transition methods in the state classes:
#:datum-literals(ongoto)#:datum-literals(ongoto)action...(gotoname))machine)action...
For compile-proxy-methods, to generate one method definition for each possible transition, we gather up all the transitions in compile-machine with that (e... ... ), remove the duplicate transition event names, and define a proxy method for each one that delegates to the state instance, which is passed in as target. #racket[compile-event-method] is pretty straightforward.
One thing to note is that Racket expressions like action in compile-event-method get wrapped in a #%host-expression form by syntax-spec. You can usually ignore this fact completely when writing a compiler, but if you try to inspect the contents of a Racket expression in a compiler, you’ll have to account for it.
Now we have all the pieces to run programs using state machines:
syntax-spec-v2/tests/dsls/state-machine-for-tutorial)(machine#:initialidle(stateidle(on(dollar)(gotopaid))(gotoidle)))(statepaid(on(select-itemitem)(gotoidle)))))pay a dollar
'idle
you need to pay before selecting an item
pay a dollar
'idle
select an item
'paid
dispensing chips
pay a dollar
'idle
Symbol tables and symbol sets allow us to associate information with identifiers, similar to Dictionaries with Identifier Keys and Sets with Identifier Keys, but for DSL identifiers.
In our language’s compiler, we can use symbol set to raise an error when a state is unreachable:
(host-interface/expression... )(define-local-symbol-setaccessible-states)state-specs))(add-reachable-states!next-state-name))))(add-reachable-states!initial-state-id)accessible-states)[(statenamebody(gotonext-state-name)))... )
We build up a symbol set of accessible states with a depth-first search over the possible transitions starting from the initial state, and if we find a state that isn’t accessible, we error.
This static check runs before we generate the compiled code. Compilers may have many static analysis passes like this one, or even passes that emit an intermediate representation like ANF. There are some special considerations to be made when creating multi-pass compilers with intermediate representations in syntax-spec which are covered in Advanced Tutorial: A Compiler with Multiple Passes
syntax-spec-v2/tests/dsls/state-machine-for-tutorial)machine: Inaccessible state: empty
We forgot to add a transition to go from full to empty. And since we start on full, there is no way to get to empty.
syntax-spec allows us to make our DSLs macro-extensible. For example, let’s allow users to create macros for definining states:
(extension-classstate-macro)(nonterminal/exportingstate-spec#:allow-extensionstate-macro(define-state-syntaxnametrans)
By adding an extension class called state-macro and allowing state-spec to be extended by these state macros, transformers wrapped with state-macro can be used in state-spec positions. syntax-spec provides define-dsl-syntax for defining these wrapped transformers. These macros will be hygienic in our DSL. Since only certain nonterminals are extensible by certain extension classes, we can control what kinds of macros can be used where.
Now let’s create a macro in our language!
syntax-spec-v2/tests/dsls/state-machine-for-tutorial)> (define-state-syntaxsimple-state(statename(on(evt)(gotonext))(machine#:initialred(simple-statered[tickgreen])(simple-stategreen[tickyellow])(simple-stateyellow[tickred])))'red
'green
'yellow
'red
The full code for the state machine example is available at https://github.com/michaelballantyne/syntax-spec/blob/main/tests/dsls/state-machine-for-tutorial.rkt.
There is also an example of using the state machine language to create a CSV browser with a GUI at https://github.com/michaelballantyne/syntax-spec/blob/main/demos/minimal-state-machine/csv-browser.rkt