This guide demonstrates advanced usage of syntax-spec via the case study of construscting a DSL for the simply typed lambda calculus.
Here is an example program in our language:
(f1))
Let’s start out with defining the grammar and binding rules for basic typed expressions:
(binding-classtyped-var)(extension-classtyped-macro#:binding-spacestlc)(nonterminaltyped-expr#:allow-extensiontyped-macro#:binding-spacestlcx:typed-varn:number#'(:et))(:e:typed-exprt:type)(nonterminaltypeNumber(host-interface/expression(stlc/expre:typed-expr)(infer-expr-type#'e)#'(compile-expre))(host-interface/expression(stlc/infere:typed-expr)#`'#,t-datum))
There are some features we’ve never seen here. Let’s go through them one by one:
(binding-classtyped-var)(extension-classtyped-macro#:binding-spacestlc)(nonterminaltyped-expr#:binding-spacestlc... )
Syntax-spec supports binding spaces, which allow DSL forms to have the same names as regular Racket forms like let without shadowing them. Even DSL macros won’t shadow Racket names. We will eventually write a macro for let so we don’t have to write #%let when we use the DSL.
(nonterminaltyped-expr#'(:et))(:e:typed-exprt:type)... )
This is called a rewrite production. We have a DSL form, :, for type annotations like (:1Number). We add a rewrite production to allow infix use like (1:Number) for better readability. The first part of a rewrite production is a syntax/parse pattern and the second part is the DSL form that the source syntax should transform into. The pattern cannot refer to binding classes, nonterminals, etc.
We have another rewrite production that converts function applications to #%app forms. It is important that this comes after the type annotation rewrite. Otherwise, infix usages would be treated as function applications.
In general, it is a good idea to tag most forms in your grammar like #%app to make your compiler less bug-prone. It also allows us to rely on datum literals for distinguishing between forms, which is useful when your form names are in a special binding space.
Now let’s define infer-expr-type:
[n:number(number-type)][x:id(get-identifier-type#'x)][targ-types])(extend-type-environment!xt))(function-typearg-typesbody-type)][(function-typeexpected-arg-typesreturn-type)(check-expr-typeargexpected-type))return-type](type->datumf-type))#'f)])](check-expr-type#'et)t](extend-type-environment!x(infer-expr-typee)))(infer-expr-type#'body)]))(type->datumexpected-type)(type->datumactual-type))e)))(parse-type#'return-type))]))[(number-type)'Number][(function-typearg-typesreturn-type)
We use prefab structs for our compile-time representation of types and we have a persistent symbol table mapping identifiers to types. A persistent symbol table allows information like an identifier’s type to be used between modules even if the providing module has been compiled. Eventually, we’ll add definitions to our language, so when type-checking a module that requires a typed identifier, we’ll need the identifier’s type from the persistent symbol table.
We have to use prefab structs because persistent symbol tables can’t persist non-prefab structs. The only values allowed in a symbol table are those which satisfy the syntax-datum? predicate.
extend-type-environment! uses a bit of a hack. By default, symbol tables error when setting an identifier’s value after it has already been set. We will end up re-inferring an expression’s type later on, so we use this hack to only set the type if it isn’t already set.
The rest is a typical type checker, nothing syntax-spec-specific.
Now let’s implement our compiler:
Nothing special here, it’s a straightforward translation to Racket. We pretty much just throw away the types.
Finally, we can write macros for let and lambda :
Right now, these don’t need to be macros. But when we add definitions, We will desugar multi-body let and lambda expressions to single-body ones.
Now we can run some programs:
> (stlc/infer1)'Number
> (stlc/expr1)1
'(-> Number Number)
#<procedure:...lambda-calculus.rkt:221:7>
Let’s add arbitrary Racket expressions to our language. These can evaluate to anything, so we can’t infer their types. We can require the user to annotate the type, but we shouldn’t just trust that the type is correct. Instead, we should add a contract check to ensure that the annotation is accurate.
We also need to add a contract check in the other direction, even if we don’t allow arbitrary Racket expressions. Let’s consider a program in our language:
It evaluates to a function which takes in a function and a number and applies the function to a number. But stlc/expr gives us a raw procedure that we can pass anything into!
"not a function"1)
This produces a runtime type error from inside the typed code! This should be impossible. And if we allow DSL variables to be referenced in Racket expressions, we’ll need to insert contract checks on references to make sure they’re used properly. We can do this by creating a custom reference compiler.
Let’s do it!
(binding-classtyped-var#:reference-compilertyped-var-reference-compiler)(nonterminaltyped-expr... )(host-interface/expression(stlc/expre:typed-expr)#'(compile-expr/topet))... )(parse-type#'t)]#'(compile-expre))e^'stlc'racket#f#'e^)]))#,x'stlc'racket'#,x#'#,x))))[(function-typearg-typesreturn-type)e'racket'stlc#f#'e)]))
We added a new form to our language, rkt, which contains a racket expression and a type annotation. The compilation of this experssion involves a contract check to make sure the value is of the expected type. We also added a contract check in the other direction when a typed value flows out of the host interface and created a custom reference compiler using make-variable-like-reference-compiler which inserts a contract check when a DSL variable is referenced in racket. These contract checks ensure typed values (particularly procedures) are used properly in untyped code.
This implementation is far from efficient. Instead of generating the syntax for a contract check everywhere, we should defer to a runtime function and have the type flow into the runtime since it’s a prefab struct. We should also avoid inserting a contract check every time a DSL variable is referenced in Racket and just do it once per variable. But for this tutorial, we’ll keep it simple.
Let’s run some example programs now:
3
> (stlc/expr(rkt"not a number":Number))broke its own contract
promised: number?
produced: "not a number"
in: number?
contract from: racket
blaming: racket
(assuming the contract is correct)
at: eval:3:0
broke its own contract
promised: number?
produced: #t
in: the range of
(-> number? number? number?)
contract from: racket
blaming: racket
(assuming the contract is correct)
at: eval:4:0
contract violation
expected: a procedure
given: "not a function"
in: the 1st argument of
(-> (-> number? number?) number? number?)
contract from: stlc
blaming: racket
(assuming the contract is correct)
at: <pkgs>/syntax-spec-v3/tests/dsls/simply-typed-lambda-c
alculus.rkt:288:39
> (stlc/expr(rkt(app"not a function"1):Number)))app: contract violation
expected: a procedure
given: "not a function"
in: the 1st argument of
(-> (-> number? number?) number? number?)
contract from: stlc
blaming: racket
(assuming the contract is correct)
at: eval:6:0
Our contract checks protect typed-untyped interactions.
Next, let’s add definitions to our language:
(nonterminaltyped-expr... )(nonterminal/exportingtyped-definition-or-expr#:allow-extensiontyped-macro#:binding-spacestlc(#%definex:typed-vart:typee:typed-expr)e:typed-expr)(host-interface/definitions
We added a new nonterminal for forms that can be used in a definition context. Since definitions inside of a begin should spliced in to the surrounding definition context, we use the binding rule re-export, which we haven’t seen yet. As the name suggests, it takes all exported names from an exporting nonterminal sub-expression and re-exports them. Here is an example of this splicing in regular Racket:
3
We also added the block form to our expression nonterminal so we can use definitions in expressions. To make the bindings from the definitions accessible within the block form, we use scope and import .
To support top-level definitions, we added a new host interface using host-interface/definitions, which we’ve never seen before. This defines a special type of host interface that can only be used in a definition context. This type of host interface can be used to define module-level variables that can be used with provide and require . Now that this is possible, it is important that we’re using a persistent symbol table to store type information.
Now let’s update the rest of our code:
(infer-expr-type#'e)](extend-type-environment!#'x(parse-type#'t))](type-check-defn-or-expr/pass1body))](check-expr-type#'e(parse-type#'t))](type-check-defn-or-expr/pass2body))]#'(compile-expre))#'e^e^'stlc'racket#f#'e))]))(compile-defn-or-exprd)(compile-expre))]))#`(compile-expr/tope#,(infer-expr-type#'e)#t)]))#'(compile-expre)]))#'(#%definexte)]body
To type-check a group of definitions, we must take two passes. The first pass must record the type information of all defined identifiers, and the second pass checks the types of the bodies of definitions. Since mutual recursion is possible, we need the types of all identifiers before we can start checking the types of definition bodies which may reference variables before their definitions. This is a common pattern when working with mutually recursive definition contexts in general.
We added an optional flag to disable the contract check for compile-expr/top when compiling top-level definitions since it is unnecessary.
We also added support for multi-body let , lambda , and let* , and we added a macro around #%define for syntactic sugar.
Let’s run it!
> (stlc2)3))> (stlc/expr(addtwothree))5