Beautiful Racket: Master recipe

Beautiful Racket / appendix

Setting up a language project

Within your path/to/source, create a project directory for the language called dsl. Switch into the new directory and use raco pkg install to add it to your local Racket installation as a package.

cd path/to/source
mkdir dsl
cd dsl
raco pkg install

Alternatively, at the command line, switch to your path/to/source and use raco pkg new dsl, which will create the dsl subdirectory and stub out other files that will be useful in the package (including documentation and configuration).

cd path/to/source
raco pkg new dsl
cd dsl
raco pkg install

The boot module

If your language is invoked as #lang dsl, its boot module will be "dsl/main.rkt".

If your language is invoked as #lang dsl/dialect, its boot module will be "dsl/dialect.rkt". (Caution: the boot module will not be "dsl/dialect/main.rkt".)

A language can have any number of dialects living in the same project directory, with the boot modules following this naming convention.

The reader

As the first step in running any program, Racket invokes the reader for the language. The reader converts source code (= a string of characters in a source file) into S-expressions (= Racket-style syntactic forms).
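As a quick illustration of this conversion, the sketch below runs Racket's own built-in read-syntax (not a DSL reader) over a string port. Characters go in; a syntax object comes out, and syntax->datum exposes the S-expression inside it. The source name 'example-source is an arbitrary placeholder.

```racket
#lang racket/base
;; Racket's built-in reader, used here only to illustrate what a reader does.
(define source-code "(define x (+ 1 2))")

;; read-syntax takes a source name and an input port, and returns a
;; syntax object (an S-expression plus source-location information).
(define stx (read-syntax 'example-source (open-input-string source-code)))

;; syntax->datum removes the syntax wrapper, leaving the plain S-expression.
(define s-expr (syntax->datum stx))
s-expr
```

Running this prints '(define x (+ 1 2)): the same code, but now as a nested list rather than a string.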

To start the reader, Racket calls the read-syntax function for the language. Racket looks for this function in the reader submodule of the boot module of the #lang, which must provide it. For instance:
dsl/main.rkt

#lang br
(module reader br
  (provide read-syntax)
  ···)

read-syntax can either be defined within the reader submodule:
dsl/main.rkt

#lang br
(module reader br
  (provide read-syntax)
  (define (read-syntax name port)
    ···))

Or imported from elsewhere:
dsl/main.rkt

#lang br
(module reader br
  (provide read-syntax)
  (require module/that/exports/read-syntax))

read-syntax must accept two input arguments: a source name, which holds the location of the source (for file input, the name is a path), and an input port, which points at the source data. read-syntax should read its data from the port.

(In principle, it's possible to read directly from the source name when it's a path. But this is unreliable, because it assumes the input source is a file. Maybe it's not. By contrast, the port argument always supplies the source data, regardless of the underlying input type.)

read-syntax should consume all the data from the port (that is, keep reading until the port returns eof). In pseudocode:
dsl/main.rkt

#lang br
(module reader br
  (provide read-syntax)
  (define (read-syntax name port)
    (define s-exprs (read-code-from port))
    ···))
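Here is one way that pseudocode might be filled in, as a runnable single-file sketch in racket/base. It assumes, purely for illustration, that the DSL's source consists of Racket-style datums, so the built-in read can serve as the hypothetical read-code-from; a real language would substitute its own reading logic. The loop stops only when read returns eof, so the whole port is consumed. The names dsl-mod-name and dsl/expander are placeholders; dsl/expander is never loaded here because the result is only inspected, not evaluated.

```racket
#lang racket/base
;; Assumption: the DSL's surface syntax is made of Racket-style datums,
;; so the built-in `read` stands in for a real DSL reader.
;; `read-code-from`, `dsl-mod-name`, and `dsl/expander` are placeholders.
(module reader racket/base
  (provide read-syntax)

  ;; Read datums one at a time until the port returns eof.
  (define (read-code-from port)
    (let loop ([acc '()])
      (define datum (read port))
      (if (eof-object? datum)
          (reverse acc)
          (loop (cons datum acc)))))

  ;; This definition shadows racket/base's read-syntax in this submodule.
  (define (read-syntax name port)
    (define s-exprs (read-code-from port))
    ;; Wrap the S-expressions in a module expression. Passing #f as the
    ;; lexical context gives the resulting syntax no bindings.
    (datum->syntax #f `(module dsl-mod-name dsl/expander ,@s-exprs))))

;; A prefix keeps the submodule's read-syntax apart from racket/base's.
(require (prefix-in dsl: (submod "." reader)))

(syntax->datum
 (dsl:read-syntax 'example-source (open-input-string "(greet \"world\") 42")))
```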

Racket takes the module expression returned from read-syntax and uses it to replace the code in the source file. So a source file that looks like this:
source-file.rkt

#lang dsl
dsl source code;
···

Becomes this:
source-file.rkt

(module dsl-mod-name dsl/expander
  dsl-sexprs ···)

Evaluation of this module expression continues from here, starting by importing bindings from the expander.
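To see a module expression being declared and evaluated outside of a #lang file, the sketch below hands one to eval in a fresh namespace. Assumption: racket/base stands in for dsl/expander, which doesn't exist in this example, and dsl-mod-name and answer are placeholder names. Requiring the module runs its body, and identifiers such as define and + take their meaning from the module's expander (here, racket/base).

```racket
#lang racket/base
;; Assumption: racket/base stands in for dsl/expander, which doesn't exist
;; here. The names dsl-mod-name and answer are placeholders.
(define module-expr
  '(module dsl-mod-name racket/base
     (provide answer)
     (define answer (+ 1 2))))

;; A fresh namespace with racket/base attached, so eval has somewhere
;; to declare the module.
(define ns (make-base-namespace))

(define result
  (parameterize ([current-namespace ns])
    ;; Evaluating the module form declares the module but doesn't run it.
    (eval module-expr)
    ;; Requiring the module runs its body. The identifiers in the body
    ;; (provide, define, +) get their bindings from racket/base.
    (eval '(require 'dsl-mod-name))
    (eval 'answer)))

result
```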

Optional: the parser & tokenizer

The tokenizer reads characters from the input port and converts them to tokens, which are the smallest meaningful units of source code.

You can write a tokenizer by hand. You can also use a helper function called a lexer that uses regexp-style rules to convert source code into tokens.

These tokens are passed to the parser, which arranges them into a hierarchical S-expression called a parse tree.

You can write a parser by hand. You can also use a grammar-based parser generator, like brag, to make a parser function.
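For a sense of what a hand-written tokenizer and parser might look like, here is a small sketch for a hypothetical toy language in which each statement is a sum of integers ending with a semicolon (e.g., "1 + 2; 3;"). The token representation and the parse-tree node names dsl-program and dsl-sum are made up for this example; in a larger language, a lexer or a brag grammar would typically replace these hand-rolled functions.

```racket
#lang racket/base
;; Toy language (hypothetical): statements like "1 + 2; 3;".
;; Node names dsl-program and dsl-sum are illustrative assumptions.

;; Tokenizer: characters in, tokens out.
;; Tokens are integers or the symbols '+ and '|;|.
(define (tokenize port)
  (let loop ([tokens '()])
    (define c (peek-char port))
    (cond
      [(eof-object? c) (reverse tokens)]
      [(char-whitespace? c) (read-char port) (loop tokens)]
      [(char<=? #\0 c #\9)
       ;; Consume a run of ASCII digits from the port.
       (define digits (car (regexp-match #px"^[0-9]+" port)))
       (loop (cons (string->number (bytes->string/utf-8 digits)) tokens))]
      [(memv c '(#\+ #\;))
       (read-char port)
       (loop (cons (string->symbol (string c)) tokens))]
      [else (error 'tokenize "unexpected character: ~a" c)])))

;; Parser: tokens in, parse tree out.
;;   program   : statement*
;;   statement : INTEGER ("+" INTEGER)* ";"
(define (parse tokens)
  (let loop ([tokens tokens] [statements '()])
    (if (null? tokens)
        (cons 'dsl-program (reverse statements))
        (let-values ([(statement remaining) (parse-statement tokens)])
          (loop remaining (cons statement statements))))))

;; Returns two values: the dsl-sum node and the unconsumed tokens.
(define (parse-statement tokens)
  (let loop ([tokens tokens] [numbers '()])
    (unless (and (pair? tokens) (integer? (car tokens)))
      (error 'parse "expected an integer, got: ~e" tokens))
    (define numbers* (cons (car tokens) numbers))
    (define remaining (cdr tokens))
    (cond
      [(and (pair? remaining) (eq? (car remaining) '+))
       (loop (cdr remaining) numbers*)]
      [(and (pair? remaining) (eq? (car remaining) '|;|))
       (values (cons 'dsl-sum (reverse numbers*)) (cdr remaining))]
      [else (error 'parse "expected + or ;, got: ~e" remaining)])))

(parse (tokenize (open-input-string "1 + 2; 3;")))
```

Running this prints '(dsl-program (dsl-sum 1 2) (dsl-sum 3)): a flat stream of tokens has become a hierarchical S-expression.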

dsl/main.rkt

#lang br
(module reader br
  (require module/that/provides/parse
           module/that/provides/tokenize)
  (provide read-syntax)
  (define (read-syntax name port)
    (define the-parse-tree (parse (tokenize port)))
    (strip-bindings
     #`(module dsl-mod-name dsl/expander
         #,the-parse-tree))))
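The same reader can be sketched without br. In this racket/base version, datum->syntax with #f as the lexical context takes the place of strip-bindings, producing a syntax object whose identifiers carry no bindings. The tokenize and parse functions here are trivial hypothetical stand-ins (whitespace-separated words in, a flat dsl-program node out), not a real tokenizer or parser, and dsl-mod-name and dsl/expander remain placeholders.

```racket
#lang racket/base
(require racket/port    ; port->string
         racket/string) ; string-split

;; Hypothetical stand-in tokenizer: each whitespace-separated word is a token.
(define (tokenize port)
  (string-split (port->string port)))

;; Hypothetical stand-in parser: wraps all tokens in one dsl-program node.
(define (parse tokens)
  (cons 'dsl-program tokens))

;; This definition shadows racket/base's read-syntax within this module.
(define (read-syntax name port)
  (define the-parse-tree (parse (tokenize port)))
  ;; #f as the lexical context yields syntax with no bindings,
  ;; playing the role of br's strip-bindings.
  (datum->syntax #f `(module dsl-mod-name dsl/expander ,the-parse-tree)))

(define stx (read-syntax 'example-source (open-input-string "alpha beta")))
(syntax->datum stx)
;; The `module` identifier at the head of the result has no binding.
(identifier-binding (car (syntax-e stx)))
```

The last expression returns #f, confirming that the reader hands back bare S-expressions and leaves all meaning to the expander.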

The expander

Once the reader finishes, the expander starts. At the end of the reader phase, the source code has been converted into a module expression that contains S-expressions but has no bindings, for example:
source-file.rkt

(module dsl-mod-name dsl/expander
  dsl-sexprs ···)

The expander provides the initial set of bindings for the module expression returned by the reader, thereby determining the meaning of the identifiers within the S-expressions. This, in turn, allows the expressions to be evaluated as Racket code.
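To see how the expander determines meaning, the sketch below gives identical bodies to two modules that use different expanders. Both expanders are hypothetical, and submodules stand in for separate files: numeric-expander re-exports racket/base unchanged, so + means addition, while list-expander binds + to list.

```racket
#lang racket/base
;; Expander 1 (hypothetical): plain racket/base, so `+` is addition.
(module numeric-expander racket/base
  (provide (all-from-out racket/base)))

;; Expander 2 (hypothetical): racket/base, except `+` is bound to `list`.
(module list-expander racket/base
  (provide (except-out (all-from-out racket/base) +)
           (rename-out [list +])))

;; Identical bodies, different expanders.
(module program-a (submod ".." numeric-expander)
  (provide result)
  (define result (+ 1 2)))

(module program-b (submod ".." list-expander)
  (provide result)
  (define result (+ 1 2)))

(require (prefix-in a: (submod "." program-a))
         (prefix-in b: (submod "." program-b)))
(list a:result b:result)
```

Running this prints '(3 (1 2)): the S-expression (+ 1 2) is the same in both modules, but its meaning comes from whichever expander supplies the binding for +.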

Any module (that meets the other requirements below) can be used as the expander for a language. It's named in the first line of the module expression returned by the reader, right after the module's own name. For instance, this module expression will rely on dsl/expander as its expander:
source-file.rkt

(module dsl-mod-name dsl/expander
  dsl-sexprs ···)

In essence, the expander name is like an implied require at the beginning of the module:

(module dsl-mod-name dsl/expander
  (require dsl/expander)
  dsl-sexprs ···)

Most Racket languages have their own custom expander module. But it's not mandatory: an existing module language, such as racket/base, can serve as the expander too.

Every expander must provide a #%module-begin macro, which will be the first thing invoked in the expander. #%module-begin must accept as input all the expressions that appear in the body of the module expression that read-syntax makes. Therefore, it’s not a bad idea to build your #%module-begin around a syntax pattern that accepts any number of input arguments.

To evaluate the module expression returned by the reader, Racket imports the #%module-begin from the expander specified in the module expression. It replaces the module expression with a call to this #%module-begin, passing it all the parsed expressions that are in the body of the module expression. So this:
source-file.rkt

(module dsl-mod-name dsl/expander
  dsl-sexprs ···)

Becomes this:
source-file.rkt

(module dsl-mod-name dsl/expander
  (#%module-begin ;; imported from `dsl/expander`
   dsl-sexprs ···))

Often, the #%module-begin for a language will perform some language-specific processing on the parse tree and then call the #%module-begin of the implementation language. To prevent a name collision between the two #%module-begin macros, give yours a different name inside the expander and export it as #%module-begin with rename-out. In pseudocode:
dsl/expander.rkt

#lang br
(define-macro (dsl-module-begin PARSED-EXPR ...)
  #'(#%module-begin ;; from `br`
     PARSED-EXPR ...))
(provide (rename-out [dsl-module-begin #%module-begin]))
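For comparison, here is a runnable single-file sketch of the same pattern in racket/base rather than br. Submodules stand in for separate files: expander plays the role of dsl/expander, and program plays the module expression returned by the reader. The pattern (_ parsed-expr ...) accepts any number of body expressions, including none. As an assumption made only so the result can be inspected, this #%module-begin collects the values of the body expressions into a list and exports it as results; a real language would do its own processing here.

```racket
#lang racket/base
(module expander racket/base
  (require (for-syntax racket/base))
  ;; Re-export racket/base so the body can use its bindings, minus its own
  ;; #%module-begin, whose name is reused for dsl-module-begin below.
  (provide (except-out (all-from-out racket/base) #%module-begin)
           (rename-out [dsl-module-begin #%module-begin]))

  (define-syntax (dsl-module-begin stx)
    (syntax-case stx ()
      ;; `parsed-expr ...` matches any number of body expressions.
      [(_ parsed-expr ...)
       ;; This #%module-begin is racket/base's (the implementation language).
       ;; The hypothetical `results` is defined and exported under that name.
       #'(#%module-begin
          (provide results)
          (define results (list parsed-expr ...)))])))

;; Like (module dsl-mod-name dsl/expander dsl-sexprs ···):
;; the body forms are passed to dsl-module-begin.
(module program (submod ".." expander)
  (+ 1 2)
  "three")

(require (submod "." program))
results
```

Running this prints '(3 "three"), showing that both body expressions reached dsl-module-begin and were then evaluated by racket/base's #%module-begin.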

Optionally, an expander can provide certain interposition points:
- #%top-interaction is used to activate the REPL.
- #%app adds support for function calls.
- #%datum adds support for self-evaluating values (like numbers and strings).
- #%top adds support for unbound identifiers.
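As a sketch of one interposition point, here is a hypothetical expander that overrides #%top so that an unbound identifier evaluates to its own name as a symbol, rather than causing an error. Submodules stand in for separate files, and the names expander, program, and greeting are made up for this example.

```racket
#lang racket/base
(module expander racket/base
  (require (for-syntax racket/base))
  ;; Export racket/base, but replace its #%top with dsl-top.
  (provide (except-out (all-from-out racket/base) #%top)
           (rename-out [dsl-top #%top]))

  ;; The expander wraps each unbound identifier `id` as (#%top . id).
  ;; Here it becomes a quoted symbol instead of an error.
  (define-syntax (dsl-top stx)
    (syntax-case stx ()
      [(_ . id) #'(quote id)])))

(module program (submod ".." expander)
  (provide greeting)
  ;; `hello` and `world` have no bindings, so dsl-top handles them.
  (define greeting (list hello world)))

(require (submod "." program))
greeting
```

Running this prints '(hello world). With racket/base's own #%top, the same program would fail to compile with an unbound-identifier error.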

Optional tasks

Writing unit tests.

Integrating the language with DrRacket, including syntax coloring and indenting.

Adding Scribble documentation.

Adding an "info.rkt" file.

Making the language available through the Racket package server.
