Beautiful Racket: Master recipe

Beautiful Racket / appendix


Setting up a language project

Within your path/to/source, create a project directory for the language, here called dsl. Switch into the new directory and use raco pkg install to add it to your local Racket installation as a package.
cd path/to/source
mkdir dsl
cd dsl
raco pkg install

Alternatively: at the command line, switch to your path/to/source and use raco pkg new dsl, which will create the dsl subdirectory and stub out other files that will be useful in the package (including documentation and configuration).
cd path/to/source
raco pkg new dsl
cd dsl
raco pkg install


The boot module

If your language is invoked as #lang dsl, its boot module will be "dsl/main.rkt".

If your language is invoked as #lang dsl/dialect, its boot module will be "dsl/dialect.rkt". (Caution: the boot module will not be "dsl/dialect/main.rkt".)

A language can have any number of dialects living in the same project directory, with the boot modules following this naming convention.


The reader

As the first step in running any program, Racket invokes the reader for the language. The reader converts source code (= a string of characters in a source file) into S-expressions (= Racket-style syntactic forms).

To start the reader, Racket calls the read-syntax function for the language. Racket looks for this function in the reader submodule of the boot module of the #lang, which must provide it. For instance:
dsl/main.rkt
#lang br
(module reader br
  (provide read-syntax)
  ···)

read-syntax can either be defined within the reader submodule:
dsl/main.rkt
#lang br
(module reader br
  (provide read-syntax)
  (define (read-syntax name port)
    ···))

Or imported from elsewhere:
dsl/main.rkt
#lang br
(module reader br
  (provide read-syntax)
  (require module/that/exports/read-syntax))

read-syntax must accept two arguments: a source name (which holds the location of the source; for file input, the name would be a path) and an input port (which points at the source file). read-syntax should read its data from the port.

In principle, it’s possible to read directly from the source name when it’s a path. But this is unreliable, because it assumes the input source is a file. Maybe it’s not. By contrast, the port argument always contains the source data, regardless of the underlying input type.

read-syntax should consume all the data from the port (that is, until the port returns eof). In pseudocode:
dsl/main.rkt
#lang br
(module reader br
  (provide read-syntax)
  (define (read-syntax name port)
    (define s-exprs (read-code-from port))
    ···))
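
As a rough illustration of consuming the port until eof, the sketch below uses plain racket/base rather than br. read-code-from is only the placeholder name from the pseudocode above. The body given here is hypothetical: it treats each line of source as one string datum, whereas a real reader would do whatever parsing the language needs. A string port stands in for the source file.

```racket
#lang racket/base
(require racket/port) ; for port->lines

;; Hypothetical body for the `read-code-from` placeholder:
;; reads every line from the port, stopping at eof.
(define (read-code-from port)
  (port->lines port))

;; A string port stands in for the source file (an assumption for this demo).
(define port (open-input-string "first line\nsecond line\n"))
(define s-exprs (read-code-from port))

;; After reading, the port has nothing left to offer.
(define port-exhausted? (eof-object? (peek-char port)))
```

Because the function only ever touches the port, it behaves the same whether the source is a file, a string, or some other kind of input.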

dsl/main.rkt

#lang br
(module reader br
  (provide read-syntax)
  (define (read-syntax name port)
    (define s-exprs (read-code-from port))
    (strip-bindings
     #`(module dsl-mod-name dsl/expander
         #,@s-exprs))))
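
For comparison, here is a minimal sketch of the same reader shape in plain racket/base. It makes several assumptions: the DSL's source is already made of S-expressions, so Racket's own read-syntax can read each datum; strip-context from syntax/strip-context is assumed to play a role similar to strip-bindings; and the function is named dsl-read-syntax (a hypothetical name) to avoid shadowing the built-in.

```racket
#lang racket/base
(require syntax/strip-context) ; provides strip-context

;; Named dsl-read-syntax so it does not shadow Racket's own
;; read-syntax, which this sketch uses to read each datum.
(define (dsl-read-syntax name port)
  ;; Collect syntax objects until the port returns eof.
  (define s-exprs
    (let loop ()
      (define stx (read-syntax name port))
      (if (eof-object? stx)
          '()
          (cons stx (loop)))))
  ;; Build the module expression and remove its lexical context,
  ;; so that any bindings have to come from the expander.
  (strip-context
   #`(module dsl-mod-name dsl/expander
       #,@s-exprs)))

;; Demo: a string port stands in for a source file.
(define module-stx
  (dsl-read-syntax "example" (open-input-string "(foo 1) bar")))
(define module-datum (syntax->datum module-stx))
```

In an actual reader submodule, the function would presumably be exported under the expected name, for instance with (provide (rename-out [dsl-read-syntax read-syntax])).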


Racket takes the module expression returned from read-syntax and uses it to replace the code in the source file. So a source file that looks like this:
source-file.rkt
#lang dsl
dsl source code;
···

Becomes this:
source-file.rkt
(module dsl-mod-name dsl/expander
  dsl-sexprs ···)

Evaluation of this module expression continues from here, starting by importing bindings from the expander.


Optional: the parser & tokenizer

The tokenizer reads characters from the input port and converts them to tokens, which are the smallest meaningful units of source code.

You can write a tokenizer by hand. You can also use a helper function called a lexer that uses regexp-style rules to convert source code into tokens.

These tokens are passed to the parser, which arranges them into a hierarchical S-expression called a parse tree.

You can write a parser by hand. You can also use a grammar-based parser generator, like brag, to make a parser function.
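
To make the hand-written tokenizer option concrete, here is a small sketch in plain racket/base. Everything about it is an assumption chosen for illustration: the toy arithmetic syntax, the token shapes (lists tagged INTEGER or OP), and the choice to return all tokens as a list. A parser generator may expect a different interface, such as a function that returns one token per call.

```racket
#lang racket/base

;; Hypothetical hand-written tokenizer: skips whitespace and turns
;; runs of digits and single operator characters into tagged tokens.
(define (tokenize port)
  (let loop ([tokens '()])
    (cond
      ;; Whitespace: consume it and move on.
      [(regexp-try-match #px"^\\s+" port) (loop tokens)]
      ;; Integer literal: matches on ports come back as byte strings.
      [(regexp-try-match #px"^[0-9]+" port)
       => (λ (m)
            (define n (string->number (bytes->string/utf-8 (car m))))
            (loop (cons (list 'INTEGER n) tokens)))]
      ;; Single-character operators and parentheses.
      [(regexp-try-match #px"^[-+*/()]" port)
       => (λ (m)
            (loop (cons (list 'OP (bytes->string/utf-8 (car m))) tokens)))]
      ;; Nothing left: return the tokens in source order.
      [(eof-object? (peek-char port)) (reverse tokens)]
      [else (error 'tokenize "unexpected character: ~a" (read-char port))])))

(define tokens (tokenize (open-input-string "12 + (3*4)")))
```

regexp-try-match is used because it leaves the port untouched when a rule fails, so the next rule sees the same input.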

dsl/main.rkt

#lang br
(module reader br
  (require module/that/provides/parse
           module/that/provides/tokenize)
  (provide read-syntax)
  (define (read-syntax name port)
    (define the-parse-tree (parse (tokenize port)))
    (strip-bindings
     #`(module dsl-mod-name dsl/expander
         #,the-parse-tree))))


The expander

Once the reader finishes, the expander starts. At the end of the reader phase, the source code has been converted into a module expression that contains S-expressions but has no bindings. For example:
source-file.rkt
(module dsl-mod-name dsl/expander
  dsl-sexprs ···)

The expander provides the initial set of bindings for the module expression returned by the reader, thereby determining the meaning of the identifiers within the S-expressions. This, in turn, allows the expressions to be evaluated as Racket code.

Any module (that meets the other requirements below) can be used as the expander for a language. It’s invoked by the first line of the module code returned by the reader. For instance, this module expression will rely on dsl/expander as its expander:
source-file.rkt
(module dsl-mod-name dsl/expander
  dsl-sexprs ···)

In essence, the expander name is like an implied require at the beginning of the module:
(module dsl-mod-name dsl/expander
  (require dsl/expander)
  dsl-sexprs ···)

Most Racket languages have their own custom expander module. But it’s not mandatory.

Every expander must provide a #%module-begin macro, which will be the first thing invoked in the expander. #%module-begin must accept as input all the expressions that appear in the body of the module expression that read-syntax makes. Therefore, it’s not a bad idea to build your #%module-begin around a syntax pattern that accepts any number of input arguments.

To evaluate the module expression returned by the reader, Racket imports the #%module-begin from the expander specified in the module expression. It replaces the module expression with a call to this #%module-begin, passing it all the parsed expressions that are in the body of the module expression. So this:
source-file.rkt
(module dsl-mod-name dsl/expander
  dsl-sexprs ···)

Becomes this:
source-file.rkt
(module dsl-mod-name dsl/expander
  (#%module-begin ;; imported from `dsl/expander`
   dsl-sexprs ···))

Often, the #%module-begin for a language will perform some language-specific processing on the parse tree, and call the #%module-begin in the implementation language. To prevent namespace collisions between the two #%module-begin macros, use rename-out. In pseudocode:
dsl/expander.rkt
#lang br
(define-macro (dsl-module-begin PARSED-EXPR ...)
  #'(#%module-begin ;; from `br`
     PARSED-EXPR ...))
(provide (rename-out [dsl-module-begin #%module-begin]))
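
The same pattern can be sketched in plain racket/base, with syntax-case standing in for define-macro. To keep the sketch in a single file, the expander and a sample program are written as submodules, which is an arrangement assumed here for demonstration only; in a real project the expander would live in its own file. The results identifier and the choice to collect the program's values into a list are also hypothetical.

```racket
#lang racket/base

;; Hypothetical expander, written as a submodule so the sketch fits in one file.
(module expander racket/base
  (require (for-syntax racket/base))
  (provide (rename-out [dsl-module-begin #%module-begin])
           #%app #%datum #%top #%top-interaction
           + *)
  (define-syntax (dsl-module-begin stx)
    (syntax-case stx ()
      [(_ parsed-expr ...)
       ;; The #%module-begin in this template is the one from racket/base.
       #'(#%module-begin
          ;; Language-specific processing: gather every value into a list.
          (define results (list parsed-expr ...))
          (provide results))])))

;; Stands in for the module expression that a reader would return.
(module program (submod ".." expander)
  (+ 1 2)
  (* 3 4))

;; Importing the program makes `results` available here.
(require 'program)
```

The program module can use only what the expander provides, which is why #%app, #%datum, + and * appear in the provide list: without them, the function calls and number literals in the program would have no meaning.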

Optionally, an expander can provide certain interposition points:
- #%top-interaction is used to activate the REPL.
- #%app adds support for function calls.
- #%datum adds support for self-evaluating values (like numbers and strings).
- #%top adds support for missing identifiers.


Optional tasks

Writing unit tests.

Integrating the language with DrRacket, including syntax coloring and indenting.

Adding Scribble documentation.

Adding an "info.rkt" file.

Making the language available through the Racket package server.
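
As one example of these tasks, a minimal "info.rkt" for the package might look like the sketch below. Every value is a placeholder for a hypothetical dsl package, including the description, the version, and the Scribble file path; the deps list would need to name whatever packages the language actually uses.

```racket
#lang info
;; All values below are placeholders for a hypothetical `dsl` package.
(define collection "dsl")
(define deps '("base"))
(define build-deps '("rackunit-lib" "scribble-lib" "racket-doc"))
(define scribblings '(("scribblings/dsl.scrbl" ())))
(define pkg-desc "A hypothetical domain-specific language")
(define version "0.1")
```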


Master recipe

Setting up a language project

The boot module

The reader

Optional: the parser & tokenizer

The expander

Optional tasks
