This is a summary of the major steps in creating a new language in Racket. This recipe is not “master” in the sense of “comprehensive.” Rather, it’s a core process that you can customize as you wish—except for a few points that are non-negotiable, which are signaled with the word must.
This recipe assumes familiarity with Racket’s language-making workflow and terminology. If you want to learn about that workflow, start with the tutorials, not here.
Let’s assume your new language is called dsl. The rest of the recipe will use #lang dsl as an example. But the process is analogous for any #lang dsl/dialect.
The pseudocode uses #lang br. But you can use any Racket language to implement your language, as long as your code follows the other requirements set out below.
When working on a language project, it’s convenient to be able to refer to modules within the package by global names like dsl/module-name, and on the #lang line as #lang dsl. To do this, start by installing your language project as a package.
Within your path/to/source, create a project directory for the language called dsl. Switch into the new directory and use raco pkg install to add it to your local Racket installation as a package.
> cd path/to/source
> mkdir dsl
> cd dsl
> raco pkg install
Alternatively: at the command line, switch to your path/to/source and use raco pkg new dsl, which will create the dsl subdirectory and stub out other files that will be useful in the package (including documentation and configuration).
> cd path/to/source
> raco pkg new dsl
> cd dsl
> raco pkg install
The boot module is the first module Racket loads when running a program in your language.
If your language is invoked as #lang dsl, its boot module will be "dsl/main.rkt".
If your language is invoked as #lang dsl/dialect, its boot module will be "dsl/dialect.rkt". (Caution: the boot module will not be "dsl/dialect/main.rkt".)
A language can have any number of dialects living in the same project directory, with the boot modules following this naming convention.
To start the reader, Racket calls the read-syntax function for the language. Racket looks for this function in the reader submodule of the boot module of the #lang, which must provide it.
read-syntax can either be defined within the reader submodule:
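As a minimal sketch of this first option, here is what a boot module "dsl/main.rkt" might look like. It is written in racket/base rather than br so it has no extra dependencies. The module name dsl-mod, the placeholder body of read-syntax, and the reference to dsl/expander are illustrative assumptions, not part of the recipe itself.

```racket
#lang racket/base
;; Sketch of "dsl/main.rkt", the boot module for `#lang dsl`.

(module reader racket/base
  (require racket/port)
  (provide read-syntax)

  ;; Placeholder reader: each source line becomes one string datum
  ;; in the body of the returned module code.
  (define (read-syntax src-name in)
    (define src-lines (port->lines in))
    (datum->syntax #f `(module dsl-mod dsl/expander ,@src-lines))))
```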
Or imported from elsewhere:
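A sketch of the second option follows, again in racket/base. To keep the example in one file, a sibling submodule named reader-impl (a hypothetical name) stands in for the separate module; in a real project the reader submodule would more likely require a file such as "reader-impl.rkt" by its path.

```racket
#lang racket/base
;; Sketch of "dsl/main.rkt" whose reader submodule only re-exports.

;; Stand-in for a separate implementation module (hypothetical name).
(module reader-impl racket/base
  (provide read-syntax)
  (define (read-syntax src-name in)
    ;; Placeholder: ignores the source and returns an empty module.
    (datum->syntax #f '(module dsl-mod dsl/expander))))

;; The reader submodule imports read-syntax and provides it again.
(module reader racket/base
  (require (submod ".." reader-impl))
  (provide read-syntax))
```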
read-syntax must accept two input arguments: a source name (which identifies the location of the source; for file input, the name would be a path) and an input port (pointing at the source file). read-syntax should read its data from the port. In principle, it’s possible to read directly from the source name when it’s a path. But this is unreliable, because it assumes the input source is a file. Maybe it’s not. By contrast, the port argument always contains the source data, regardless of the underlying input type.
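The two-argument shape can be sketched as follows, assuming racket/base instead of br. The line-by-line conversion and the names dsl-mod and dsl/expander are placeholders chosen for illustration. The point is that all data comes from the port, so the same function works when the source is not a file at all.

```racket
#lang racket/base
(require racket/port)

;; src-name: a label for the source (a path for file input); never read from.
;; in: the input port that holds the source text.
(define (read-syntax src-name in)
  (define src-lines (port->lines in)) ; always read from the port
  (datum->syntax #f `(module dsl-mod dsl/expander ,@src-lines)))

;; The source here is a string port, not a file.
(define stx (read-syntax "not-a-file" (open-input-string "one\ntwo")))
(syntax->datum stx)
```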
read-syntax must return one value: code for a module expression, represented as a syntax object. Typically, the converted S-expressions from the source file are inserted into this syntax object. This syntax object must have no identifier bindings. This module code must include a reference to the expander that will provide the initial set of bindings when the module code is evaluated.
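One way to meet these requirements is sketched below in racket/base. The conversion step (reading every datum with read) is only a placeholder, and dsl-mod is an arbitrary module name picked for the example. Passing #f as the first argument of datum->syntax produces a syntax object with no lexical context, which is what leaves it free of bindings.

```racket
#lang racket/base
(require racket/port)

(define (read-syntax src-name in)
  ;; Placeholder conversion: collect every datum that `read` finds.
  (define src-datums (port->list read in))
  (define module-datum
    `(module dsl-mod dsl/expander ; dsl/expander will supply the bindings
       ,@src-datums))
  ;; #f means "no lexical context", so the result carries no bindings.
  (datum->syntax #f module-datum))

(define stx (read-syntax "example" (open-input-string "(add 1 2) (add 3 4)")))
(syntax->datum stx)
```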
Racket takes the module expression returned from read-syntax and uses it to replace the code in the source file. So a source file that looks like this:
#lang dsl
dsl source code;
···
Evaluation of this module expression continues from here, starting by importing bindings from the expander.
It’s common, but not mandatory, for read-syntax to rely on two helper functions: a parser and a tokenizer.
You can write a parser by hand. You can also use a grammar-based generator, like brag, to make a parser function.
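Together, the two helpers do most of the work of read-syntax: the tokenizer breaks the source into tokens, and the parser assembles those tokens into a parse tree that read-syntax places in the module code.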
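Here is a small runnable sketch of that arrangement, written in racket/base with a hand-made tokenizer and parser instead of brag. The token format (whitespace-separated words), the parse-tree shape (program and word nodes), and the helper names make-tokenizer and parse are all assumptions chosen for the example.

```racket
#lang racket/base
(require racket/port racket/string)

;; Tokenizer: returns a thunk that yields one token per call, then eof.
;; Here a "token" is just a whitespace-separated word.
(define (make-tokenizer in)
  (define words (string-split (port->string in)))
  (λ ()
    (if (null? words)
        eof
        (begin0 (car words)
                (set! words (cdr words))))))

;; Hand-written parser: drains the tokenizer and builds a parse tree.
(define (parse next-token)
  (let loop ([nodes '()])
    (define tok (next-token))
    (if (eof-object? tok)
        `(program ,@(reverse nodes))
        (loop (cons `(word ,tok) nodes)))))

;; read-syntax wraps the parse tree in module code with no bindings.
(define (read-syntax src-name in)
  (define parse-tree (parse (make-tokenizer in)))
  (datum->syntax #f `(module dsl-mod dsl/expander ,parse-tree)))

(syntax->datum (read-syntax "example" (open-input-string "hello world")))
```

A generated parser would replace parse here. The tokenizer-as-thunk shape was chosen because a function that returns the next token on each call is easy to swap out.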
Once the reader finishes, the expander starts. At the end of the reader phase, the source code has been converted into a module expression that contains S-expressions, but has no bindings.
Any module (that meets the other requirements below) can be used as the expander for a language. It’s invoked by the first line of the module code returned by the reader, where the expander’s module path follows the module name. For instance, a module expression that names dsl/expander in that position will rely on dsl/expander as its expander.
In essence, the expander name is like an implied require at the beginning of the module: whatever the expander provides becomes the initial set of bindings for the module body.
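The single-file sketch below illustrates the idea, assuming racket/base throughout. The submodule tiny-expander is a hypothetical stand-in for dsl/expander, and dsl-mod plays the role of the module code returned by the reader. The body of dsl-mod can use only what tiny-expander exports, much as if those names had been required at the top of the module.

```racket
#lang racket/base

;; Hypothetical stand-in for dsl/expander: it just re-exports a few
;; bindings from racket/base, including #%module-begin.
(module tiny-expander racket/base
  (provide #%module-begin #%app #%datum define provide +))

;; The module path after the name `dsl-mod` is the expander.
;; Everything used in the body comes from its exports.
(module dsl-mod (submod ".." tiny-expander)
  (provide answer)
  (define answer (+ 40 2)))

(require (submod "." dsl-mod))
answer
```

Removing + from the provide list of tiny-expander should make dsl-mod fail to compile with an unbound identifier error, which shows that the expander is the only source of bindings.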
Most Racket languages have their own custom expander module. But it’s not mandatory.
Every expander must provide a #%module-begin macro, which will be the first thing invoked in the expander. #%module-begin must accept as input all the expressions that appear in the body of the module expression that read-syntax makes. Therefore, it’s not a bad idea to build your #%module-begin around a syntax pattern that accepts any number of input arguments.
To evaluate the module expression returned by the reader, Racket imports the #%module-begin from the expander specified in the module expression. It replaces the body of the module expression with a call to this #%module-begin, passing it all the parsed expressions that were in the body. The result has this form:
(#%module-begin ;; imported from `dsl/expander`
  dsl-sexprs ···)
Often, the #%module-begin for a language will perform some language-specific processing on the parse tree, and then call the #%module-begin in the implementation language. To prevent namespace collisions between the two #%module-begin macros, define yours under a different name and use rename-out to export it as #%module-begin.
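The pattern can be sketched in a single racket/base file, with submodules standing in for the separate expander and source modules. The name dsl-module-begin, the "processing" step (collecting the value of every body expression into a list called results), and the small set of exported bindings are all assumptions made for the example.

```racket
#lang racket/base

(module expander racket/base
  (require (for-syntax racket/base))
  (provide (rename-out [dsl-module-begin #%module-begin])
           #%app #%datum #%top-interaction + *)

  ;; Accepts any number of body expressions via the `expr ...` pattern,
  ;; then hands off to the #%module-begin of racket/base.
  (define-syntax (dsl-module-begin stx)
    (syntax-case stx ()
      [(_ expr ...)
       #'(#%module-begin
          (provide results)
          (define results (list expr ...)))])))

;; Plays the role of the module code returned by the reader.
(module program (submod ".." expander)
  (+ 1 2)
  (* 3 4))

(require (submod "." program))
results
```

Inside the expander, #%module-begin still refers to the racket/base version, because the custom macro is defined under another name and only renamed on export.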
Optionally, an expander can provide certain implicit forms:
#%top-interaction is used to activate the REPL.
#%app adds support for function calls.
#%datum adds support for self-evaluating values (like numbers and strings).
#%top adds support for missing identifiers.
The br/quicklang dialect automatically exports these macros.
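As a hedged illustration of one of these forms, the sketch below customizes #%top so that a missing identifier evaluates to a symbol instead of raising an error. It uses racket/base with submodules in a single file. The names dsl-top, expander, and program are hypothetical, and the other implicit forms are simply re-exported from racket/base.

```racket
#lang racket/base

(module expander racket/base
  (require (for-syntax racket/base))
  (provide #%module-begin #%app #%datum #%top-interaction
           (rename-out [dsl-top #%top])
           define provide list)

  ;; Racket wraps each unbound identifier as (#%top . id).
  ;; This version turns it into a quoted symbol.
  (define-syntax (dsl-top stx)
    (syntax-case stx ()
      [(_ . id) #''id])))

(module program (submod ".." expander)
  (provide result)
  ;; `three` has no binding, so it goes through the custom #%top.
  (define result (list 1 "two" three)))

(require (submod "." program))
result
```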
Writing unit tests.
Integrating the language with DrRacket, including syntax coloring and indenting.
Adding Scribble documentation.
Adding an "info.rkt" file.
Making the language available through the Racket package server.