Beau­ti­ful Racket / Chap­ter 2

Make your first programming language — in one hour

This is an excerpt from a book I’m currently writing about making programming languages with Racket. Click on any paragraph to report a problem or send me a comment. Of course, this is all being done with Pollen.
— Matthew Butterick

Let’s make stacker, a pro­gram­ming lan­guage that imple­ments a stack-based cal­cu­la­tor.

1
2
3
4
5
6
#lang reader br/demo/stacker
push 4
push 8
+
push 3
*
1
36

The lan­guage relies on a stack of argu­ments, which starts out empty. The push com­mand puts a new argu­ment on the top of the stack. An oper­a­tor — either + or * — takes the top two argu­ments from the stack, and replaces them with the result of apply­ing the oper­a­tor (mean­ing first + second or first * second). So this sam­ple pro­gram means 3 * (8 + 4), which is 36.

Even though stacker is sim­ple, it will let us tour the whole lan­guage-mak­ing work­flow in Racket. Every lan­guage project will fol­low the same basic steps.

Pre­req­ui­sites

Install Racket and the beautiful-racket pack­age as described in Setup. Though you can work through this tuto­r­ial with any code edi­tor, I rec­om­mend — and will assume you’re using — Racket’s graph­i­cal edi­tor, DrRacket.

Some gen­eral pro­gram­ming expe­ri­ence is assumed. No spe­cific expe­ri­ence with Racket, Scheme, or Lisp is assumed. Don’t worry if some of the ter­mi­nol­ogy or con­cepts go over your head — all the impor­tant top­ics will be reit­er­ated later.

What is a pro­gram­ming lan­guage?

On a tech­ni­cal level, a pro­gram­ming lan­guage is just another pro­gram. It takes cer­tain input, then eval­u­ates that input to pro­duce a result.

On a design level, a pro­gram­ming lan­guage is a spec­i­fi­ca­tion for what kind of input is accept­able, how that input should be inter­preted, and what kind of results should be gen­er­ated. In that regard, what makes a pro­gram­ming lan­guage dif­fer­ent from an ordi­nary pro­gram is that it offers the widest hori­zon of abstrac­tion.

On a cul­tural level, a pro­gram­ming lan­guage is a way to express our evolv­ing ideas and dis­cov­er­ies about pro­gram­ming gen­er­ally — logic, per­for­mance, ergonom­ics, and so on. Though a pro­gram­ming lan­guage is a way of writ­ing a pro­gram, it’s also a way of think­ing about a pro­gram.

Because if lan­guages were just about writ­ing pro­grams, we could’ve stopped with C. (And some have.) But what would be the fun in that? Pro­grams are inter­est­ing specif­i­cally because they’re mal­leable. And the more we expect out of pro­grams, the more vital it is to explore new ways of mak­ing pro­grams.

Includ­ing new lan­guages.

By the way, does a pro­gram­ming lan­guage need to be large & gen­eral-pur­pose, like C? Not at all. A domain-spe­cific lan­guage (or DSL) is a “lit­tle lan­guage” — opti­mized to han­dle a spe­cific task, and noth­ing else. Some­times DSLs are designed to be used inside larger lan­guages — e.g,. reg­u­lar expres­sions, SQL. Some­times they stand alone — e.g., CSS/HTML, Post­Script, R, TeX. Many Unix-descended tools also qual­ify as DSLs — e.g., awk, bash, lex/yacc, make.

Why would I make a pro­gram­ming lan­guage?

If you think pro­gram­ming should fill your brain and soul with feel­ings of power, cre­ativ­ity, curios­ity, and fun — then you’ll prob­a­bly like mak­ing pro­gram­ming lan­guages.

If not, you can move along. (No hard feel­ings.)

But in terms of prac­ti­cal ben­e­fits:

Of course, a pro­gram­ming lan­guage isn’t the right solu­tion for every prob­lem. But when cre­at­ing a lan­guage is easy and inex­pen­sive — as the rest of this tuto­r­ial will show you — it becomes a real­is­tic option for many more prob­lems.

How are lan­guages imple­mented in Racket?

Racket is a descen­dant of Lisp and Scheme that pro­vides a high-level inter­face for mak­ing new lan­guages.

The basic idea is to take a pro­gram writ­ten in the new lan­guage and con­vert its sur­face syn­tax and seman­tics to a stan­dard Racket pro­gram. Then Racket runs this pro­gram nor­mally.

In that sense, every lan­guage in Racket is just an indi­rect way of writ­ing other Racket pro­grams (aka a source-to-source com­piler). This approach makes it pos­si­ble for a new lan­guage to imme­di­ately ben­e­fit from every­thing in Racket’s tool­chain: the libraries, the cross-plat­form com­pat­i­bil­ity, the DrRacket IDE, and so on.

Because every lan­guage is itself just a pro­gram, a lan­guage can do any­thing a pro­gram can. So you can make lan­guages that behave like tra­di­tional pro­gram­ming lan­guages. Or lan­guages that behave in uncon­ven­tional ways.

And why not? Mak­ing a pro­gram­ming lan­guage used to be a years-long com­mit­ment. Now you can make one in min­utes. (To be fair, this first one will be a lit­tle slower. But you’ll get the hang of it.)

The struc­ture of a lan­guage

Every lan­guage in Racket must pro­vide two func­tions:

  1. A reader, which con­verts the source file from a string of char­ac­ters into Racket-style paren­the­sized forms, also known as S-expres­sions. (We’ll see how that works in a minute.)

  2. An expander, which deter­mines how the reader forms cor­re­spond to real Racket expres­sions, which are then eval­u­ated to pro­duce a result. (We’ll see why it’s called an expander later on.)

When you make a lan­guage, you’ll always have to spec­ify a reader and an expander. Some­times you’ll write them from scratch. Some­times you’ll import them from else­where.

But for stacker, we’ll write both the reader and expander. In some projects, the expander is more involved, so it can make sense to start there and work back­ward to the reader. But for this project, they’re equally sim­ple. Thus, we’ll start with the reader, so we get a bet­ter sense of the pro­gram flow, start to fin­ish.

Boot­strap­ping your lan­guage

Every Racket source file begins with a #lang line. If you’ve explored Racket you might have seen vari­ants like:

1
2
3
4
#lang racket
#lang racket/base
#lang scribble/doc
#lang datalog

We just learned that the first step in inter­pret­ing a lan­guage is to process the source with a reader. The #lang line’s job is to tell Racket where to find a reader for the source file. The reader, in turn, will tell Racket where to find the expander.

Often, the reader and expander for a lan­guage are stored in dif­fer­ent source files. But they don’t have to be. For this project, we’ll just use one file, called stacker.rkt. (There’s no magic in the file­name. You can call it what­ever you want.)

To use the abbre­vi­ated #lang stacker nota­tion, we’ll have to do some extra house­keep­ing that we’ll cover in the next tuto­r­ial.

For now, we’ll use the spe­cial boot­strap #lang reader module nota­tion, like so:

1
#lang reader "stacker.rkt"

This is the same as the nota­tion above, but allows us to use the path to the source file (aka mod­ule) that con­tains our reader.

Racket basics

We need to write some Racket code. So let’s learn some Racket.

Beyond that, Racket has the usual num­bers, strings, and func­tions that you know from other lan­guages. We’ll han­dle those as we encounter them.

Set­ting up our source files

Open DrRacket. Start a new file called stacker.rkt. We’ll write this file using a spe­cial tuto­r­ial lan­guage called br. (br comes from the beautiful-racket pack­age we installed. It’s Racket, but with some extra con­ve­niences.) For test­ing pur­poses, let’s add a sec­ond line, so your whole file looks like this:

stacker.rkt
1
2
#lang br
42

Click Run in DrRacket (or learn the key short­cut, because you’ll be using it a lot). In the inter­ac­tions win­dow, you’ll see the result of run­ning the file:

1
42

In the same direc­tory, cre­ate a sec­ond file that we’ll use to test our new lan­guage as we go:

stacker-test.rkt
1
#lang reader "stacker.rkt"

Run this file too. You should get an awful-look­ing error like read-syntax: cannot find reader for `#lang reader "stacker.rkt". If so, that’s the right result: we’re telling #lang to get a reader out of "stacker.rkt", but we haven’t made one yet.

So let’s do that.

Design­ing our reader

Recall the pur­pose of the reader:

Before plung­ing into the code, we should visu­al­ize what our reader needs to accom­plish. Our stacker lan­guage will ulti­mately look like this:

1
2
3
4
5
6
#lang reader "stacker.rkt"
push 4
push 8
+
push 3
*

These, how­ever, are not paren­the­sized forms. How can we get there? We notice that our lan­guage has one instruc­tion on each line. Thus, the sim­plest idea — never a bad place to start — would be to just wrap paren­the­ses around each line:

1
2
3
4
5
(push 4)
(push 8)
(+)
(push 3)
(*)

Not bad. These now look like Racket expres­sions. If we code up a func­tion called push, these expres­sions would eval­u­ate cor­rectly.

The prob­lem comes with the oper­a­tors, which become expres­sions that look like (+) and (*). That paren­the­siz­ing means these func­tions would be eval­u­ated with­out any argu­ments. That would be wrong — they’re sup­posed to take their input argu­ments from the top of the stack. (You’re wel­come to run (+) or (*) in a sep­a­rate DrRacket win­dow to see what you get.)

We need to make it pos­si­ble to man­age the inter­ac­tion of our stack and our oper­a­tors. So let’s imag­ine that we have an inter­me­di­ate func­tion — call it dispatch — that will inspect each instruc­tion and decide what to do. Then we can wrap each instruc­tion like so:

1
2
3
4
5
(dispatch push 4)
(dispatch push 8)
(dispatch +)
(dispatch push 3)
(dispatch *)

What have we accom­plished? Most impor­tantly, our oper­a­tors won’t be treated as func­tion calls, but rather as argu­ments to another func­tion. (dispatch +) means “call dispatch with + as the argu­ment.” In that case, dispatch can get the argu­ments off the stack and apply the oper­a­tion. Like­wise for (dispatch push 4) — dispatch can just treat it like it were (push 4). There­fore, as long as we can write a dispatch func­tion for our expander that works this way — trust me, we can — then this reader trans­for­ma­tion will suf­fice.

Curi­ous char­ac­ters might won­der if we could make a reader that treated the two types of instruc­tions dif­fer­ently:

1
2
3
4
5
(push 4)
(push 8)
(dispatch +)
(push 3)
(dispatch *)

Yes, we could. But for this tuto­r­ial, we’re going to use the sim­plest pos­si­ble reader, and wrap every­thing with dispatch. We’ll reserve all the other work for the expander. In your own lan­guages, you can choose how to allo­cate respon­si­bil­i­ties between the reader and expander.

To sum­ma­rize — our reader will con­vert code that looks like this:

1
2
3
4
5
push 4
push 8
+
push 3
*

Into this:

1
2
3
4
5
(dispatch push 4)
(dispatch push 8)
(dispatch +)
(dispatch push 3)
(dispatch *)

Now that we know what our reader needs to do, we can imple­ment it.

Pro­gram­ming our reader: out­put

By con­ven­tion, Racket always expects the main reader func­tion to be called read-syntax. Racket will call this func­tion with two argu­ments: the path to the source file, and a port for read­ing from the file. But for now, let’s set these aside and focus on the out­put of the func­tion.

read-syntax has one job: to return a syn­tax object describ­ing a mod­ule. The mod­ule, in turn, will invoke the expander for our lan­guage, trig­ger­ing the full expan­sion and eval­u­a­tion of the input pro­gram.

Let’s see how this works by look­ing at some code.

stacker.rkt
1
2
3
4
5
#lang br
(define (read-syntax source-path input-port)
  #'(module lucy br
      42))
(provide read-syntax)

To cre­ate a func­tion, we use define. (define (read-syntax source-path input-port) ...) defines a func­tion called read-syntax that accepts two argu­ments. These are posi­tional argu­ments, so we can name them what­ever we like. To make read-syntax avail­able out­side the file, we use provide. (In Racket, all def­i­n­i­tions within a source file are pri­vate by default.)

In the body of our func­tion we see our mod­ule syn­tax. (Racket has no return state­ment — a func­tion sim­ply returns the last expres­sion in its body.) In Racket, a mod­ule is the basic orga­ni­za­tional unit for code. Like every­thing else in Racket, a mod­ule is an expres­sion. A mod­ule expres­sion has this form:

1
2
(module name-of-module language-that-will-provide-the-expander
   expressions-to-evaluate ...)

So our mod­ule expres­sion:

1
2
(module lucy br
    42)

Means “a mod­ule named lucy, using the expander from the br lan­guage, that eval­u­ates the expres­sion 42.”

Then we need to turn this into a syn­tax object. A syn­tax object is a way of stor­ing a ref­er­ence to the lit­eral code of an expres­sion, so it can be eval­u­ated later. We can turn any expres­sion into a syn­tax object by wrap­ping it in the syntax con­verter:

1
2
(syntax (module untitled br
          42))

Syn­tax objects are so com­mon in Racket that there’s a short­hand nota­tion for mak­ing them — just add the #' pre­fix (pro­nounced hash-quote) to the expres­sion. The idiomatic way of writ­ing the above syn­tax object would be:

1
2
#'(module untitled br
    42)

“This syn­tax object will be eval­u­ated later — but when?” When we invoke the lan­guage. Save "stacker.rkt". Run "stacker-test.rkt". This time, instead of an error, you should get a dif­fer­ent result:

1
42

What’s hap­pen­ing here? When we run "stacker-test.rkt", we’re invok­ing our lan­guage, which means call­ing our reader func­tion, read-syntax. The con­tents of the "stacker-test.rkt" source file are replaced with the result of this func­tion, which is a syn­tax object describ­ing a mod­ule. Once that syn­tax has been moved into the new loca­tion, it’s eval­u­ated, pro­duc­ing 42.

Let’s per­suade our­selves that noth­ing spooky has hap­pened. We can accom­plish the same trans­for­ma­tion by hand. We replace the #lang line in "stacker-test.rkt" with the mod­ule expres­sion (this time with­out the #' syn­tax pre­fix, because we want the code to be eval­u­ated), and run it:

stacker-test.rkt
1
2
(module untitled br
    42)

Once again we get 42.

This proved that we can make a round-trip from a source file to a reader and back. Our reader works, except that it’s not, you know, read­ing any­thing. So let’s fix that.

Pro­gram­ming our reader: input

We’ll upgrade our reader to do three new tasks:

  1. Read in the lines of the source file.

  2. Con­vert these lines to (dispatch ...) form.

  3. Put these new forms into the mod­ule we’re return­ing as a syn­tax object.

To do this, let’s swap out the code in "stacker.rkt":

stacker.rkt
 1
 2
 3
 4
 5
 6
 7
 8
 9
10
#lang br
(define (read-syntax source-path input-port)
  (define src-strs (remove-blank-lines (port->lines input-port)))
  (define (make-datum str) (format-datum '(dispatch ~a) str))
  (define src-exprs (map make-datum src-strs))
  (strip-context
   (inject-syntax ([#'(_SRC-EXPR ...) src-exprs])
                  #'(module stacker-mod br
                      _SRC-EXPR ...))))
(provide read-syntax)

We’ll step through each line to see what it does.

1
(define src-strs (remove-blank-lines (port->lines input-port)))

Recall that Racket passes two argu­ments to our func­tion: the path to the source file, which we’ve named source-path, and an input port point­ing at this file, which we’ve named input-port. A port is a generic inter­face for retriev­ing data from a file or other source. Using port->lines, we’ll retrieve all the lines from the file as strings, and store them in src-strs. (In this project, we won’t need source-path for any­thing.) We’ll also remove-blank-lines because they won’t mean any­thing in stacker.

Next, we’ll con­vert these strings into datums. A datum is a string that’s been parsed as a Racket form, but not yet eval­u­ated. For instance, these are strings:

1
"(list 1 2 3)" "#t" "hello" "(+ 1 2)"

And these are the cor­re­spond­ing datums (I know the plural of datum ought to be data, but allow me some gram­mat­i­cal license, for clar­ity):

1
'(list 1 2 3)  #t  'hello  '(+ 1 2)

To cre­ate a datum from an exist­ing paren­the­sized expres­sion, we sim­ply wrap it with the quote con­verter:

1
2
(quote (module untitled br
          42))

Though as with syn­tax objects, there is an equiv­a­lent short­hand nota­tion — in this case, we pre­fix the expres­sion with ' (also pro­nounced quote):

1
2
'(module untitled br
    42)

If a datum sounds like a syn­tax object, that’s no coin­ci­dence — a syn­tax object is just a datum with some extra infor­ma­tion attached. The #' pre­fix nota­tion we used for a syn­tax object now con­nects log­i­cally: we can think of the # as rep­re­sent­ing the extra info that gets attached to the datum:

1
2
#'(module untitled br
    42)

Back to our code:

1
2
(define (make-datum str) (format-datum '(dispatch ~a) str))
(define src-exprs (map make-datum src-strs))

We want the strings we got from our source file to behave as expres­sions, so we’ll con­vert them to datums. Because we’re con­vert­ing a list of strings, we’ll use map to apply a con­ver­sion func­tion, make-datum, to every ele­ment in src-strs. This will cre­ate a new list of src-exprs. In turn, make-datum relies on format-datum to cre­ate a datum with the right form — '(dispatch ~a) (the value of str gets inter­po­lated into the ~a slot).

Finally, the twisti­est bit of code — but don’t panic:

1
2
3
4
(strip-context
 (inject-syntax ([#'(_SRC-EXPR ...) src-exprs])
                #'(module stacker-mod br
                    _SRC-EXPR ...))))

Recall that the out­put of our func­tion is a syn­tax object describ­ing a mod­ule. We need to update this syn­tax object to incor­po­rate the datums we just made.

To do this, we’ll use inject-syntax to make new syn­tax avail­able within the mod­ule syn­tax. In the first line, we match our datums to a syn­tax pat­tern, #'(_SRC-EXPR ...), which is just a way of label­ing ele­ments so they can be rearranged as pieces of syn­tax. Here, the syn­tax pat­tern unpacks the ele­ments of src-exprs so we can refer to them indi­vid­u­ally.

This is eas­ier to under­stand if you look at how the syn­tax object has changed. _SRC-EXPR refers to the first item in the list. After that, the ... nota­tion means “do the same to every other item in the list.” So if src-exprs has three items, say (list '(dispatch 42) '(dispatch "Hello world") '(dispatch #t)), then the result­ing syn­tax object will look like this:

1
2
3
4
#'(module stacker-mod br
    (dispatch 42)
    (dispatch "Hello world")
    (dispatch #t))

Notice that the quote pre­fixes on our datums dis­ap­pear. That’s because a) inject-syntax auto­mat­i­cally upgrades our datums to syn­tax objects and then b) the syn­tax-object pre­fix #' on the front of the mod­ule expres­sion applies to every­thing in it.

As for the name _SRC-EXPR — when work­ing with syn­tax pat­terns, it’s help­ful to adopt a dis­tinct nam­ing con­ven­tion for pat­tern vari­ables. That way, it’s eas­ier to dis­tin­guish the syn­tax that will pass through as writ­ten from the syn­tax that will be replaced by the pat­tern. In this book, we’ll use iden­ti­fiers pre­fixed with an under­score to denote pat­tern vari­ables. As an addi­tional visual cue, we’ll use caps (because that makes the pat­tern vari­ables eas­ier to notice once they’re mixed in with ordi­nary iden­ti­fiers, which are con­ven­tion­ally all low­er­case).

Finally, we wrap our syn­tax object in strip-context. This is a bit of house­keep­ing that we’ll always do as the last step in the read-syntax, to guar­an­tee that we’re send­ing a clean syn­tax object to the expander. (As for what kind of “con­text” we’re strip­ping, more about that in a minute.)

1

The last line of code in our reader uses provide to make our read-syntax func­tion avail­able to other mod­ules.

Test­ing our reader

At this point, it would be nice to see if our updated reader works the way we hope. Let’s make a source file with these sam­ple val­ues and see if we do, in fact, get this code back. Save "stacker.rkt". Update "stacker-test.rkt" as fol­lows:

stacker-test.rkt
1
2
3
4
#lang reader "stacker.rkt"
42
"Hello world"
#t

Run it. What hap­pens? You should get an error that looks like dispatch: unbound identifier in module. In time, we’ll think of this as a good-news error — it means that our code is try­ing to call the dispatch func­tion (which is what we wanted) but can’t, because we haven’t writ­ten it yet (which shouldn’t sur­prise us).

In the mean­time, how can we check the reader func­tion? For debug­ging pur­poses, we can take a sneaky short­cut: add a ' pre­fix to the _SRC-EXPR, like so:

1
2
3
4
(strip-context
  (inject-syntax ([#'(_SRC-EXPR ...) src-exprs])
                 #'(module stacker-mod br
                     '_SRC-EXPR ...)))

What will hap­pen now? Before we run our test file again, let’s rea­son through the code. There is a trap afoot for the unwary. Here are the two pos­si­bil­i­ties.

Option A: “Well, adding the ' pre­fix turns an expres­sion into a datum, which is the lit­eral code _SRC-EXPR. And if there are three lines of input, the ... means that datum will get repeated three times in the syn­tax object, result­ing in this—”

1
2
3
4
#'(module stacker-mod br
    '_SRC-EXPR
    '_SRC-EXPR
    '_SRC-EXPR)

Option B: “No, every­thing works the same way as before, except that every line now has a ' in front of it, like so—”

1
2
3
4
#'(module stacker-mod br
    '(dispatch 42)
    '(dispatch "Hello world")
    '(dispatch #t))

As we might hope, the answer is B. But why? Don’t for­get that the point of using a syn­tax object is to delay eval­u­a­tion of the code inside. While we’re using inject-syntax, the only parts of our syn­tax object that can change are the pieces ref­er­enced by our syn­tax pat­tern — namely, the vari­able _SRC-EXPR and the ... short­hand. The rest, includ­ing the ' pre­fix­ing, won’t be eval­u­ated until after the func­tion exits.

It may help to remem­ber that the ' pre­fix is equiv­a­lent to wrap­ping in quote like so:

1
2
3
4
(strip-context
  (inject-syntax ([#'(_SRC-EXPR ...) src-exprs])
                 #'(module stacker-mod br
                     (quote _SRC-EXPR) ...)))

This makes it a lit­tle more plain that the quote will be repeated for every expres­sion in (_SRC-EXPR ...).

But then what will hap­pen? Run stacker-test.rkt and see for your­self:

1
2
3
'(dispatch 42)
'(dispatch "Hello world")
'(dispatch #t)

By using the ' pre­fix within the syn­tax object, we con­vert our code back into datums, which print as val­ues in DrRacket. Now we can ask: did the reader do what we hoped? Looks like it did.

To fin­ish our test, let’s change our source file to use the input from the very first exam­ple:

stacker-test.rkt
1
2
3
4
5
6
#lang reader "stacker.rkt"
push 4
push 8
+
push 3
*

Run it again and we get:

1
2
3
4
5
'(dispatch push 4)
'(dispatch push 8)
'(dispatch +)
'(dispatch push 3)
'(dispatch *)

Let’s com­pare that to the reader behav­ior we were look­ing for:

1
2
3
4
5
(dispatch push 4)
(dispatch push 8)
(dispatch +)
(dispatch push 3)
(dispatch *)

Except for the quotes (which are just there tem­porar­ily for debug­ging), that’s exactly right. We can move on — after we’ve made one last change to pre­pare the reader to coop­er­ate with the expander (which we’ll be build­ing in the next sec­tion). To do this, we change the mod­ule expres­sion so that instead of invok­ing br as the expander, it invokes "stacker.rkt". So the fin­ished reader looks like this:

stacker.rkt
 1
 2
 3
 4
 5
 6
 7
 8
 9
10
#lang br
(define (read-syntax source-path input-port)
  (define src-strs (remove-blank-lines (port->lines input-port)))
  (define (make-datum str) (format-datum '(dispatch ~a) str))
  (define src-exprs (map make-datum src-strs))
  (strip-context
   (inject-syntax ([#'(_SRC-EXPR ...) src-exprs])
                  #'(module stacker-mod "stacker.rkt"
                      _SRC-EXPR ...))))
(provide read-syntax)

The expander

To recap — every lan­guage in Racket must pro­vide two func­tions:

  1. The reader con­verts the source file from a string of char­ac­ters into S-expres­sions.

  2. The expander deter­mines how the reader S-expres­sions cor­re­spond to real Racket expres­sions, which are then eval­u­ated to pro­duce a result.

We made our reader. Now we’ll make our expander.

First, why is it called an expander? Recall that our reader con­sisted of a read-syntax func­tion that took source code and, instead of return­ing a value, pro­duced a syn­tax object that added (dispatch ...) to each line of the source code, in essence “expand­ing” it.

It turns out that a lot of things in Racket that look like run-time func­tions are instead copy­ing & rear­rang­ing code at com­pile time. These are a spe­cial class of func­tions called macros. Macros have a restricted inter­face: they take one syn­tax object as input, and return another syn­tax object as out­put. Thus, macros are also known as syn­tax trans­form­ers.

(Though macros are func­tions, it’s con­ven­tional in Racket to only refer to them as “macros,” and reserve the term “func­tion” for some­thing that’s called at run time. So if I say some­thing like “con­vert a func­tion into a macro,” that’s the intended con­trast.)

Under this def­i­n­i­tion, our read-syntax func­tion wasn’t quite a macro. Though it did return a syn­tax object, it accepted two input argu­ments: a path and an input port.

But as we’ll see, our expander is going to start with a macro, which will in turn bring in the other func­tions we need to run our source file.

Design­ing our expander

As we did with the reader, let’s first pen­cil out what our expander needs to do.

The reader was respon­si­ble for the form of the code; we could say the expander is respon­si­ble for its mean­ing. The expander’s job is to eval­u­ate the code pre­pared by the reader. The code can be eval­u­ated only if every name used in the code has a con­nec­tion to an actual value or func­tion. In Racket, a “name used in the code” is known as an iden­ti­fier, and “a con­nec­tion to an actual value or func­tion” is known as a bind­ing. In short, the expander has to ensure that every iden­ti­fier in the code has a bind­ing. (A vari­able in Racket = an iden­ti­fier + a bind­ing.)

(By the way, when we used strip-context in read-syntax, what we were strip­ping were any lin­ger­ing bind­ings that might be attached to the iden­ti­fiers within our syn­tax object. We want the expander to have a clean slate.)

We already came across this ter­mi­nol­ogy when we ran "stacker-test.rkt" with­out an expander (run it again if you like). We got the error dispatch: unbound identifier in module. Now we see what it means. The code from the reader refers to a dispatch iden­ti­fier, e.g.:

1
(dispatch push 4)

This expres­sion means “call the func­tion dispatch with the argu­ments push and 4.” But the expander hasn’t yet given dispatch a bind­ing. Hence the error when the pro­gram runs: dispatch is unde­fined.

This error isn’t spe­cific to our lan­guage. You’ll trig­ger the same error in Racket when­ever you use a name that hasn’t yet been defined, for instance:

1
2
#lang racket
foo
1
2
#lang scribble/doc
@bar[]

In our case, the code we’re get­ting from the reader will look like this:

1
2
3
4
5
(dispatch push 4)
(dispatch push 8)
(dispatch +)
(dispatch push 3)
(dispatch *)

From this sam­ple, we can list the iden­ti­fers that will need bind­ings:

  1. dispatch, which is a func­tion.

  2. push, which is a stack com­mand.

  3. + and *, which are stack oper­a­tors.

We also need num­bers. But we get those for free — in Racket, num­bers can’t be iden­ti­fiers. They auto­mat­i­cally eval­u­ate to their numeric value.

Our expander design is com­plete. If our expander pro­vides bind­ings for these iden­ti­fiers — and one spe­cial macro, that we’ll dis­cuss in the next sec­tion — then the stacker lan­guage will work.

Pro­gram­ming our expander: out­put

We saw that Racket starts the reader by invok­ing a func­tion called read-syntax. Sim­i­larly, Racket starts the expander by invok­ing a macro called, by con­ven­tion, #%module-begin. There­fore, every expander needs to pro­vide a #%module-begin macro.

Racket has a small num­ber of macros with spe­cial sta­tus. You can rec­og­nize them by the #% pre­fix. #%module-begin is spe­cial because when a mod­ule is eval­u­ated, all the code inside that mod­ule gets sent to #%module-begin, which can do any­thing it pleases with it, before return­ing a syn­tax object, after which the pro­gram expan­sion & eval­u­a­tion will con­tinue.

Because every Racket lan­guage pro­vides its own #%module-begin, the most com­mon way to imple­ment the #%module-begin in a new lan­guage is:

  1. Han­dle any lan­guage-spe­cific pro­cess­ing of the code.

  2. Pass the result to another #%module-begin for the heavy lift­ing.

Of course, because all the #%module-begins have the same name, some care needs to be taken.

But let’s see the care­less approach first. Put this code in "stacker.rkt":

1
2
3
4
5
6

As before, we’ll use define and provide to set up our func­tion and make it avail­able to other files. We’ll talk about #'(#%module-begin _READER-LINE ...) in the next sec­tion. (As for #%top-interaction, that’s a lit­tle house­keep­ing func­tion that will make it eas­ier to run our pro­gram in DrRacket.)

For now, let’s focus on the out­put of the macro. As with read-syntax, it’s another syn­tax object (mean­ing, it begins with #'). This pre­lim­i­nary #%module-begin isn’t going to do any­thing with the inbound code. It’s just going to cre­ate a vari­able called studio and print its value. Then it will pass this syn­tax to the #%module-begin from the br lan­guage (which is always avail­able within "stacker.rkt" because the file is writ­ten in #lang br).

Make sure "stacker-test.rkt" still looks like this, and run it:

stacker-test.rkt
1
2
3
4
5
6
#lang reader "stacker.rkt"
push 4
push 8
+
push 3
*

What hap­pened? Noth­ing yet? OK, one Mis­sis­sippi, two Mis­sis­sippi ... how about now? Noth­ing at all? Why is this tak­ing so long?

Unfor­tu­nately the run will never fin­ish, because our expander has an awful bug in it. Within DrRacket, select Force the pro­gram to quit or type ctrl+K. (Con­sider mem­o­riz­ing this key com­bi­na­tion, because it’s not the last time you’ll need to kill a pro­gram in your lan­guage-debug­ging career.)

Astute observers prob­a­bly already noticed the bug: when we defined our macro with the name #%module-begin, the #%module-begin inside the macro, instead of refer­ring to the #%module-begin in #lang br, now referred to itself. (A prob­lem of vari­able nam­ing more gen­er­ally known as shad­ow­ing.) So our #%module-begin called itself recur­sively ... for­ever.

We can cure the prob­lem by giv­ing our macro an arbi­trary name when we define it, and then using rename-out to change its name as it passes through provide, like so:

1
2
3
4
5
6
(define #'(stacker-module-begin _READER-LINE ...)
  #'(#%module-begin
     (define studio (* 6 9))
     (displayln studio)))
(provide (rename-out [stacker-module-begin #%module-begin]))
(provide #%top-interaction)

After you replace this code, run "stacker-test.rkt" again, and you’ll get the expected result:

1
54

“Hold on. Why does renam­ing the macro work? Doesn’t the out­put syn­tax still refer to #%module-begin, which is the final name of the macro?” No, but only because Racket macros are hygienic. Macros copy code to other places, pack­aged in syn­tax objects. That code con­tains iden­ti­fiers. So a ques­tion arises: how should we deter­mine the bind­ing — aka, the mean­ing — of a copied iden­ti­fier? Should we look at the des­ti­na­tion for the copy (where the macro is being invoked)? Or at the source for the copy (where the macro is defined)?

Hygienic macros fol­low the sec­ond rule — a macro copies code, but the iden­ti­fiers used in the macro keep their bind­ings from the place where the macro was defined. (Hence the “hygiene” metaphor — the macro keeps its iden­ti­fiers “cleanly” sep­a­rated from those at the des­ti­na­tion.)

To see hygiene in action, go into DrRacket and hover your cur­sor over the #%module-begin inside stacker-module-begin. Now that we’ve changed the name of the macro, you’ll see a mes­sage pop up: binding #%module-begin imported from br. What hygiene guar­an­tees is that when­ever this macro is used, the #%module-begin iden­ti­fier will always refer to the #%module-begin from #lang br.

For now, we won’t delve into the details of how Racket keeps all this straight. And yes, it is pos­si­ble to over­ride hygiene when needed. What’s impor­tant to notice is that a macro is not merely a find-and-replace mech­a­nism — it floats within a bub­ble of bind­ings.

More­over, hygiene is good pol­icy: it guar­an­tees that wher­ever our macro is used, the copied code will always mean the same thing. Here’s an exam­ple of why that might be a wise idea. Curi­ous char­ac­ters are invited to paste this pro­gram into DrRacket and hover over the two ref­er­ences to * to under­stand why the *-macro does not, in fact, start global ther­monu­clear war (unfor­tu­nately, (* 6 9) does):

1
2
3
4
5
6
7
8
9
#lang br
(module the-macro br
  (provide *-macro)
  (define #'(*-macro _X _Y) #'(* _X _Y)))

(require 'the-macro) ; `require` imports bindings from a module
(define (* x y) 'global-thermonuclear-war-initiated)
(*-macro 6 9)
(* 6 9)

We’ve now proved that we can make a round-trip from our source file, through our reader, then through our expander, and back. We’re another step closer to hav­ing a work­ing lan­guage.

Pro­gram­ming our expander: input

Recall that a macro takes one syn­tax object as input and returns another. Con­sis­tent with that logic, we can use the #' syn­tax pre­fix within define to eas­ily con­vert a func­tion into a macro. Just add the #' pre­fix to both the input and out­put expres­sions, like so.

1
2
3
4
#lang br
(define (func x y z) (list x y z))
(define #'(mac _X _Y _Z) #'(list _X _Y _Z))
(list (func 4 5 6) (mac 4 5 6)) ; '((4 5 6) (4 5 6))

One wrin­kle is that the input expres­sion for a macro is actu­ally a syn­tax pat­tern, like we saw in our read-syntax func­tion:

1
2
3
(inject-syntax ([#'(__SRC-EXPR ...) src-exprs])
  #'(module stacker-mod "stacker.rkt"
      __SRC-EXPR ...))

That means we can use the ... syn­tax ellip­sis when needed:

1
2
3
4
#lang br
(define (func x y z) (list x y z))
(define #'(mac _ARG ...) #'(list _ARG ...))
(list (func 4 5 6) (mac 4 5 6)) ; '((4 5 6) (4 5 6))

It also means that we’ll use the all-caps con­ven­tion when nam­ing the argu­ments to a macro. (Again, we don’t have to, but it’s a visual cue that we’re work­ing with syn­tax-pat­tern vari­ables rather than reg­u­lar vari­ables).

There’s often more than one valid way to write a macro. For instance, these three macros pro­duce the same result (shouldn’t be hard to infer why):

1
2
3
4
5
6
#lang br
(define (func x y z) (list x y z))
(define #'(mac _X _Y _Z) #'(list _X _Y _Z))
(define #'(mac-2 _X _Y _Z) #'(func _X _Y _Z))
(define #'mac-3 #'func)
(list (func 4 5 6) (mac 4 5 6) (mac-2 4 5 6) (mac-3 4 5 6))

For our #%module-begin macro, we’ll invoke define with the syn­tax pat­tern #'(#%module-begin _READER-LINE ...). In so doing, our macro will match the first line of the incom­ing code to _READER-LINE, and the remain­ing lines to the ... short­hand. For now, we’ll just pass the _READER-LINEs through to the next #%module-begin in the chain, leav­ing us with the sim­plest pos­si­ble macro:

1
2
3
4
(define #'(stacker-module-begin _READER-LINE ...)
  #'(#%module-begin
     _READER-LINE ...))
(provide (rename-out [stacker-module-begin #%module-begin]))

As before, we’re using caps for _READER-LINE just to remind our­selves that it’s a syn­tax-pat­tern vari­able, and thus will be replaced with a value from the pat­tern (unlike #%module-begin, which remains #%module-begin).

Test­ing our expander I/O

Just as we did in our reader, we can add the quote pre­fix ' inside our macro to print out the lit­eral code:

1
2
3
4
(define #'(stacker-module-begin _READER-LINE ...)
  #'(#%module-begin
     '_READER-LINE ...))
(provide (rename-out [stacker-module-begin #%module-begin]))

When we run our "stacker-test.rkt":

stacker-test.rkt
1
2
3
4
5
6
#lang reader "stacker.rkt"
push 4
push 8
+
push 3
*

We should get the same result as we got when test­ing our reader:

1
2
3
4
5
'(dispatch push 4)
'(dispatch push 8)
'(dispatch +)
'(dispatch push 3)
'(dispatch *)

Before we move on, let’s make sure our "stacker.rkt" is restored to its nor­mal state. We’re up to a whop­ping 15 lines of code:

stacker.rkt
 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
#lang br
(define (read-syntax source-path input-port)
  (define src-strs (remove-blank-lines (port->lines input-port)))
  (define (make-datum str) (format-datum '(dispatch ~a) str))
  (define src-exprs (map make-datum src-strs))
  (strip-context
   (inject-syntax ([#'(_SRC-EXPR ...) src-exprs])
                  #'(module stacker-mod "stacker.rkt"
                      _SRC-EXPR ...))))
(provide read-syntax)

(define #'(stacker-module-begin _READER-LINE ...)
  #'(#%module-begin
     _READER-LINE ...))
(provide (rename-out [stacker-module-begin #%module-begin]))
(provide #%top-interaction)

Run "stacker-test.rkt" again and we should see a dif­fer­ent error: dispatch: unbound identifier. We’ll fix that next.

Pro­gram­ming our expander: bind­ings

To com­plete our expander, we need to define bind­ings for the iden­ti­fiers we enu­mer­ated in our expander design:

  1. dispatch, which is a func­tion.

  2. push, which is a stack com­mand.

  3. + and *, which are stack oper­a­tors.

We also need to provide these bind­ings so they’re vis­i­ble out­side our expander. The expander, in essence, deter­mines the uni­verse of bind­ings that are avail­able in our lan­guage. So we can only use a func­tion in our lan­guage if the expander explic­itly pro­vides it (con­versely, if it doesn’t, we can’t).

We’ll start with push, which takes one numer­i­cal argu­ment and pushes onto the top of a stack. Add the fol­low­ing code to "stacker.rkt":

1
2
3
4
(define stack empty)

(define (push num) (set! stack (cons num stack)))
(provide push)

Our stack starts out as the empty list. push uses the cons func­tion to put num onto the front of the list, and set!s the stack to the new ver­sion. cons is one of the old­est Lisp func­tions, get­ting its name from its role con­struct­ing a new item in mem­ory. Every time we cons an item onto the list, we cre­ate a new list that’s one big­ger. set! is self-explana­tory, except maybe the ! suf­fix — that’s a Racket con­ven­tion to denote func­tions that update the value of a vari­able.

Of course, we also (provide push). Notice, how­ever, that we don’t also pro­vide stack — that can remain pri­vate to the expander.

Now let’s add our long-awaited dispatch func­tion:

1
2
3
4
5
6
7
8
(define (dispatch arg-1 [arg-2 #f])
  (cond
    [(number? arg-2) (push arg-2)]
    [else
     (define op arg-1)
     (define op-result (op (first stack) (second stack)))
     (set! stack (cons op-result (drop stack 2)))]))
(provide dispatch)

There are a few new Racket func­tions here, so we’ll walk through each line.

1
(define (dispatch arg-1 [arg-2 #f])

We define our dispatch func­tion to take two argu­ments. How do we know we need two argu­ments? We just look at the code that’s com­ing into our expander.

1
2
3
4
5
(dispatch push 4)
(dispatch push 8)
(dispatch +)
(dispatch push 3)
(dispatch *)

We see there are two call­ing pat­terns for dispatch. Either we’ll get two argu­ments, where the first argu­ment is push and the sec­ond argu­ment is a num­ber. Or we’ll get one argu­ment, which will be a stack oper­a­tor (+ or *). The sec­ond argu­ment is optional, so we put it in square brack­ets and assign a default value of #f.

1
2
3
4
(define (dispatch arg-1 [arg-2 #f])
  (cond
    [(number? arg-2) (push arg-2)]
    [else ...]))

cond is Racket’s con­di­tional expres­sion. Each set of square brack­ets intro­duces a con­di­tional test, fol­lowed by the code to eval­u­ate if the test is true. An else branch is invoked when none of the tests are true.

Our first branch tests whether we got a number? for arg-2. If so, we know it’s a push instruc­tion, so we (push arg-2).

1
2
3
4
[else
 (define op arg-1)
 (define op-result (op (first stack) (second stack)))
 (set! stack (cons op-result (drop stack 2)))]

If we don’t have a push instruc­tion, we must have a stack oper­a­tor in the first argu­ment. So we (define op arg-1). We need to send our oper­a­tor the first and second ele­ments from the stack, so we get (op (first stack) (second stack)). Then we put the result­ing value back onto the stack using set!, as we did in push. Notice, how­ever, that we drop those two ele­ments from the stack before we put the result onto the stack — the idea is that they get con­sumed by the oper­a­tor.

In the penul­ti­mate line of the expander, we’ll take care of pro­vid­ing + and *:

1

We don’t need to define these func­tions as we did with push and dispatch. They’re already defined by #lang br, and we can just bor­row them. Of course, that’s because we’re happy with their exist­ing mean­ing. If we wanted, we could cre­ate new bind­ings for + and * for our lan­guage, even coun­terini­tu­itive ones, that would super­sede their #lang br mean­ings:

1
2

But we won’t.

Finally, one more house­keep­ing line — to get func­tions and num­bers to work in our pro­gram, we need to provide the spe­cial Racket macros #%app and #%datum. As with + and *, these are already avail­able from #lang br, so we can just bor­row them:

1

Our "stacker.rkt" should now look like this:

stacker.rkt
 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
#lang br
(define (read-syntax source-path input-port)
  (define src-strs (remove-blank-lines (port->lines input-port)))
  (define (make-datum str) (format-datum '(dispatch ~a) str))
  (define src-exprs (map make-datum src-strs))
  (strip-context
   (inject-syntax ([#'(_SRC-EXPR ...) src-exprs])
                  #'(module stacker-mod "stacker.rkt"
                      _SRC-EXPR ...))))
(provide read-syntax)

(define #'(stacker-module-begin _READER-LINE ...)
  #'(#%module-begin
     _READER-LINE ...))
(provide (rename-out [stacker-module-begin #%module-begin]))
(provide #%top-interaction)

(define stack empty)
(define (push num) (set! stack (cons num stack)))
(provide push)

(define (dispatch arg-1 [arg-2 #f])
  (cond
    [(number? arg-2) (push arg-2)]
    [else
     (define op arg-1)
     (define op-result (op (first stack) (second stack)))
     (set! stack (cons op-result (drop stack 2)))]))
(provide dispatch)

(provide + *)
(provide #%app #%datum)

Test­ing our expander with bind­ings

Are we there yet? Just about. Let’s run our "stacker-test.rkt" file, which still looks like this:

stacker-test.rkt
1
2
3
4
5
6
#lang reader "stacker.rkt"
push 4
push 8
+
push 3
*

Run "stacker-test.rkt". The good news — no errors. The bad news — no pro­gram result, either.

For­tu­nately this is also a sim­ple fix. The result of our pro­gram is what­ever value is sit­ting on top of the stack at the end. So we add one line to our stacker-module-begin, to display the first ele­ment of the stack after all the _READER-LINEs have been eval­u­ated:

1
2
3
4
(define #'(stacker-module-begin _READER-LINE ...)
#'(#%module-begin
   _READER-LINE ...
   (display (first stack))))

Run "stacker-test.rkt" one more time, and you’ll get:

1
36

Con­grat­u­la­tions — you just made a pro­gram­ming lan­guage.

Next steps

Source list­ing

stacker.rkt
 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
#lang br
(define (read-syntax source-path input-port)
  (define src-strs (remove-blank-lines (port->lines input-port)))
  (define (make-datum str) (format-datum '(dispatch ~a) str))
  (define src-exprs (map make-datum src-strs))
  (strip-context
   (inject-syntax ([#'(_SRC-EXPR ...) src-exprs])
                  #'(module stacker-mod "stacker.rkt"
                      _SRC-EXPR ...))))
(provide read-syntax)

(define #'(stacker-module-begin _READER-LINE ...)
  #'(#%module-begin
     _READER-LINE ...
     (display (first stack))))
(provide (rename-out [stacker-module-begin #%module-begin]))
(provide #%top-interaction)

(define stack empty)
(define (push num) (set! stack (cons num stack)))
(provide push)

(define (dispatch arg-1 [arg-2 #f])
  (cond
    [(number? arg-2) (push arg-2)]
    [else
     (define op arg-1)
     (define op-result (op (first stack) (second stack)))
     (set! stack (cons op-result (drop stack 2)))]))
(provide dispatch)

(provide + *)
(provide #%app #%datum)