To recap—every language in Racket has two essential components:
A reader, which converts source code from a string of characters into parenthesized S-expressions.
An expander, which determines how these S-expressions correspond to real Racket expressions, which are then evaluated to produce a result.
We’re done with our reader. Now we’ll make our expander.
First, why is it called an expander? Recall that our reader consisted of a read-syntax function that surrounded each line of the source file with (handle ...), in essence “expanding” it. The expander will perform a similar process on the rest of the code, with the help of macros.
It turns out that a lot of things in Racket that look like normal functions are actually copying & rewriting code when the program is compiled (an event we’ll call compile time) rather than being invoked when the program is evaluated (an event we’ll call run time). Suppose we use and within a program like so:
1 | (and cond-a cond-b cond-c) |
From afar, and looks like a function. But it’s not. Rather, at compile time, this use of and gets expanded—that is, rewritten—so it looks like this:
This is the code that actually gets evaluated at run time. Let’s not be alarmed. These nested if conditionals mean the same thing as our original and. The expander has simply rewritten the and expression as the equivalent nested if expressions.
Within Racket, and belongs to a special and important class of functions called macros. Macros have a restricted interface: they take certain code as input (packaged as a syntax object), and return other code as output (also as a syntax object). Thus, macros are sometimes known more precisely as syntax transformers. We can think of them as a sophisticated form of search-and-replace—they’re like regular expressions designed to work on code fragments, rather than strings.
Strictly speaking, macros are functions, but it’s conventional in Racket to call them macros, and reserve the term function for a procedure that’s evaluated at run time. So if we talk about “converting a function into a macro”, that’s the intended contrast.
Macros are also known, especially within the Racket documentation, as syntactic forms or just forms—a nice term, because it emphasizes that a macro doesn’t evaluate the code it gets as input. Rather, it implements a template for putting items into certain positions, like filling in the blanks of a form. We could, for instance, use the and form like this:
If and were a function, these arguments would be evaluated first, and the program would end with error: boom before the arguments were ever passed to and.
But because and is a macro, it happily rewrites our code as usual:
When this new code runs, we still get an error, of course. But the and macro was able to complete its work anyhow. This brings us to the three golden rules of macros:
At compile time, a macro takes one code fragment as input, and converts it to a new code fragment. The input & output code fragments are each packaged inside a syntax object.
Because compile time happens before run time, all macros operate before any run-time functions.
A macro can only treat its input code as a literal syntactic entity. It can’t evaluate arguments or expressions within that code, because those values are only available at run time (which happens after compile time).
Under this definition, our read-syntax function doesn’t qualify as a macro. It didn’t take a syntax object as input—rather, it took two input arguments (a path and an input port) and made a syntax object from that.
Because macros run at compile time and have a restricted interface, they’re not as versatile as ordinary functions. For this reason, a macro is not the tool of first resort. Still, macros are useful & important because they can do things that ordinary functions cannot.
As we’ll see, the linchpin of our expander will be a macro.
Racket’s macro system is its crown jewel. Refined for nearly 20 years, it’s a work of great science and beauty. It’s also fundamental to how Racket models the implementation of languages. Therefore, anyone who wants to make languages with Racket has to be willing to learn a few things about macros.
That said, though the macro system is always rewarding, it’s not always easy. For most programmers, these are new & unfamiliar concepts. Don’t panic. These ideas aren’t esoteric or complicated. They just introduce a new way of thinking about program code. Most of us haven’t thought about code this way because we haven’t worked with languages with an elegant macro system like Racket’s.
If the reader was responsible for the form of the code, we could say the expander is responsible for its meaning. The expander’s job is to prepare the code so Racket can evaluate it.
Prepare it how? The code can be evaluated only if every name used in the code has a connection to an actual value or function. In Racket, a “name used in the code” is called an identifier. “A connection to an actual value or function” is called a binding. So we’d say that the expander prepares the code for evaluation by ensuring that every identifier has a binding. Once an identifier has a binding, it becomes a variable.
We already came across the ideas of identifiers and bindings when we tested our stacker reader without an expander. Recall that we got this error:
1 | handle: unbound identifier ... |
Now we know what it means. The code from the reader referred to a handle identifier, e.g.:
1 | (handle 4) |
This expression means “call handle with the argument 4.” But the expander hadn’t given handle a binding. Thus we got the “unbound identifier” error.
Within the expander, we have three basic techniques for adding bindings to code:
We can define macros that rewrite certain code as other code at compile time (for instance, and).
We can define functions that are invoked at run time.
We can import bindings from existing Racket modules, which can include both macros and functions.
We’ll use all three techniques in our stacker expander.
As we did with our reader, let’s pencil out what our expander needs to do.
In our case, the code we’re getting from the reader will look like this:
From this sample, we can set out the tasks for our expander:
From the prototype code above, we see we need to provide bindings for three identifiers: handle, which determines what to do with each argument; +, a stack operator; and *, another stack operator.
We also need numbers. But we get those for free—in Racket, numbers can’t be identifiers. They automatically evaluate to their numeric value. Similarly, parentheses are always treated as delimiters, and Racket already knows what to do with them.
Consistent with the design of stacker, we also have to implement a stack. The stack needs an interface for storing, reading, and doing operations on arguments. This interface will be used by handle.
Finally, our expander needs to provide a special macro called #%module-begin to get everything started. We’ll discuss that in the next section.
Once we have these elements in place, the stacker language will work.
Let’s think about the state of our code when the expander starts. The reader has just finished its work—meaning read-syntax has returned code describing a module expression, packaged inside a syntax object. For example, if the stacker reader started with source code like this:
1 2 3 4 5 | 4 8 + 3 * |
It would return the following module as a syntax object:
This becomes the starting input for the expander.
We saw that Racket starts the reader for a language by invoking a function called, by convention, read-syntax. Similarly, Racket starts the expander for a language by invoking a macro called, by convention, #%module-begin. Every expander must export a #%module-begin macro. + #%module-begin belongs to a special class of macros called interposition points.
Racket converts the module expression above into an invocation of the #%module-begin macro by passing it the code inside the module:
1 2 3 4 5 6 7 | (#%module-begin (handle) (handle 4) (handle 8) (handle +) (handle 3) (handle *)) |
Like any macro, #%module-begin will take this code as its input. And like any macro, #%module-begin will rewrite it as necessary and return the new code packaged as a syntax object. This new code will replace the whole #%module-begin expression above, and expansion & evaluation will continue from there.
Because every Racket language exports a #%module-begin, the most common way to implement the #%module-begin in a new language is as follows:
Handle any language-specific processing of the code.
Pass the result to an existing #%module-begin for the rest of the heavy lifting. (Caution: all the #%module-begin macros have the same name. Some care is needed to keep them straight.)
Let’s open "stacker.rkt" and update it like so:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 | #lang br/quicklang (define (read-syntax path port) (define src-lines (port->lines port)) (define src-datums (format-datums '(handle ~a) src-lines)) (define module-datum `(module stacker-mod "stacker.rkt" ,@src-datums)) (datum->syntax #f module-datum)) (provide read-syntax) (define-macro (stacker-module-begin HANDLE-EXPR ...) #'(#%module-begin HANDLE-EXPR ...)) (provide (rename-out [stacker-module-begin #%module-begin])) |
At the top, we see good old read-syntax, just as we left it.
Below that, our definition of #%module-begin. Let’s step through each part.
1 | (define-macro (stacker-module-begin HANDLE-EXPR ...) |
When we made our reader, we saw that an ordinary function is defined with a list of input arguments. But because a macro gets a chunk of code, we typically define a macro with a syntax pattern instead. A syntax pattern is like a regular expression: it breaks down the input into pieces so they can be manipulated & rearranged.
Our ordinary define doesn’t support syntax patterns in the first line. Instead, we use define-macro, which does. This first line says we’re defining a macro called stacker-module-begin. HANDLE-EXPR is a pattern variable—meaning, a named match within the syntax pattern. When used with an ellipsis ..., this pattern variable will match each line of the code passed to the macro. (We’ll defer the details till later, though the curious can detour through syntax patterns.)
Then we have the return value of our macro:
1 2 | #'(#%module-begin HANDLE-EXPR ...) |
Syntax objects in Racket are so common that we have a few ways to make them. In our read-syntax we’re using datum->syntax to make a syntax object from module-datum.
This time, we’re using some new notation—the prefix #'—to make code into a syntax object. It’s similar to the ' prefix we’ve already used that makes code into a datum. But the #' prefix not only creates the datum, but also captures the current lexical context, and attaches that to the new syntax object. Lexical context is the fancy phrase for “a list of available variables”. In practice, what it means is that a syntax object made with #' will be able to access all the variables defined at that point in the code.
One of these is the HANDLE-EXPR ... pattern variable. Those lines of code will just be merged into the new syntax object.
Another variable is #%module-begin. Recall that our plan is to take our input code and pass it to the next #%module-begin in the chain. When we go into DrRacket and put the cursor over the #%module-begin, we see a purple arrow and a popup message that says imported from "br/quicklang":
We don’t need to explicitly import this #%module-begin. When we use any language, including br/quicklang, we get access to its bindings. And br/quicklang, like every language, provides its own #%module-begin. + We could also import and use the #%module-begin from another language, but in this case, there’s no need.
Finally we have to make stacker-module-begin available outside stacker.rkt:
1 | (provide (rename-out [stacker-module-begin #%module-begin])) |
As before, we use provide. But this time, we use the rename-out qualifier so that stacker-module-begin is available outside this source file with the correct #%module-begin name.
Why didn’t we just name our macro #%module-begin at the start? Suppose we had written our macro like this:
1 2 3 4 | (define-macro (#%module-begin HANDLE-EXPR ...) #'(#%module-begin HANDLE-EXPR ...)) (provide #%module-begin) |
The problem here is that we’d be creating a naming conflict between two #%module-begin macros: the new one we’re defining, and the one from br/quicklang that we’re hoping to rely on. As it stands, this newly defined #%module-begin will just call itself, creating an infinite loop. Thus, we give our new #%module-begin macro a temporary, non-conflicting name, and change it as part of the provide expression.
Let’s see if our simple #%module-begin works. Here’s what "stacker.rkt" should look like:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 | #lang br/quicklang (define (read-syntax path port) (define src-lines (port->lines port)) (define src-datums (format-datums '(handle ~a) src-lines)) (define module-datum `(module stacker-mod "stacker.rkt" ,@src-datums)) (datum->syntax #f module-datum)) (provide read-syntax) (define-macro (stacker-module-begin HANDLE-EXPR ...) #'(#%module-begin HANDLE-EXPR ...)) (provide (rename-out [stacker-module-begin #%module-begin])) |
Now run "stacker-test.rkt" (which is the same as ever):
We get an error. It begins:
1 | handle: unbound identifier ... |
That’s the same error we got when we tried to run our program without an expander. But we haven’t gotten around to defining handle yet, so we shouldn’t be surprised that the error persists.
Can we at least check that our macro is working? Sure. Let’s reuse the shortcut we used to test the reader the first time: adding a ' prefix to the output so it gets converted back into literal code. This time, we add the ' prefix to the HANDLE-EXPR inside our macro definition like so:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 | #lang br/quicklang (define (read-syntax path port) (define src-lines (port->lines port)) (define src-datums (format-datums '(handle ~a) src-lines)) (define module-datum `(module stacker-mod "stacker.rkt" ,@src-datums)) (datum->syntax #f module-datum)) (provide read-syntax) (define-macro (stacker-module-begin HANDLE-EXPR ...) #'(#%module-begin 'HANDLE-EXPR ...)) (provide (rename-out [stacker-module-begin #%module-begin])) |
Now when we run "stacker-test.rkt", we get:
1 2 3 4 5 6 | '(handle) '(handle 4) '(handle 8) '(handle +) '(handle 3) '(handle *) |
What we’re seeing is our source code, converted into (handle ···) expressions by the read-syntax function in our reader, then converted into datums by the #%module-begin macro in our expander, and then printed in the output window of DrRacket.
That’s good news—we’ve verified that we can make a round-trip from our source file, through our reader, through our expander, and back. We’re another step closer to having a working language.
Let’s remove the ' prefix we just added to the front of HANDLE-EXPR, and move on.
When we designed our expander, we set out three tasks:
Provide the special #%module-begin macro.
Implement a stack, with an interface for storing, reading, and doing operations on arguments, that can be used by handle.
Provide bindings for three identifiers: handle, which determines what to do with each argument; +, a stack operator; and *, another stack operator.
We completed the first task—we made #%module-begin. So now we need to work on the other two—implementing a stack and creating bindings for our identifiers.
First the stack, which we implement using Racket list operations:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 | #lang br/quicklang (define (read-syntax path port) (define src-lines (port->lines port)) (define src-datums (format-datums '(handle ~a) src-lines)) (define module-datum `(module stacker-mod "stacker.rkt" ,@src-datums)) (datum->syntax #f module-datum)) (provide read-syntax) (define-macro (stacker-module-begin HANDLE-EXPR ...) #'(#%module-begin HANDLE-EXPR ...)) (provide (rename-out [stacker-module-begin #%module-begin])) (define stack empty) (define (pop-stack!) (define arg (first stack)) (set! stack (rest stack)) arg) (define (push-stack! arg) (set! stack (cons arg stack))) |
Our stack starts out as the empty list.
pop-stack! removes the top argument from the stack and returns it. We do this in three steps. First, we get the first argument from the stack and assign it to a temporary arg variable. Second, we set! our stack equal to the rest of the arguments after the first. + By convention, Racket functions that update the value of a variable are named with a ! at the end. For other naming conventions, see functions. Third, we return arg as the result.
push-stack! puts a new argument on the stack. We do this with cons. cons is one of the oldest Lisp functions, getting its name from its role constructing a new item in memory. (For more about cons, see pairs.) Every time we cons an item onto the list, we create a new list that’s one bigger. After we push the argument, we set! our stack equal to this new, larger stack.
Now let’s add our long-awaited handle function. Let’s remember how handle is invoked in a stacker program:
We observe that it can have zero or one argument. If it has an argument, it’s either a number or a stack operator (either + or *). So let’s add the following handle function:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 | #lang br/quicklang (define (read-syntax path port) (define src-lines (port->lines port)) (define src-datums (format-datums '(handle ~a) src-lines)) (define module-datum `(module stacker-mod "stacker.rkt" ,@src-datums)) (datum->syntax #f module-datum)) (provide read-syntax) (define-macro (stacker-module-begin HANDLE-EXPR ...) #'(#%module-begin HANDLE-EXPR ...)) (provide (rename-out [stacker-module-begin #%module-begin])) (define stack empty) (define (pop-stack!) (define arg (first stack)) (set! stack (rest stack)) arg) (define (push-stack! arg) (set! stack (cons arg stack))) (define (handle [arg #f]) (cond [(number? arg) (push-stack! arg)] [(or (equal? + arg) (equal? * arg)) (define op-result (arg (pop-stack!) (pop-stack!))) (push-stack! op-result)])) (provide handle) |
Then we step through each line:
1 | (define (handle [arg #f]) |
We define our handle function to take one argument, called arg. By declaring arg in brackets with a default value of #f, we make it an optional argument. So if handle is called with an argument, arg takes that value; if not, arg is set to #f.
This cond introduces a conditional expression with two branches. The condition on the left side of a branch is tested. If the condition is true, the expressions on the right side of the branch are evaluated. If the condition fails, we move to the next branch. + Further details in Booleans and conditionals. The first branch tests if arg is a number?. If so, we put it onto the stack with push-stack!.
In the second branch, we test if arg is equal? to one of our stack operators. If so, we retrieve the first two elements from the stack with two calls to pop-stack!. We apply arg to these values by putting it in the function position of a new expression with the stack values as arguments. We store the result in op-result. We then push op-result onto the stack.
What if handle is called with no arguments, and arg is #f? We were always planning to ignore that case. So we don’t need to add a branch to our cond expression to deal with it. We just let it fall through. Likewise, any other input that doesn’t meet one of our conditions will be ignored.
1 | (provide handle) |
Finally, we make handle available outside our source file with the usual provide.
We also need to provide bindings for + and *. This is easy, because our br/quicklang language already defines these functions. We just need to borrow these bindings for stacker. We can do that by adding one more provide to our source file:
Our "stacker.rkt" should now look like this:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 | #lang br/quicklang (define (read-syntax path port) (define src-lines (port->lines port)) (define src-datums (format-datums '(handle ~a) src-lines)) (define module-datum `(module stacker-mod "stacker.rkt" ,@src-datums)) (datum->syntax #f module-datum)) (provide read-syntax) (define-macro (stacker-module-begin HANDLE-EXPR ...) #'(#%module-begin HANDLE-EXPR ...)) (provide (rename-out [stacker-module-begin #%module-begin])) (define stack empty) (define (pop-stack!) (define arg (first stack)) (set! stack (rest stack)) arg) (define (push-stack! arg) (set! stack (cons arg stack))) (define (handle [arg #f]) (cond [(number? arg) (push-stack! arg)] [(or (equal? * arg) (equal? + arg)) (define op-result (arg (pop-stack!) (pop-stack!))) (push-stack! op-result)])) (provide handle) (provide + *) |
We’re getting close. Let’s run our "stacker-test.rkt" file, which still looks like this:
The good news—no errors.
The bad news—no program result, either.
This is a simple fix. The result of our program is whatever value is sitting on top of the stack at the end. So we add one line to our stacker-module-begin, to display the first element of our stack after all the HANDLE-EXPR lines have been evaluated:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 | #lang br/quicklang (define (read-syntax path port) (define src-lines (port->lines port)) (define src-datums (format-datums '(handle ~a) src-lines)) (define module-datum `(module stacker-mod "stacker.rkt" ,@src-datums)) (datum->syntax #f module-datum)) (provide read-syntax) (define-macro (stacker-module-begin HANDLE-EXPR ...) #'(#%module-begin HANDLE-EXPR ... (display (first stack)))) (provide (rename-out [stacker-module-begin #%module-begin])) (define stack empty) (define (pop-stack!) (define arg (first stack)) (set! stack (rest stack)) arg) (define (push-stack! arg) (set! stack (cons arg stack))) (define (handle [arg #f]) (cond [(number? arg) (push-stack! arg)] [(or (equal? * arg) (equal? + arg)) (define op-result (arg (pop-stack!) (pop-stack!))) (push-stack! op-result)])) (provide handle) (provide + *) |
If we run "stacker-test.rkt" one more time, we get:
1 | 36
|
That’s it—we’ve made our first programming language.