Recall the purpose of the reader:
The reader converts the source code of our language from a string of characters into Racket-style parenthesized forms, also known as S-expressions.
Let’s visualize what our reader needs to accomplish. At the end of the tutorial, our stacker-test.rkt program will consist of a #lang reader "stacker.rkt" line followed by five instructions, one per line: 4, 8, +, 3, and *.
The first line—#lang reader "stacker.rkt"—is notation that means “use the "stacker.rkt" module as the reader for this language.” But for now, let’s focus on the program instructions below the #lang line:
Clearly, these are not parenthesized forms. How can we get there? We can see that stacker has one instruction per line. So the simplest idea (never a bad place to start) would be for our reader to wrap parentheses around each line, giving (4), (8), (+), (3), and (*).
The good news—these now look more like the parenthesized forms we need. The bad news—they’re not useful. Recall that a parenthesized form in Racket is treated as a function call. But numbers aren’t functions. So expressions like (4) and (8) will produce errors.
Let’s try it in DrRacket. Open a new window with the default #lang br language at the top and try evaluating (4):
    #lang br
    (4)
We’ll get an error telling us that 4 is not a “procedure”—another word for a function—“that can be applied to arguments”:
    application: not a procedure;
     expected a procedure that can be applied to arguments
      given: 4
      arguments...: [none]
Our other two parenthesized forms—(+) and (*)—are valid function calls. Again, we can verify this by evaluating them in DrRacket:
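A minimal check, assuming plain `#lang racket/base` in place of `#lang br` so that it runs without the tutorial's package. The names `sum-of-nothing` and `product-of-nothing` are illustrative only:

```racket
#lang racket/base

;; With no arguments, each operator returns its identity value.
(define sum-of-nothing (+))      ; 0
(define product-of-nothing (*))  ; 1

;; The identity keeps an empty call consistent with a longer one.
(= (+ 5 (+)) 5)  ; #t
(= (* 5 (*)) 5)  ; #t
```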
But as it stands, they’re being evaluated without any input arguments. That’s wrong. They’re supposed to take their input from the top of the stack. (Math fans can figure out why it makes sense that (+) evaluates to 0 and (*) to 1.)
Our first idea—wrapping our arguments with parentheses—won’t work. We need to handle the interaction of our stack and the arguments in the program more carefully.
Instead, suppose we have an intermediary function, call it handle, that inspects each instruction and decides what to do. Further suppose that we wrap each program argument in a call to it, giving (handle 4), (handle 8), (handle +), (handle 3), and (handle *).
What have we accomplished? Our stacker arguments will no longer be treated as function calls. Rather, they’re now input to another function:
(handle 4) means “call handle with 4 as the argument.” In that case, handle can push the number onto the stack.
(handle +) means that handle should get the top two arguments from the stack and apply the operation.
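The real handle belongs to the expander and is written later. Purely as an illustration of the two behaviors just described, here is a rough sketch in plain `#lang racket/base`. The names `stack`, `push!`, and `pop!` are hypothetical, and the finished tutorial code may well differ:

```racket
#lang racket/base

;; Hypothetical stack, kept as a list with the top element first.
(define stack '())
(define (push! v) (set! stack (cons v stack)))
(define (pop!)
  (define top (car stack))
  (set! stack (cdr stack))
  top)

;; Numbers are pushed; a function is applied to the top two values
;; and its result is pushed. Anything else (including no argument,
;; as with a blank line) falls through and is ignored.
(define (handle [arg #f])
  (cond
    [(number? arg) (push! arg)]
    [(procedure? arg) (push! (arg (pop!) (pop!)))]))

;; The sample program 4 8 + 3 * leaves a single value on the stack.
(for-each handle (list 4 8 + 3 *))
stack  ; '(36)
```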
Now that we know what our reader needs to do, we can program it.
“Wait—where does this magical handle function come from?” Remember the other indispensable part of our language:
The expander, which determines how these parenthesized forms correspond to real Racket expressions (and which are then evaluated to produce a result).
With the reader, we’re putting our program into the proper form. With the expander, we’ll give meaning to these forms. So our handle function will eventually be available through our expander.
If it seems sneaky to make plans around a nonexistent function, don’t panic. The reader and expander always exist as a cooperating pair. But they have distinct roles. So we want to put things in the right place. The handle function itself belongs in the expander, not the reader.
Having glimpsed the future, we return to the code.
As the first step in running a source file with a #lang line, Racket passes the source code to the reader for the language. It does this by invoking a function called, by convention, read-syntax. Every reader must export a read-syntax function. Racket passes two arguments to read-syntax: the path to the source file, and a port for reading data from the file. We’ll come back to these.
For now, let’s consider the output. Every read-syntax has one job: to return code describing a module, packaged as a syntax object. Racket will replace the original source code with this module. This module, in turn, will invoke the expander for our language, triggering the full expansion of the module. After that, the module will be evaluated normally by the Racket interpreter. But all this happens under the hood—from the outside, all we see is that our original source code has been converted into a result.
Let’s step through these new concepts by working through a sample read-syntax:
    #lang br/quicklang

    (define (read-syntax path port)
      (define src-lines (port->lines port))
      (datum->syntax #f '(module lucy br 42)))

    (provide read-syntax)
To create a function, we use define:
    (define (read-syntax path port) ···
This line sets up a function called read-syntax that accepts two arguments. These are positional arguments, so we can name them whatever we like. We’ll be boring and call them path and port.
We also need to export read-syntax so it’s available outside this file. In Racket, all definitions are private by default. To make a definition public, we use provide, as in the (provide read-syntax) line at the end of the file.
The body of read-syntax performs two tasks.
First, it reads the source code from port. A port is a generic interface for input (or output) that can be read (or written) incrementally—that is, piece by piece. For instance, if we stop reading an input port, the port remembers where we stopped. The next time we read from the port, we can resume from the same place. Input ports are useful when we’re wary of the size of the file on the other side. It might be too humongous to fit into memory.
In stacker, that won’t be a problem. So let’s just read everything from our port at once:
    (define src-lines (port->lines port))
We convert the contents of port to a list of strings by passing it to port->lines. We store the result in src-lines. (In this toy example, we won’t actually use src-lines, but we will later on, in the finished version of this function.)
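To see the incremental behavior of a port next to the all-at-once approach, here is a small sketch in plain `#lang racket/base`. It assumes a string port as a stand-in for the source file; the name `in` and the sample text are illustrative:

```racket
#lang racket/base
(require racket/port) ; provides port->lines

;; A string port standing in for the source file.
(define in (open-input-string "4\n8\n+\n3\n*"))

;; Reading piece by piece: the port remembers where we stopped.
(define first-line (read-line in))   ; "4"
(define second-line (read-line in))  ; "8"

;; Reading everything that is left, as a list of strings.
(define rest-lines (port->lines in)) ; '("+" "3" "*")
```

Because the first two lines were already consumed, `port->lines` returns only the remainder.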
The second task of read-syntax is to return code describing a module. In Racket, a module is the basic organizational unit for program code. Like everything else in Racket, a module is an expression. The code for a module expression follows a fixed pattern: the word module, then a name for the module, then the expander to use, then the expressions that make up the body.
So our sample module code:
    (module lucy br
      42)
Means “a module named lucy, using the expander from the br language, that evaluates the expression 42.”
The module name is arbitrary. It doesn’t affect how the module works.
The expander, however, is highly consequential. It determines how the expressions inside the module are interpreted. Here, we can use the br shorthand because that language is already installed. But we can also use an explicit path string to designate a source file with an expander, for instance:
    (module lucy "path/to/expander.rkt"
      42)
We still need to convert our module code into a syntax object, which is a way of treating code as data. We can use a syntax object to store a chunk of source code so it can be evaluated later. Here’s how we do that:
First, we turn the expression into a datum, which is the raw representation of the code as it appears in the source. A datum is not unlike stashing code inside a string, but one step evolved, because it preserves the list structure of the expression. (Strings, by contrast, are inherently flat.) To make a datum, we use quote:
    (quote (module lucy br
             42))
But in Racket, datums are so common that there’s a shorthand notation for quote—we just add a ' prefix to the expression:
    '(module lucy br
       42)
With the ' prefix, this is no longer code for a module, but rather a list expression with three symbols and a number, as we can verify on the REPL:
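A sketch of that check, written as a small `#lang racket/base` module rather than typed at the REPL. The name `d` is illustrative:

```racket
#lang racket/base

;; Quoted, the module form is ordinary list data.
(define d '(module lucy br 42))

(list? d)               ; #t
(length d)              ; 4
(map symbol? d)         ; '(#t #t #t #f)
(number? (list-ref d 3)) ; #t
```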
Then we turn our datum into a syntax object. Under the hood, a syntax object is just a datum with some extra information attached, including its context within a program. We can upgrade a datum to a syntax object using datum->syntax. The first argument of datum->syntax is the program context we want to associate with the code. We don’t need to do that yet, so we pass #f for the context. The second argument is our datum:
    (datum->syntax #f '(module lucy br
                         42))
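As a quick sanity check, the conversion can be undone with `syntax->datum`, which strips the extra information and returns the plain datum. A minimal sketch in `#lang racket/base`, with `stx` as an illustrative name:

```racket
#lang racket/base

;; Upgrade the datum to a syntax object with no program context.
(define stx (datum->syntax #f '(module lucy br 42)))

(syntax? stx)        ; #t
(syntax->datum stx)  ; '(module lucy br 42)
```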
Finally, we need to return this syntax object as the result of the function. Racket has no explicit return statement. The return value of a function is just the last expression that appears in its body. So as long as our syntax object is the last expression in the body of read-syntax, we’re good.
Now we understand everything that’s happening in "stacker.rkt". We’ve defined a read-syntax function that, when invoked, will return a syntax object describing a module, which can be evaluated later.
    #lang br/quicklang

    (define (read-syntax path port)
      (define src-lines (port->lines port))
      (datum->syntax #f '(module lucy br 42)))

    (provide read-syntax)
“Evaluated later—like when?” When we invoke the stacker language. Let’s see how this works. We return to "stacker-test.rkt". This time, let’s insert some dummy arguments:
    #lang reader "stacker.rkt"
    foo
    bar
    zam
When we first tried to invoke the "stacker.rkt" language from within "stacker-test.rkt", we got an error, because we hadn’t yet made a reader.
Now we have. So let’s save the file and run it again. This time, we’ll get a different result:
    42
What’s happening this time? The #lang reader line is responsible for invoking the reader for stacker. To do this, Racket calls our read-syntax function. Our function returns a syntax object that describes a module.
Here’s what happens next: the source code in "stacker-test.rkt" is entirely replaced with the syntax object returned by read-syntax. So under the hood, this file:
    #lang reader "stacker.rkt"
    foo
    bar
    zam
Becomes this:
    (module lucy br
      42)
Once our syntax object gets moved into its new location as real code, it looks just like our module-expression datum, except that it’s no longer quoted. This makes sense—when we made the module datum, we wanted to treat the code as data, so we prefixed it with '. Now we want to reverse the alchemy, and treat the data as code. So it gets unquoted. After that, the module expression is evaluated, producing 42.
Let’s persuade ourselves that nothing spooky has happened. Our read-syntax is responsible for reading the lines of our source file (for now, ignoring them) and returning a syntax object. The new magic we’re witnessing is that after Racket has passed the source code to read-syntax as an input port, it replaces that source code with the result from read-syntax.
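One way to see that read-syntax is an ordinary function is to call it by hand. The sketch below assumes `#lang racket/base` in place of `#lang br/quicklang`, and names racket/base as the expander in the module datum, so that it runs without the tutorial's package. The string port `fake-source` is a hypothetical stand-in for the source file:

```racket
#lang racket/base
(require racket/port) ; provides port->lines

;; Same shape as the toy reader: read the lines, ignore them,
;; and return a syntax object describing a module.
(define (read-syntax path port)
  (define src-lines (port->lines port))
  (datum->syntax #f '(module lucy racket/base 42)))

;; Play the role of Racket: supply a path and an input port.
(define fake-source (open-input-string "foo\nbar\nzam"))
(define result (read-syntax "stacker-test.rkt" fake-source))

(syntax? result)        ; #t
(syntax->datum result)  ; '(module lucy racket/base 42)
```

Whatever the port contains, this version returns the same module, which is why foo, bar, and zam have no effect on the result.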
Let’s ensure that we’re unimpressed by this magic. We can accomplish the same transformation by hand. We replace the #lang line and source code in "stacker-test.rkt" with our module code (this time without the ' prefix, because we want the code to be evaluated), and run it:
    ;; no #lang line this time
    (module lucy br
      42)
Once again we get 42. In general, every source file that begins with a #lang line gets converted to a module by the language reader. This is why source files in Racket are also known as modules. (For more, see the #lang line.)
This proves that we can make a round trip from a source file to the stacker reader and back. The major problem with our reader is that it’s not doing anything with the src-lines we read from the source file. So let’s fix that.
Right now, we’re ignoring the lines of our input source file. So we’ll upgrade our reader to complete two new tasks:
Wrap each line in a (handle ···) form.
Insert these new forms into the module we’re returning as a syntax object.
To do this, let’s swap out the read-syntax in "stacker.rkt" for a new version:
    #lang br/quicklang

    (define (read-syntax path port)
      (define src-lines (port->lines port))
      (define src-datums (format-datums '(handle ~a) src-lines))
      (define module-datum `(module stacker-mod br
                              ,@src-datums))
      (datum->syntax #f module-datum))

    (provide read-syntax)
As we’ll see in the next section, this code will raise an error when we try to use it as a language. But let’s step through each line to see what it does, and then we’ll be able to understand the error.
    (define src-lines (port->lines port))
As we did before, we use port->lines to retrieve the lines from our input port—each representing an argument in a stacker program—as a list of strings. We store this list in src-lines. (In this project, we don’t need path, so we’ll ignore it.)
    (define src-datums (format-datums '(handle ~a) src-lines))
Next, we convert these strings into datums. (Yes, the plural of datum ought to be data. But this poetic license will help us avoid confusion with the traditional generic meaning of “data”.) format-datums takes a list of strings and converts each of them using a format string. The ~a marks the place where the argument string will be substituted. So the string "4" from the source file will become the datum '(handle 4), and "+" will become '(handle +).
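format-datums comes from the br library. To make the substitution concrete, here is a rough stand-in built only from `racket/base`. The name `format-datums*` is hypothetical, and the library's own implementation may differ in detail:

```racket
#lang racket/base

;; Hypothetical stand-in for format-datums:
;; 1. print the template as text:  '(handle ~a) -> "(handle ~a)"
;; 2. substitute each string at the ~a
;; 3. read the resulting text back in as a datum
(define (format-datums* template strs)
  (define template-str (format "~a" template))
  (for/list ([s (in-list strs)])
    (read (open-input-string (format template-str s)))))

(define handled (format-datums* '(handle ~a) '("4" "+" "")))
;; handled is '((handle 4) (handle +) (handle))
```

Note that the empty string produces '(handle), a call with no argument.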
    (define module-datum `(module stacker-mod br
                            ,@src-datums))
Then we make our module datum. Here we introduce a new bit of notation: the backtick ` is called quasiquote. As the name implies, it works mostly the same way as the usual quote prefix '. The “quasi” part is that we can insert variables into the list. To insert a single value, we use the unquote operator, which is a comma , followed by the variable:
    (define x 42)
    `(41 ,x 43) ; '(41 42 43)
To insert a list of multiple values, we use the unquote-splicing operator, which is a comma and at sign ,@ followed by the variable. The unquote-splicing operator merges our sublist with the surrounding list:
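A small sketch of splicing in `#lang racket/base`, with `xs` and `spliced` as illustrative names:

```racket
#lang racket/base

(define xs (list 42 43))

;; ,@ merges the elements of xs into the surrounding list.
(define spliced `(41 ,@xs 44))
spliced  ; '(41 42 43 44)
```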
Notice that if we use the unquote operator with a sublist, the sublist will remain nested:
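The same illustrative list with plain unquote, again in `#lang racket/base`, where `xs` and `nested` are hypothetical names:

```racket
#lang racket/base

(define xs (list 42 43))

;; A plain comma inserts xs as one element, so it stays a sublist.
(define nested `(41 ,xs 44))
nested           ; '(41 (42 43) 44)
(length nested)  ; 3
```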
Now we can understand what’s happening in our module datum:
    (define module-datum `(module stacker-mod br
                            ,@src-datums))
This datum describes a module called stacker-mod (another arbitrary name). Because we haven’t written our stacker expander yet, we’ll temporarily use the br expander (meaning, this module will be interpreted as source written in the br language).
We quasiquote our module datum using ` so we can use ,@ inside to splice our src-datums into the body of the module. So if our src-datums were a list of three datums like this:
    '((handle 42) (handle +) (handle 25))
After splicing, our module datum will look like this:
    '(module stacker-mod br
       (handle 42)
       (handle +)
       (handle 25))
We move on to the last line of our function:
    (datum->syntax #f module-datum)
As before, we finish by converting our datum to a syntax object using datum->syntax, passing #f as the context argument.
Finally, we (provide read-syntax) so the function is available outside this source file.
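To check the whole pipeline without installing anything, the sketch below rebuilds the function in `#lang racket/base`. It assumes a hypothetical `format-datums*` helper in place of the library's format-datums and a string port in place of a source file. The symbol br appears only as data here, so it is never loaded:

```racket
#lang racket/base
(require racket/port) ; provides port->lines

;; Hypothetical stand-in for format-datums, using only racket/base.
(define (format-datums* template strs)
  (define template-str (format "~a" template))
  (for/list ([s (in-list strs)])
    (read (open-input-string (format template-str s)))))

;; The reader from this section, with the stand-in swapped in.
(define (read-syntax path port)
  (define src-lines (port->lines port))
  (define src-datums (format-datums* '(handle ~a) src-lines))
  (define module-datum `(module stacker-mod br ,@src-datums))
  (datum->syntax #f module-datum))

;; Call it by hand and inspect the datum inside the syntax object.
(define stx (read-syntax "demo" (open-input-string "42\n+\n25")))
(define module-datum (syntax->datum stx))
;; module-datum is '(module stacker-mod br (handle 42) (handle +) (handle 25))
```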
Let’s make sure our updated reader works the way we hope, by testing it with some dummy values. Save "stacker.rkt". Update "stacker-test.rkt" as follows:
    #lang reader "stacker.rkt"
    42
    "Hello world"
    #t
Don’t worry that this code isn’t a valid stacker program. Our reader doesn’t know anything about the meaning of stacker code, so it should work with any arguments. We might as well try it with a number, a string, and a boolean.
Run this file in DrRacket. What happens? You should get an error that starts like this:
    handle: unbound identifier ...
This looks like bad news. But it tells us something useful. Our code is trying to call the handle function (which is what we wanted). But it’s not succeeding, because handle doesn’t exist yet. We’ll fix that shortly.
In the meantime, how can we check the output of read-syntax? For debugging purposes, we can put in a temporary shim. We add a second ' prefix to our format string in format-datums, like so:
    #lang br/quicklang

    (define (read-syntax path port)
      (define src-lines (port->lines port))
      (define src-datums (format-datums ''(handle ~a) src-lines))
      (define module-datum `(module stacker-mod br
                              ,@src-datums))
      (datum->syntax #f module-datum))

    (provide read-syntax)
What will happen now? Before we run "stacker-test.rkt" again, let’s think through this change. Recall that when our syntax object from read-syntax is moved into place, Racket unquotes it so that it can be evaluated as code. By adding a second level of quoting, however, we’re protecting our src-datums from turning into code. Once they’re unquoted, they’ll still have a layer of quoting. They’ll still behave as data, and they’ll be printed out as usual.
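The effect of the extra quote can be seen in isolation. This sketch assumes `eval` in a fresh `racket/base` namespace as a stand-in for what happens when the reader's output is moved into place and run. All names are illustrative:

```racket
#lang racket/base

(define ns (make-base-namespace))

;; One level of quoting: evaluation strips it and runs the code.
(define quoted-once '(+ 1 2))
(define result-once (eval quoted-once ns))    ; 3

;; Two levels: evaluation strips one, and a datum is left over.
(define quoted-twice ''(+ 1 2))
(define result-twice (eval quoted-twice ns))  ; '(+ 1 2)
```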
Let’s run "stacker-test.rkt" and see if this idea works:
    '(handle)
    '(handle 42)
    '(handle "Hello world")
    '(handle #t)
This looks mostly right: each argument in "stacker-test.rkt"—42, "Hello world", and #t—has been wrapped as a parenthesized handle form. Because of the extra level of quoting, each of these forms is printed as a datum.
But that '(handle) at the top—where did that come from? Remember that everything in the source file gets passed to read-syntax. That includes the newline between the end of the #lang line and the first line with 42. For instance, update "stacker-test.rkt" like this and run it again:
    #lang reader "stacker.rkt"
    42

    "Hello world"

    #t
Every line, including the blank lines, will be converted to handle form:
    '(handle)
    '(handle 42)
    '(handle)
    '(handle "Hello world")
    '(handle)
    '(handle #t)
It’s possible to filter out meaningless source code before we make the module expression—but we’ll save that technique for later. Here in stacker, we’ll just notice that blank lines will still generate calls to handle. So when we write handle, we should deal with them properly.
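The blank entries are easy to see by looking at what port->lines returns. The sketch below assumes the port holds everything after the #lang line of the six-line test file, reconstructed here as a string. It runs in plain `#lang racket/base`:

```racket
#lang racket/base
(require racket/port) ; provides port->lines

;; Assumed contents of the port once the #lang line is consumed:
;; the newline that ends that line comes first.
(define after-lang-line "\n42\n\n\"Hello world\"\n\n#t")

(define src-lines (port->lines (open-input-string after-lang-line)))
;; src-lines is '("" "42" "" "\"Hello world\"" "" "#t")

;; Each empty string will become a (handle) form with no argument.
(define blank-count
  (length (filter (lambda (line) (string=? line "")) src-lines)))
blank-count  ; 3
```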
To finish our test, let’s change "stacker-test.rkt" to use the input from our original sample program: the #lang line followed by 4, 8, +, 3, and *, one instruction per line.
When we run this, we get:
    '(handle)
    '(handle 4)
    '(handle 8)
    '(handle +)
    '(handle 3)
    '(handle *)
Except for the quotes—which are there temporarily for debugging, and won’t exist on the real code—that’s exactly right.
We’re almost ready to move on. Let’s finish our reader by making two changes to read-syntax. First, since we’re done debugging, let’s remove the extra ' prefix from the format string in format-datums. Second, let’s change our module datum so that instead of invoking br as the expander, it will use "stacker.rkt". So the finished function looks like this:
    #lang br/quicklang

    (define (read-syntax path port)
      (define src-lines (port->lines port))
      (define src-datums (format-datums '(handle ~a) src-lines))
      (define module-datum `(module stacker-mod "stacker.rkt"
                              ,@src-datums))
      (datum->syntax #f module-datum))

    (provide read-syntax)
Of course, "stacker.rkt" doesn’t have an expander yet. We’ll add that next.