We saw how we were able to use cuts and splices to streamline our parse tree into a shape pretty close to that of our pseudocode in specification and setup.
This is our sample program:
1 2 3 4 5 6 7 8 9 | #lang basic 30 rem print 'ignored' 35 50 print "never gets here" 40 end 60 print 'three' : print 1.0 + 3 70 goto 11. + 18.5 + .5 rem ignored 10 print "o" ; "n" ; "e" 20 print : goto 60.0 : end |
This was the pseudocode:
1 2 3 4 5 6 7 8 | (define (30) (rem print "'ignored'")) (define (35) (void)) (define (50) (print "never gets here")) (define (40) (end)) (define (60) (print "three") (print (+ 1.0 3))) (define (70) (goto (+ 11 18.5 0.5)) (rem "ignored")) (define (10) (print "o" "n" "e")) (define (20) (print) (goto 60) (end)) |
And this is the parse tree that results from the parser:
1 2 3 4 5 6 7 8 9 | '(b-program (b-line 30 (b-rem "rem print 'ignored'")) (b-line 35) (b-line 50 (b-print "never gets here")) (b-line 40 (b-end)) (b-line 60 (b-print "three") (b-print (b-expr (b-sum 1.0 3)))) (b-line 70 (b-goto (b-expr (b-sum 11.0 18.5 0.5)))) (b-line 10 (b-print "o" "n" "e")) (b-line 20 (b-print) (b-goto (b-expr (b-sum 60.0))) (b-end))) |
Here in the expander, we need to take this code the rest of the way, so that it works as a BASIC program.
According to the specification and setup, these are the key missing pieces:
We need to convert each line of the source program—that is, each b-line element—into a function.
We need to make a hash table that maps line numbers to their associated functions, and a main program loop that looks up functions in this table and runs them.
We need to implement the behavior of our statements and expressions. In particular, we want to implement goto and end using exceptions.
Since we went through the trouble of collecting source locations, we should also try to apply them wherever sensible.
Let’s start by creating an "expander.rkt" module in our basic project:
1 | #lang br/quicklang |
Usually the first thing we write for our expander is the indispensable #%module-begin macro. This time, let’s instead figure out our line functions first.
In wires, we learned how we could use a macro to make a function corresponding to each wire definition. The idea here is similar: make a function that corresponds to each line definition. These line definitions are represented in our parse tree by b-line nodes. A simple version of our b-line macro would look like this:
1 2 3 | #lang br/quicklang (define-macro (b-line NUM STATEMENT ...) #'(define (NUM) (void) STATEMENT ...)) |
We know from our parser grammar that a b-line contains a line number followed by zero or more statements. Thus, we can create a macro using a syntax pattern that matches these arguments. In the syntax template, we insert a (void) because the body of a function needs to have at least one expression, and STATEMENT ... may have zero elements. Otherwise, the (void) is benign.
We can quickly test our macro by grabbing the first line from our parse tree and entering it on the REPL:
1 | > (b-line 30 (b-rem "rem print 'ignored'")) |
But when we do that, we get an error:
1 | define: bad syntax (not an identifier for procedure name, and not a nested procedure form) in: 30 |
The problem is that Racket doesn’t accept numbers as identifiers. So if we want to make a function for line 30, we have to give it a different name. We can do this by prefixing our line number with line-. So line 30 will become a function called line-30.
We can derive a new identifier using prefix-id, which creates a new identifier by putting a prefix onto an existing one. Here, we attach line- to the front of our #'NUM identifier. + Reminder: though the pattern variable is called NUM (without the #' prefix) we can only use it inside a syntax template. So #'NUM is the world’s smallest syntax template that holds NUM. We can then use with-pattern to assign this new identifier to the pattern variable LINE-NUM, and substitute it into our syntax template:
1 2 3 4 | #lang br/quicklang (define-macro (b-line NUM STATEMENT ...) (with-pattern ([LINE-NUM (prefix-id "line-" #'NUM)]) #'(define (LINE-NUM) (void) STATEMENT ...))) |
This time, when we enter the first line of our parse tree on the REPL, we don’t get an error:
1 | > (b-line 30 (b-rem "rem print 'ignored'")) |
We can verify that line-30 is a function:
1 | > line-30 |
1 | #<procedure:line-30> |
But we can’t run the function yet, because we haven’t defined b-rem inside:
1 | > (line-30) |
1 2 | b-rem: undefined; cannot reference an identifier before its definition |
That’s OK. What’s important is that the new identifier is working. In fact, if we were feeling lazy, we could stop here.
But no—we would feel terrible if we ignored source locations. There are two source locations we want to apply to our new code.
First, we want our new LINE-NUM identifier to have the same location as NUM. That way, if an error arises that concerns LINE-NUM, then NUM will be highlighted in the source.
By default, prefix-id doesn’t apply a source location. We can change this by using its optional #:source argument and passing #'NUM as the value. prefix-id will grab the source location of #'NUM and apply it to LINE-NUM:
1 2 3 4 5 | #lang br/quicklang (define-macro (b-line NUM STATEMENT ...) (with-pattern ([LINE-NUM (prefix-id "line-" #'NUM #:source #'NUM)]) #'(define (LINE-NUM) (void) STATEMENT ...))) |
Second, we want our whole function definition to have the same location as the original b-line node. That way, if an error arises during the function, the original program line will be highlighted. By default, our syntax template does not transmit a source location to the new code.
To fix this, we need to roll out two new items: syntax/loc and caller-stx.
We may remember that the #' prefix we’ve been using with syntax objects and syntax templates is just shorthand for syntax. So these two expressions are the same:
If we want to apply a source location to our new syntax object, we can use syntax/loc instead of plain syntax. The first argument of syntax/loc is a syntax object whose source location we want to borrow for the new syntax object:
1 | (syntax/loc srcloc-stx (define (LINE-NUM) (void) STATEMENT ...)) |
We want to use the source location of the original b-line node, so we need a syntax object that represents the whole node. We can’t use #'NUM or #'(STATEMENT ...), because those arguments are inside the b-line node, and thus have different source locations.
When we use define-macro, the original input to the macro—before it’s matched to the defining syntax pattern—is made available in a special variable called caller-stx. This variable is only available in the body of the define-macro (or define-macro-cases). We can see how this works in this toy example, where mac prints caller-stx before listing the arguments passed to the macro:
1 2 | caller-stx is: #<syntax:5:0 (mac 42 "foo")> '(42 "foo") |
caller-stx contains the whole expression that invoked the macro, including mac itself.
Thus, we can finish our b-line macro by using syntax/loc for the syntax template, and passing caller-stx as the source-location argument:
1 2 3 4 5 6 | #lang br/quicklang (define-macro (b-line NUM STATEMENT ...) (with-pattern ([LINE-NUM (prefix-id "line-" #'NUM #:source #'NUM)]) (syntax/loc caller-stx (define (LINE-NUM) (void) STATEMENT ...)))) |
Usually we’d also provide the b-line macro. This time, we’ll wait till the end of our expander, and export them as a set.
Now that we know how our line functions will work, we’re ready to start our #%module-begin. This will be the first macro executed when our program starts:
1 2 3 4 5 6 7 | #lang br/quicklang (define-macro (b-line NUM STATEMENT ...) ···) (define-macro (b-module-begin (b-program LINE ...)) #'(#%module-begin LINE ...)) (provide (rename-out [b-module-begin #%module-begin])) |
As usual, we give our macro a temporary name and rename it in our provide expression. Rather than define the macro with the single argument PARSE-TREE, we use the pattern (b-program LINE ...). This just lets us unpack the b-program node into a series of LINE ... arguments right away, instead of defining a separate b-program macro.
We know that each of these LINE ... arguments will be converted into a function definition. So we just insert them directly into our syntax template.
Next, we need to set up a hash table of line numbers and line functions, and pass it to our main program loop:
1 2 3 4 5 6 7 8 9 | #lang br/quicklang (define-macro (b-line NUM STATEMENT ...) ···) (define-macro (b-module-begin (b-program LINE ...)) #'(#%module-begin LINE ... (define line-table ···) (void (run line-table)))) (provide (rename-out [b-module-begin #%module-begin])) |
We call this new hash table line-table. We further suppose that our main function will be called run (we still need to write it, however). We pass any return value of run to void to discard it.
Now for the hash table itself. Racket has multiple kinds of hash tables. We always like to pick the fastest one for the job:
Our line-table is a mapping of numbers to functions that won’t change during the program run. Therefore, we’re looking for an immutable hash table that compares keys with eqv? (because eq? doesn’t work with numbers). + See equality for a full explanation of the differences between equality tests. So our fastest option is hasheqv.
The hasheqv constructor function takes an indefinite list of alternating keys and values. But because of the way the ... operator works in a macro, it’s hard to make a list of alternating arguments. What we can do instead is make a series of individual lists of key–value pairs—a NUM followed by a LINE-FUNC—and combine them with append and apply:
1 2 3 4 5 6 7 8 9 10 | #lang br/quicklang (define-macro (b-line NUM STATEMENT ...) ···) (define-macro (b-module-begin (b-program LINE ...)) #'(#%module-begin LINE ... (define line-table (apply hasheqv (append (list NUM LINE-FUNC) ...))) (void (run line-table)))) (provide (rename-out [b-module-begin #%module-begin])) |
Our macro is almost done. We just need to manufacture the values for (NUM ...) and (LINE-FUNC ...). We can introduce these pattern variables into our syntax template using with-pattern:
1 2 3 4 5 6 7 8 9 10 11 12 13 | #lang br/quicklang (define-macro (b-line NUM STATEMENT ...) ···) (define-macro (b-module-begin (b-program LINE ...)) (with-pattern ([(NUM ...) ···] [(LINE-FUNC ...) ···]) #'(#%module-begin LINE ... (define line-table (apply hasheqv (append (list NUM LINE-FUNC) ...))) (void (run line-table))))) (provide (rename-out [b-module-begin #%module-begin])) |
We need to pair (NUM ...) with a list of line numbers. Where do we get these? This requires a little bit of pattern-matching cleverness. We know that each raw LINE argument is shaped like so:
1 | (b-line NUM STATEMENT ...) |
It would be nice if we could just extract the NUM field from inside each LINE. We can—if we make a syntax pattern that matches each whole LINE, with NUM as a submatch. We can even use the same pattern we used to define the b-line macro:
1 2 3 4 5 6 7 8 9 10 11 12 13 | #lang br/quicklang (define-macro (b-line NUM STATEMENT ...) ···) (define-macro (b-module-begin (b-program LINE ...)) (with-pattern ([((b-line NUM STATEMENT ...) ...) #'(LINE ...)] [(LINE-FUNC ...) ···]) #'(#%module-begin LINE ... (define line-table (apply hasheqv (append (list NUM LINE-FUNC) ...))) (void (run line-table))))) (provide (rename-out [b-module-begin #%module-begin])) |
What happens in the highlighted code? The right-hand syntax object #'(LINE ...) is matched against the left-hand syntax pattern ((b-line NUM STATEMENT ...) ...). Because there’s an ellipsis at the top level of each, each individual LINE is matched against the subpattern (b-line NUM STATEMENT ...). This produces the list of NUM matches that we want. It also produces a list of STATEMENT matches, but we don’t need them, so we just ignore them. + It’s a common idiom in syntax patterns to notate a deliberately ignored element with an underscore, so this pattern could be written (b-line NUM _ ...).
To generate the LINE-FUNC ... names, we first put those NUM matches inside a new syntax template as #'(NUM ...). Then, just as we did in the b-line macro, use prefix-id to prefix each of them with line-. (This time, we don’t need to worry about source locations.) With this update, the finished macro looks like this:
1 2 3 4 5 6 7 8 9 10 11 12 13 | #lang br/quicklang (define-macro (b-line NUM STATEMENT ...) ···) (define-macro (b-module-begin (b-program LINE ...)) (with-pattern ([((b-line NUM STATEMENT ...) ...) #'(LINE ...)] [(LINE-FUNC ...) (prefix-id "line-" #'(NUM ...))]) #'(#%module-begin LINE ... (define line-table (apply hasheqv (append (list NUM LINE-FUNC) ...))) (void (run line-table))))) (provide (rename-out [b-module-begin #%module-begin])) |
By the way, why can we subject our b-line elements to two rounds of pattern matching? Under Racket’s evaluation rules, macros are evaluated from the outside in. Therefore, the b-module-begin macro will go first, matching the b-line nodes, but also passing them through to its syntax template as LINE .... Later, these elements will be further expanded using the b-line macro, converting them into function definitions.
Our main program loop will be handled by a run function that takes a table of numbers and line functions, called line-table, as input:
1 2 3 4 5 6 | #lang br/quicklang (define-macro (b-line NUM STATEMENT ...) ···) (define-macro (b-module-begin (b-program LINE ...)) ···) (define (run line-table) ···) |
Beyond that, according to our specification and setup, run needs to do the following:
Put the line functions into numerical order, and start the program at the lowest line number.
Execute the line function corresponding to the current line number.
Move to the next line function in numerical order, unless a signal is received to switch to another line.
End the program when there are no more lines to execute, or an end-program signal is received.
Let’s write the core loop for run, and then go back and add the change-line and end-program signals.
First we need to put our line numbers into numerical order:
1 2 3 4 5 6 7 8 | #lang br/quicklang (define-macro (b-line NUM STATEMENT ...) ···) (define-macro (b-module-begin (b-program LINE ...)) ···) (define (run line-table) (define line-vec (list->vector (sort (hash-keys line-table) <))) ···) |
The keys of line-table are the line numbers. We use hash-keys to get them as a list, and then sort to put them into increasing order. We use list->vector to convert these numbers to a vector, which is a list-like data structure that’s faster than an ordinary list when we can foresee needing random access to elements. And we can—a BASIC program can jump arbitrarily from line to line with goto.
To run the program, we use a for/fold loop:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 | #lang br/quicklang (define-macro (b-line NUM STATEMENT ...) ···) (define-macro (b-module-begin (b-program LINE ...)) ···) (define (run line-table) (define line-vec (list->vector (sort (hash-keys line-table) <))) (for/fold ([line-idx 0]) ([i (in-naturals)] #:break (>= line-idx (vector-length line-vec))) (define line-num (vector-ref line-vec line-idx)) (define line-func (hash-ref line-table line-num)) (line-func) (add1 line-idx))) |
This loop has one accumulator value, line-idx, that points at a location in line-vec. We start at 0. The loop iterates over i, an infinite stream of integers produced by in-naturals. (Recall that we used a similar technique to handle bf-loop elements in the bf expander.) We won’t do anything with i, but we need to have at least one iterator.
Using #:break, we stop our loop if our current line-idx is greater than or equal to the vector-length of line-vec (which would make it an invalid index value).
Otherwise, we look up the corresponding line-num in line-vec using vector-ref. Then we look up the corresponding line-func in line-table using line-num as the key. We execute line-func. Then we add1 to line-idx and keep going.
That’s all we’d need for a simple program without goto or end.
Now we add the change-line and end-program signals.
In specification and setup, we pledged to implement these signals using exceptions. An exception is a run-time value that pauses the current evaluation and travels back through the calling chain of functions. Exceptions are usually initiated with raise. The exception travels up through the calling chain until it reaches an exception handler, which is a function that processes the exception. Exception handlers are usually established using with-handlers. If no exception handler catches an exception, it stops the program.
Here’s a simple example (though there are more elaborate ones in errors and exceptions):
thrower turns every argument into an exception. catcher uses with-handlers to wrap thrower with an exception handler that only applies to numbers. with-handlers comprises a series of clauses with a predicate on the left, and a single-argument function on the right. If the exception matches the predicate, the function is called, with the exception as the input. So if we pass a number to catcher, the exception handler catches it and returns 'got-number. But if we pass an argument directly to thrower, or a non-number to catcher, then the exception handler isn’t invoked, and we get an uncaught-exception error.
In our run function, our goal will be to use exceptions not to trigger errors, but rather as a way of jumping out of our program loop. Specifically, our goto and end statements will each use raise to raise an exception. Back in run, we’ll use with-handlers to catch these exceptions and take action.
First we need to make our exceptions. In principle, though we can use any kind of value for an exception, it makes sense to create special predicates, so that the signal can be uniquely tested. A quick way to mint new predicates is to make a new structure type with struct:
1 2 3 4 5 6 7 8 | #lang br/quicklang (define-macro (b-line NUM STATEMENT ...) ···) (define-macro (b-module-begin (b-program LINE ...)) ···) (struct end-program-signal ()) (struct change-line-signal (val)) (define (run line-table) ···) |
struct creates a new structure type with the name and optional fields in parentheses. It also manufactures supporting functions: a constructor, a structure predicate, and getters for each field (if any). + And setters, if the structure type is marked as #:mutable. So when we say:
1 | (struct change-line-signal (val)) |
We actually end up with three functions:
1 2 3 | change-line-signal ; constructor change-line-signal? ; predicate change-line-signal-val ; returns `val` field |
Invoked like so:
Now we’re ready to implement goto and end, which in our parse tree became b-goto and b-end:
1 2 3 4 5 6 7 8 9 10 11 | #lang br/quicklang (define-macro (b-line NUM STATEMENT ...) ···) (define-macro (b-module-begin (b-program LINE ...)) ···) (struct end-program-signal ()) (struct change-line-signal (val)) (define (b-end) (raise (end-program-signal))) (define (b-goto expr) (raise (change-line-signal expr))) (define (run line-table) ···) |
b-goto uses raise to send up an exception using an instance of the change-line-signal structure type, holding a value of expr. Likewise, b-end raises an exception using end-program-signal.
Then we use with-handlers to catch these exceptions. The end-program-signal needs to get us out of the main program loop. So we locate the exception handler on the outside of our for/fold expression, using end-program-signal? as the predicate, and the exception handler returning (void):
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 | #lang br/quicklang (define-macro (b-line NUM STATEMENT ...) ···) (define-macro (b-module-begin (b-program LINE ...)) ···) (struct end-program-signal ()) (struct change-line-signal (val)) (define (b-end) (raise (end-program-signal))) (define (b-goto expr) (raise (change-line-signal expr))) (define (run line-table) (define line-vec (list->vector (sort (hash-keys line-table) <))) (with-handlers ([end-program-signal? (λ (exn-val) (void))]) (for/fold ([line-idx 0]) ([i (in-naturals)] #:break (>= line-idx (vector-length line-vec))) (define line-num (vector-ref line-vec line-idx)) (define line-func (hash-ref line-table line-num)) (line-func) (add1 line-idx)))) |
The second use of with-handlers that catches change-line-signal needs to be placed so it intercepts the return value of the loop, which is ordinarily (add1 line-idx)
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 | #lang br/quicklang (define-macro (b-line NUM STATEMENT ...) ···) (define-macro (b-module-begin (b-program LINE ...)) ···) (struct end-program-signal ()) (struct change-line-signal (val)) (define (b-end) (raise (end-program-signal))) (define (b-goto expr) (raise (change-line-signal expr))) (define (run line-table) (define line-vec (list->vector (sort (hash-keys line-table) <))) (with-handlers ([end-program-signal? (λ (exn-val) (void))]) (for/fold ([line-idx 0]) ([i (in-naturals)] #:break (>= line-idx (vector-length line-vec))) (define line-num (vector-ref line-vec line-idx)) (define line-func (hash-ref line-table line-num)) (with-handlers ([change-line-signal? (λ (cls) (define clsv (change-line-signal-val cls)) (or (and (exact-positive-integer? clsv) (vector-member clsv line-vec)) (error (format "error in line ~a: line ~a not found" line-num clsv))))]) (line-func) (add1 line-idx))))) |
Let’s step through what’s happening in this exception handler. An instance of the change-line-signal structure type is passed to the exception handler as input, and called cls. We extract the val field from the exception as clsv, which is supposed to be a line number. We check that clsv is an exact-positive-integer? and that it appears as a value in line-vec. For this, we use vector-member, which returns its index in the vector, or #f if it doesn’t exist. In that unfortunate circumstance, we raise an error saying the line couldn’t be found.
All that’s left is to implement the remaining pieces of our parse tree—b-rem, b-print, b-sum, and b-expr:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 | #lang br/quicklang (define-macro (b-line NUM STATEMENT ...) ···) (define-macro (b-module-begin (b-program LINE ...)) ···) (struct end-program-signal ()) (struct change-line-signal (val)) (define (b-end) (raise (end-program-signal))) (define (b-goto expr) (raise (change-line-signal expr))) (define (run line-table) ···) (define (b-rem val) (void)) (define (b-print . vals) (displayln (string-append* (map ~a vals)))) (define (b-sum . nums) (apply + nums)) (define (b-expr expr) (if (integer? expr) (inexact->exact expr) expr)) |
As usual, we just follow the grammar, which at this point should feel like vacation:
b-rem takes a single argument and returns nothing, aka (void).
b-print takes a series of printable values, concatenates them, and prints them on their own line using displayln. To make the output string, we convert our list of vals to strings with ~a and then string-append* them.
b-sum adds a list of numbers using good old +.
b-expr does a tiny bit of housekeeping. Unlike Racket, BASIC doesn’t make a distinction between exact and inexact numbers. (See numbers if this is news.) So 60.0 and 60 are the same number in BASIC. But they’re not in Racket—the first is an inexact number; the second is exact. But we want things like goto 4.5 + 5.5 to behave the same way as goto 10. So here in b-expr, we convert any integer back to an exact number.
Finally, we have to provide all our macros and functions that implement parts of the parse tree. Since we had the foresight to prefix them all with b-, we can use matching-identifiers-out to export them all at once:
1 | (provide (matching-identifiers-out #rx"^b-" (all-defined-out))) |
We start with all-defined-out, which included every function and macro defined in the module, and then winnow this down with the regular expression #rx"^b-", which selects every function and macro starting with b-.
Putting everything together, our expander should look like this:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 | #lang br/quicklang (provide (matching-identifiers-out #rx"^b-" (all-defined-out))) (define-macro (b-line NUM STATEMENT ...) (with-pattern ([LINE-NUM (prefix-id "line-" #'NUM #:source #'NUM)]) (syntax/loc caller-stx (define (LINE-NUM) (void) STATEMENT ...)))) (define-macro (b-module-begin (b-program LINE ...)) (with-pattern ([((b-line NUM STMT ...) ...) #'(LINE ...)] [(LINE-FUNC ...) (prefix-id "line-" #'(NUM ...))]) #'(#%module-begin LINE ... (define line-table (apply hasheqv (append (list NUM LINE-FUNC) ...))) (void (run line-table))))) (provide (rename-out [b-module-begin #%module-begin])) (struct end-program-signal ()) (struct change-line-signal (val)) (define (b-end) (raise (end-program-signal))) (define (b-goto expr) (raise (change-line-signal expr))) (define (run line-table) (define line-vec (list->vector (sort (hash-keys line-table) <))) (with-handlers ([end-program-signal? (λ (exn-val) (void))]) (for/fold ([line-idx 0]) ([i (in-naturals)] #:break (>= line-idx (vector-length line-vec))) (define line-num (vector-ref line-vec line-idx)) (define line-func (hash-ref line-table line-num)) (with-handlers ([change-line-signal? (λ (cls) (define clsv (change-line-signal-val cls)) (or (and (exact-positive-integer? clsv) (vector-member clsv line-vec)) (error (format "error in line ~a: line ~a not found" line-num clsv))))]) (line-func) (add1 line-idx))))) (define (b-rem val) (void)) (define (b-print . vals) (displayln (string-append* (map ~a vals)))) (define (b-sum . vals) (apply + vals)) (define (b-expr expr) (if (integer? expr) (inexact->exact expr) expr)) |