Our project in this section will be to improve the REPL to be able to understand & evaluate expressions written in BASIC.
We haven’t said much about the REPL, though we’ve been using it throughout this book. The REPL—an acronym for read–evaluate–print loop, pronounced repple—is where Racket prints the results of programs:
1 | 42 |
On the command line, the REPL starts automatically when we run racket. In DrRacket, the lower panel of each window contains the REPL. The DrRacket REPL is more capable than the command-line version, however. For a surprise, try this:
The REPL is used for input as well as output. Expressions can be entered directly on the REPL, and evaluated immediately:
1 2 3 4 5 | > (define (g x) (* x x)) > g #<procedure:g> > (g 9) 81 |
The REPL is not an independent evaluation environment. Rather, expressions entered on the REPL are treated as if they lived in the top level of the currently running program. Therefore, REPL expressions can access all the bindings available in the program, even if they’re not exported with provide. For instance, this program:
Prints an initial result:
1 | 42 |
But we can move to the REPL and evaluate new expressions using f and x:
1 2 3 4 5 6 | > f #<procedure:f> > x 21 > (f (+ x x)) 84 |
In a way, we can think of the REPL as performing a smaller-scale version of the processing steps that happen in a full Racket-implemented language. When we make a language, the source code is sent to our reader, which turns it into a module expression, which is passed to our expander to be expanded and evaluated, and the results are printed. Likewise, as its name implies, the REPL reads an expression, then expands, evaluates, and prints it. + In Racket, perhaps “REEPL” would be the more accurate acronym, but REPL is the traditional name among Lisps.
The key difference between a language and the REPL is the behavior during the reader phase. In a language, the input (= the lines of source code) is read as a module, which then behaves as a self-contained program. Whereas on the REPL, the input (= a single line) is read as a single expression, which then behaves as a fragment of a larger program.
Those who have been fiddling with the REPL during our tutorials may have noticed something strange: no matter what our source language looks like, the REPL only accepts Racket-style S-expressions as input.
For instance, here’s a BASIC program similar to our earlier example that exports f and x:
1 2 3 4 | #lang basic 10 def f(x) = x + x 20 x = 21 30 export f : export x |
When we try to enter a BASIC-style expression on the REPL, we get an error: + The REPL reads f(x) as two expressions: f followed by (x), which becomes (21) and triggers an error, becase 21 is “not a procedure”.
1 2 3 4 5 6 | > f(x) #<procedure:f> application: not a procedure; expected a procedure that can be applied to arguments given: 21 arguments...: [none] |
But if we use Racket-style syntax, it works:
1 2 | > (f x) 42 |
“But we wrote a top-notch BASIC reader. Why doesn’t the REPL use that automatically?” Because it won’t work. As we learned above, a language reader produces a module that represents a self-contained program. Whereas a REPL reader needs to produce a single expression. Therefore, by default, the REPL doesn’t rely on the reader of the current language. Instead, it relies on the usual S-expression reader.
This isn’t completely useless. It just means that if we want to use the REPL, we need to manually convert BASIC expressions to what they’d look like in the parse tree produced by the main language reader.
We just saw one example: we passed x as an argument to the f function on the REPL by using the S-expression function notation (f x) rather than the BASIC function notation f(x).
Here’s another example. Even though we can’t evaluate x + x * x directly:
1 2 3 4 | > x + x * x 21 +: undefined; cannot reference an identifier before its definition |
We can evaluate the parse-tree version of this expression:
1 2 | > (b-sum x "+" (b-product x "*" x)) 462 |
Not completely useless. But still a chore and a bore. It would definitely be nicer to be able to enter BASIC expressions on the REPL with the usual BASIC notation.
The good news is that we can configure the REPL to work with BASIC expressions. To do this, we have two tasks:
We need to make a new “expression reader” for the REPL that can read single BASIC expressions.
When a BASIC module runs, we need to configure the REPL to use this new expression reader.
The second part is easy. Our BASIC-aware REPL will only be invoked when a BASIC module is run directly (vs. being imported into another module). Therefore, as we learned in exports, this is a job that can be handled by the configure-runtime submodule. We’ll update our do-setup! function to set the REPL to use the new expression reader.
So how about the first part—where do we get this expression reader? Clearly, we’d prefer to avoid starting from scratch. Ideally, we’d like to derive our expression reader from our current lexer & parser. In particular, we’ve already written a bunch of parser rules for handling BASIC expressions—it would be nice to be able to just reach in and reuse them.
As it turns out, we can. So far, we’ve used #lang brag to generate a parse function that starts from the first rule in the grammar. The starting rule ends up serving as the root node of the resulting parse tree. Since we use parse to handle whole programs, we’ve usually named this first rule langname-program.
But any module written in brag also provides a function called make-rule-parser that lets us generate a parser for any rule within the grammar. So rather than starting with the first rule, we can start anywhere.
Let’s see how this works. Suppose we want to parse the string we used above—x + x * x—as a BASIC expression. We convert the string into a port with open-input-string. Then, as we do in our reader, we pass the port to our tokenizer, and then a parser. But this time, rather than using parse, we generate a parser from the b-sum rule in our grammar:
1 2 3 4 5 | (require brag/support basic/parser basic/tokenizer) (define source (open-input-string "x + x * x")) (define sum-parser (make-rule-parser b-sum)) (define sum-parse-tree (sum-parser (make-tokenizer source))) (syntax->datum sum-parse-tree) |
The result, instead of a whole BASIC program, is a parse tree that starts at a b-sum node:
1 2 3 4 5 6 | '(b-sum (b-sum (b-product (b-neg (b-expt x)))) "+" (b-product (b-product (b-neg (b-expt x))) "*" (b-neg (b-expt x)))) |
If we remove the trivial one-argument expressions, we’re left with this:
1 | '(b-sum x "+" (b-product x "*" x)) |
This is the same as the parse-tree version of this expression we calculated by hand earlier. Best of all, we didn’t have to write anything new—we just reused our existing tokenizer and brag modules.
As a language reader, this parser wouldn’t be useful, because it can’t handle a whole program—just a sum expression. But as a REPL reader, it’s getting closer to what we need.
Now that we know how to generate a parser for any rule, we have a choice:
We can base our REPL reader on an existing rule—b-expr or b-statement would be good options.
Or, if we want to be more generous about what kind of expressions our new REPL will handle, we can add a special REPL rule to our parser grammar.
That’s the more flexible approach, so we’ll do it that way. At the bottom of "parser.rkt", let’s add a new rule called b-repl:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 | #lang brag
b-program : [b-line] (/NEWLINE [b-line])*
b-line : b-line-num [b-statement] (/":" [b-statement])* [b-rem]
@b-line-num : INTEGER
b-rem : REM
@b-statement : b-end | b-print | b-goto
| b-let | b-input | b-if
| b-gosub | b-return | b-for | b-next
| b-def | b-import | b-export
b-end : /"end"
b-print : /"print" [b-printable] (/";" [b-printable])*
@b-printable : STRING | b-expr
b-goto : /"goto" b-expr
b-let : [/"let"] b-id /"=" (STRING | b-expr)
b-if : /"if" b-expr /"then" (b-statement | b-expr)
[/"else" (b-statement | b-expr)]
b-input : /"input" b-id
@b-id : ID
b-gosub : /"gosub" b-expr
b-return : /"return"
b-for : /"for" b-id /"=" b-expr /"to" b-expr [/"step" b-expr]
b-next : /"next" b-id
b-def : /"def" b-id /"(" b-id [/"," b-id]* /")" /"=" b-expr
b-import : /"import" b-import-name
@b-import-name : RACKET-ID | STRING
b-export : /"export" b-export-name
@b-export-name : ID
b-expr : b-or-expr
b-or-expr : [b-or-expr "or"] b-and-expr
b-and-expr : [b-and-expr "and"] b-not-expr
b-not-expr : ["not"] b-comp-expr
b-comp-expr : [b-comp-expr ("="|"<"|">"|"<>")] b-sum
b-sum : [b-sum ("+"|"-")] b-product
b-product : [b-product ("*"|"/"|"mod")] b-neg
b-neg : ["-"] b-expt
b-expt : [b-expt ("^")] b-value
@b-value : b-number | b-id | /"(" b-expr /")" | b-func
b-func : ID /"(" b-expr [/"," b-expr]* /")"
@b-number : INTEGER | DECIMAL
b-repl : (b-statement | b-expr) (/":" [@b-repl])*
|
This new REPL rule will accept BASIC statements (like print 24 * x) or raw BASIC expressions (like 24 * x). Following the convention in BASIC lines, we allow multiple inputs on the REPL to be chained together with : (which we express with a spliced, recursive reference to the b-repl rule).
Would it make sense to carve out certain statements from our REPL pattern? Our REPL expressions will be evaluated as if they were in the top level of our language module. They will not, however, be evaluated as if they were happening during the run loop. That means that certain statements on the REPL—like goto and gosub—will be invalid, because they don’t mean anything outside of the context of the run loop.
But if we carve them out here, then they’ll trigger a parsing error. That might be confusing to a user. Instead, we let them all through now, and handle the invalid ones in the expander, where we can raise a more informative error.
In "setup.rkt", we attach our new parser to the REPL. Just as we do in our main reader module, we start by importing "parser.rkt" and "tokenizer.rkt". We also define a repl-parse function based on our new b-repl parser rule:
1 2 3 4 5 6 7 8 9 10 11 | #lang br (require "parser.rkt" "tokenizer.rkt") (provide basic-output-port do-setup!) (define basic-output-port (make-parameter (open-output-nowhere))) (define repl-parse (make-rule-parser b-repl)) (define (do-setup!) (basic-output-port (current-output-port))) |
To do its job, the REPL relies on a special parameter called current-read-interaction. This parameter holds a read-interaction function that’s responsible for taking the input from the REPL and chunking it into syntax objects to evaluate. The REPL calls this read-interaction function repeatedly to get new syntax objects. When the read-interaction function returns eof, the REPL stops (meaning, control returns to the user at the REPL prompt).
The read-interaction function takes two arguments: a port, which holds the text input from the REPL, and an origin value (that ostensibly describes where the port came from, but we shouldn’t rely on it to contain useful information). Because each user command on the REPL ends with a newline, the contents of the port will be a newline-terminated string. So if the user enters this at the REPL prompt:
1 | > print 42 : x + x * x : 25 |
The port passed to the read-interaction function will contain this:
1 | "print 42 : x + x * x : 25\n" |
Whenever we want our REPL to behave differently, we need to set the current-read-interaction parameter to use a new read-interaction function. In this case, we call our new function read-one-line:
1 2 3 4 5 6 | (define (read-one-line origin port) (define one-line (read-line port)) (if (eof-object? one-line) eof (repl-parse (make-tokenizer (open-input-string one-line))))) |
When the REPL calls read-one-line the first time, it will read the first (and only) newline-terminated string in the port using read-line. We convert this string into a new port with open-input-string, pass this port to make-tokenizer, and this tokenizer to repl-parse, which will turn it into a parsed syntax object.
When the REPL calls read-one-line a second time, we get an eof-object? from read-line—because there’s only going to be one newline-terminated string in port—and we return eof, which will stop the REPL.
This little eof fandango may seem silly. But the REPL doesn’t make any assumptions about how input corresponds to syntax objects. That’s why we have to do this by hand. Actually, if we wanted, we could implement read-one-line using lexer:
1 2 3 4 5 6 7 8 | (define (read-one-line origin port) (define repl-lexer (lexer [(from/to any-char "\n") (repl-parse (make-tokenizer (open-input-string (string-trim lexeme))))])) (repl-lexer port)) |
Both versions will do the same thing. But we’ll stick with the first one—in this case, read-line takes care of the housekeeping more neatly than lexer.
By the way, unlike our main read-syntax function, we don’t need to pass around an extra path argument to the tokenizer and parser. We only needed that when we wanted to generate source locations. In the REPL, we can ignore them.
To complete the REPL setup, we go inside do-setup! and reset the current-read-interaction to use our new read-one-line function. We remember that do-setup! is called by our configure-runtime submodule. So when we run a BASIC program directly, the configure-runtime submodule will be triggered, and our REPL will be reprogrammed to use our read-one-line function.
The finished "setup.rkt" module looks like this:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 | #lang br (require "parser.rkt" "tokenizer.rkt") (provide basic-output-port do-setup!) (define basic-output-port (make-parameter (open-output-nowhere))) (define repl-parse (make-rule-parser b-repl)) (define (read-one-line origin port) (define one-line (read-line port)) (if (eof-object? one-line) eof (repl-parse (make-tokenizer (open-input-string one-line))))) (define (do-setup!) (basic-output-port (current-output-port)) (current-read-interaction read-one-line)) |
Because we chose to create a new b-repl rule in the parser, we also have to implement it in the expander. We add a b-repl macro to "misc.rkt":
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 | #lang br (require "struct.rkt") (provide b-rem b-print b-let b-input b-import b-export b-repl) (define (b-rem val) (void)) (define (b-print . vals) (displayln (string-append* (map ~a vals)))) (define-macro (b-let ID VAL) #'(set! ID VAL)) (define-macro (b-input ID) #'(b-let ID (let* ([str (read-line)] [num (string->number (string-trim str))]) (or num str)))) (define-macro (b-import NAME) #'(void)) (define-macro (b-export NAME) #'(void)) (define-macro (b-repl . ALL-INPUTS) (with-pattern ([INPUTS ···]) #'(begin . INPUTS))) |
In the b-repl macro, we’re introducing some syntax-pattern notation we haven’t used before. The dot . is like a rest argument for syntax patterns. It means “collect the rest of the syntax elements in this pattern variable”. It’s an alternative to an ellipsis ... for matching multiple items. So these two patterns are the same:
1 2 | (b-repl EACH-INPUT ...) (b-repl . ALL-INPUTS) |
One difference is that like a rest argument, the dot can only be used at the tail of a pattern. An ellipsis can be used anywhere.
Another difference comes when we use the resulting pattern variable in a syntax template. A pattern variable created with a dot, like ALL-INPUTS, represents a self-contained list of syntax objects. If we want to combine it with another element, it has to appear at the end of a parenthesized expression in the syntax template, and we use another dot . as a connector.
EACH-INPUT, by contrast, represents the first element of a list of syntax objects, and the accompanying ... represents the rest. So these syntax templates are equivalent:
A limitation of the dot is that it doesn’t offer the step-and-repeat behavior that the ellipsis does. For instance, this expression will multiply EACH-INPUT by 42:
But in macros where we’re not using this behavior, the dot can make our templates and patterns tidier. It’s an option, not a requirement. For further details, see syntax patterns.
We still need to fill in the interior of the b-repl macro. When we made the b-repl parser rule, we decided to permit every kind of statement and expression as input, even the ones that didn’t make sense in a REPL context, and sort through them later.
That time has come. The main work of the b-repl macro will be to take a list of ALL-INPUTS and winnow it to a list of valid INPUTS.
So which inputs make sense on the REPL? Probably just these:
print statements should do their usual thing.
BASIC expressions should be printed.
let statements should introduce new variables.
def statements should introduce new functions.
All the other statements—including goto, gosub, and for / next—only make sense in the context of a running program. So we just raise an error if we see any of these.
To process our list of ALL-INPUTS, we use a syntax helper function called pattern-case-filter. This is like a hybrid of cond and filter. It tries to match each element in a list of syntax objects to one or more pattern clauses. If a pattern matches, the syntax template on the right side is returned. If no pattern matches, or the right-side result is #f, the element is omitted. The result of pattern-case-filter is a new (and possibly shorter) list of syntax objects.
Let’s start our pattern-case-filter with a rule to match print statements. We want to just pass these through. So our syntax pattern will match input items that begin with b-print, and just emit a result that’s the same as the input:
1 2 3 4 | (pattern-case-filter #'ALL-INPUTS [(b-print . PRINT-ARGS) #'(b-print . PRINT-ARGS)] ···) |
Next are BASIC expressions. These will come through as b-expr items. We want to print these, so we wrap them in a b-print expression. In other words, every raw BASIC expression ex will behave as if we typed print ex.
1 2 3 4 5 6 | (pattern-case-filter #'ALL-INPUTS [(b-print . PRINT-ARGS) #'(b-print . PRINT-ARGS)] [(b-expr . EXPR-ARGS) #'(b-print (b-expr . EXPR-ARGS))] ···) |
But wait—there’s a trap for the unwary here. We know that an UPPERCASE pattern variable is a wildcard that matches any value; anything else in the syntax pattern is matched literally. When a pattern contains an identifier like b-expr, the notion of “literally” means not just the identifier’s name but also its binding. + Racket jocks might know this as equality in the sense of free-identifier=?.
Let’s see an example. Here, zeta-stx is a syntax object that refers to zeta, a binding from the math library. But when we try to match zeta-stx to the pattern zeta, we get an empty list:
The problem here is that the literal identifier zeta in the syntax pattern has no binding. So even though the identifier names are the same, their bindings are not. Therefore, the pattern doesn’t match, and the result is the empty list.
If we want this pattern to work, we have to import math so that the syntax pattern in pattern-case-filter will refer to the same zeta:
Coming back to b-expr—when we see a parsed element starting with b-expr, it’s not just any b-expr, but the b-expr introduced from our "expr.rkt" module. So if we want to match this particular b-expr, we need to import "expr.rkt" into this module: + We didn’t have the same possible problem with b-print because it’s defined in this module.
1 | (require "struct.rkt" "expr.rkt") |
Moving on to let and def statements. We want to support these because it’s useful to be able to create variables and functions on the REPL. But we recall that in our language implementation, our variables and functions are hoisted to the top of the module and initialized to 0. Then the corresponding b-let and b-def macros use set! to change their values. This won’t work on the REPL—we can’t set! a variable that hasn’t been created.
So on the REPL, when we come across a b-let or b-def expression, we rewrite them as ordinary define expressions:
1 2 3 4 5 6 7 8 9 10 | (pattern-case-filter #'ALL-INPUTS [(b-print . PRINT-ARGS) #'(b-print . PRINT-ARGS)] [(b-expr . EXPR-ARGS) #'(b-print (b-expr . EXPR-ARGS))] [(b-let ID VAL) #'(define ID VAL)] [(b-def FUNC-ID VAR-ID ... EXPR) #'(define (FUNC-ID VAR-ID ...) EXPR)] ···) |
Anything else should be an error. So we add one more case to our pattern-case-filter, pop it in the b-repl macro, and the finished "misc.rkt" looks like this:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 | #lang br (require "struct.rkt" "expr.rkt") (provide b-rem b-print b-let b-input b-import b-export b-repl) (define (b-rem val) (void)) (define (b-print . vals) (displayln (string-append* (map ~a vals)))) (define-macro (b-let ID VAL) #'(set! ID VAL)) (define-macro (b-input ID) #'(b-let ID (let* ([str (read-line)] [num (string->number (string-trim str))]) (or num str)))) (define-macro (b-import NAME) #'(void)) (define-macro (b-export NAME) #'(void)) (define-macro (b-repl . ALL-INPUTS) (with-pattern ([INPUTS (pattern-case-filter #'ALL-INPUTS [(b-print . PRINT-ARGS) #'(b-print . PRINT-ARGS)] [(b-expr . EXPR-ARGS) #'(b-print (b-expr . EXPR-ARGS))] [(b-let ID VAL) #'(define ID VAL)] [(b-def FUNC-ID VAR-ID ... EXPR) #'(define (FUNC-ID VAR-ID ...) EXPR)] [ANYTHING-ELSE #'(error 'invalid-repl-input)])]) #'(begin . INPUTS))) |
Now we’re ready to try out our new REPL. Let’s make a little program that defines a function and a variable:
1 2 3 | #lang basic 10 def f(x) = x * x 20 y = 10 |
We should be able to evaluate these directly on the REPL:
1 2 3 4 | > f #<procedure:f> > y 10 |
How about some BASIC-style expressions:
1 2 3 4 5 6 7 8 | > y + y * y 110 > f(y) 100 > f(y) * y 1000 > f(f(y)) 10000 |
Let’s also try using explicit print statements:
1 2 3 4 5 6 7 8 9 10 | > print y + y * y 110 > print f(y) 100 > print f(y) * y 1000 > print f(f(y)) 10000 > print y + y * y ; f(y) ; f(y) * y ; f(f(y)) 110100100010000 |
And colon-separated chains of statements and expressions:
1 2 3 4 5 | > print y + y * y : f(y) : print f(y) * y : f(f(y)) 110 100 1000 10000 |
Now a def and let:
1 2 3 4 5 6 7 8 9 10 | > def g(i) = 2 * i > g #<procedure:g> > let z = 20 > z 20 > z + y 30 > g(f(z)) 800 |
And finally some bad input:
1 2 | > goto 10 error: invalid-repl-input |
That took a while, but only because we have such high standards. As with syntax coloring and indenting, our language will work fine even if we never improve the REPL. But the REPL is an intrinsic part of Racket. A small amount of extra effort—adapting the tokenizer and parser we’ve already made—pays off in a better language interface.