It’ll be difficult to make any decent BASIC games without variables and user input. Input is easy once we have variables. Variables are a little harder. But we’ll use the principle that’s served us well so far: study the behavior of BASIC variables, and then figure out the most natural way to model this behavior in Racket.
Variables work a little differently in BASIC than in Racket:
All variables are global. There’s no way to create a “locally scoped” variable (e.g., one that is only valid inside a certain line or loop). So when any line in the program refers to variable x, it always has to be the same x.
A variable can be declared explicitly with let, or by simply assigning it a value with =. This program (using basic-demo-2 for a moment, because our own implementation isn’t working yet) prints 66:
1 2 3 4 | #lang basic-demo-2 10 let x = 42 20 y = 24 30 print x + y |
Variables don’t have to be declared in advance. If a variable is used before its value has been set, it defaults to a value of 0. So this program is valid, and prints 000:
1 2 | #lang basic-demo-2 10 print x ; y ; z |
Because BASIC programs rely almost entirely on global variables and mutation, they are in some sense the antithesis of the functional programming techniques that are idiomatic in Racket. Still, we will not be deterred.
Traditionally, BASIC variables were limited to one letter, and variables that held strings were one letter followed by $. We won’t preserve this tradition, because it’s pointless and awful. Instead we’ll allow our BASIC variables to be any sequence of alphabetic or numeric characters (or $) that starts with a letter. (Those who insist on one-letter names can still have them.)
We can foresee two possible approaches:
We could think of BASIC variables as a run-time concept. Meaning, we could start with an empty data structure—say, a hash table—and treat variable references as keys into the hash table, reading or writing as appropriate.
The good news is that this approach would be easier to implement, and would do most of what we need.
The bad news is that this approach is an evolutionary dead end. Once we make variables into hash keys, how will we use them on the REPL? (Insert annoying housekeeping here.) What if we want to provide these values outside the file? (Insert more housekeeping.)
Our shortcut starts to look like a false economy. Modeling our BASIC variables as hash keys might save us some setup effort. But before long, it’ll cost us.
Alternatively, we could think of our BASIC variables as a compile-time concept. Meaning, we could model them as actual Racket variables.
The good news is that this approach would make a lot of other things in our implementation easier. For instance, when we write macros, we could use BASIC variables in the same locations as regular Racket variables.
Well, almost. The first time we use a Racket variable, we introduce it into our program with define. In BASIC, because the program lines can be executed in any order, we wouldn’t be able to say at compile time which reference to a certain variable came “first”. No problem—we could just implement every assignment to a BASIC variable as a mutation (using set!).
Which brings us to the bad news: to make this work, we’d need to somehow define all the variables the program uses in advance. How?
Before we answer that question, let’s agree that the hash-table idea is a loser. In general, BASIC variables have a lot in common with Racket variables. We’ll get more benefit from keeping those concepts tied together, rather than choosing something with a less natural fit (= a hash table).
As always, we’ll have to update our lexer and our parser to handle variables and variable-assignment statements. We’ll also have to update the expander to implement the mutation underlying these statements. But for us, that part will be easy—we’ve done it before.
Let’s focus on the new problem. If we want to model BASIC variables as Racket variables, we need them all to be defined before we start running the individual lines of a program. So in a program like this:
1 2 3 4 5 6 7 | #lang basic-demo-2 10 x = 1 20 print x 30 y = x * 2 40 print y 50 z = x + y 60 print z |
We’d need x, y, and z to be defined (and initialized to 0) before the first line runs.
We know that any kind of setup work that needs to happen before the first line can live in the #%module-begin macro in our expander. For instance, we currently use this macro to set up a line-table before we call run to start the program:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 | #lang br/quicklang (require "struct.rkt" "run.rkt" "elements.rkt") (provide (rename-out [b-module-begin #%module-begin]) (all-from-out "elements.rkt")) (define-macro (b-module-begin (b-program LINE ...)) (with-pattern ([((b-line NUM STMT ...) ...) #'(LINE ...)] [(LINE-FUNC ...) (prefix-id "line-" #'(NUM ...))]) #'(#%module-begin LINE ... (define line-table (apply hasheqv (append (list NUM LINE-FUNC) ...))) (void (run line-table))))) |
Following that idea, if we had a list of the identifiers in the BASIC program, we could insert a define expression at the top of our #%module-begin for each, turning them into variables.
So where do we get this list of identifiers? In the last tutorial, we saw how we could use a syntax pattern to match b-line elements from our parse tree and extract a list of line numbers. That worked nicely because the lines were already accessible from the LINE ... pattern variable, and the line number was in the same location of every line.
This time, we can’t use that technique. Unlike the line numbers, our identifiers could be anywhere in the BASIC program. Somehow, we need to search the whole parse tree for uses of identifiers, and extract them.
We could do this by shaking the parse tree more vigorously—for instance, recursively taking it apart with match and looking for identifiers. This kind of frontal assault could work. But it also amounts to re-parsing the parse tree.
Therefore it would be nice to somehow consolidate this work inside our parser. It turns out that we can use cuts and splices in the parser—which we used before to simplify the parse tree—to also mark certain elements, like identifiers, so they’re easy to find later.
In #lang brag, when we use a cut on a rule name, or a splice on a pattern element, we end up removing a rule name from a node in the parse tree. Suppose our grammar looks like this:
1 2 3 | #lang brag program : list list : "1" "2" "3" |
The code 123 would be parsed like so:
1 | (program (list "1" "2" "3")) |
If we apply a cut to the list rule:
1 2 3 | #lang brag program : list /list : "1" "2" "3" |
The list name disappears from the parse tree, but the elements remain:
1 | (program ("1" "2" "3")) |
Or, if we apply a splice to either appearance of list in the grammar:
1 2 3 | #lang brag program : @list list : "1" "2" "3" |
Once again, the list name disappears from the parse tree, and the elements remain (though this time, those elements are spliced into the surrounding node):
1 | (program "1" "2" "3") |
As a convenience in these cases, #lang brag preserves the rule name by adding it as a syntax property to the residual elements.
A syntax property is a key–value field attached to a syntax object. The key has to be a symbol; the value can be anything. Syntax properties are read and written using syntax-property. When writing a property, syntax-property takes three arguments—syntax object, key, and value—and returns a syntax object with the updated field. When reading, it takes a syntax object and key and returns the matching value. If the syntax object doesn’t have a syntax property with that key, the result is #f:
1 2 3 4 5 | (define stx #'foobar) (define stx-with-prop (syntax-property stx 'my-key "my value")) (syntax-property stx 'my-key) ; #f (syntax-property stx-with-prop 'my-key) ; "my-value" (syntax-property stx-with-prop 'missing-key) ; #f |
Syntax properties offer us a way of annotating a syntax object with arbitrary metadata that would otherwise have to travel separately. They’re always optional. But what’s nice about syntax properties is that they behave like source locations: once a syntax property is set, it stays attached to a piece of syntax throughout its travels.
A limitation of syntax properties, however, is that they’re only available while we’re manipulating our code as a syntax object (say, within a macro at compile time). Once the code enters run time, the syntax properties are no longer available.
For now, let’s see how simple inspection of syntax properties works. First, we’ll look at the parse tree of a one-line program:
1 2 | #lang basic-demo/parse-only 10 print 2 + 2 |
1 | '(b-program (b-line 10 (b-print (b-num-expr (b-sum 2 2))))) |
No big surprises there. But let’s switch to #lang basic-demo/parse-stx, which will reveal the whole syntax object that results from the parse process (we didn’t make this dialect in the first tutorial, but it’s included with the beautiful-racket library):
1 2 | #lang basic-demo/parse-stx 10 print 2 + 2 |
This time, DrRacket prints a syntax object into the REPL:
We can click on the triangle at the left of the syntax object, and then the triangle next to Syntax Info, to reveal what’s inside:
On the left we see the datum inside the syntax object, which is the same as the simple parse tree we saw before. On the right we see the extra information that’s packaged into the syntax object.
On the datum side, let’s click on the line number 10, which started out inside a b-line node:
This updates the right side of the window with information specific to the nested syntax object 10. As we can see, under Known properties—a list of the syntax properties attached to the object—are two properties, 'rule and 'b-line-num. We can use the 'rule property to discover the name of the original production rule that enclosed our 10, namely 'b-line-num. In turn, the 'b-line-num property holds a syntax object representing the original 'b-line-num rule name omitted from the parse tree.
Next, let’s go back to the datum side and click on the left parenthesis next to b-print. This highlights the whole b-print expression. Recall that this started out wrapped inside a b-statement rule, which got spliced out:
As with 10, we see a syntax property called 'rule that tells us the name of the original enclosing rule, 'b-statement. In turn, there’s a 'b-statement property that holds the syntax object representing the original node in the parse tree. (We also see two other syntax properties: 'splice-rh-id and 'hide-or-splice. These are private values used by brag for other housekeeping.)
Finally, let’s click on either 2 inside b-sum. Again, we see a syntax property called 'rule, and another for the 'b-number node that was spliced out:
With that in mind, we can see the path toward implementing BASIC variables.
Oh, right. The input statement is just a let that takes its initial value from user input. When the program reaches the input statement, it will display a prompt and wait for the user to enter a value:
1 2 3 4 | #lang basic-demo-2 10 let x = 42 20 input y 30 print x ; y |
1 | > joyce |
And then it finishes:
1 | 42joyce |
We’ll add support for input as we work on let.
Now we’re ready. We have four tasks:
Update our lexer to recognize identifiers, as well as let, =, and input.
Update our parser with rules that recognize let and input statements, and our identifiers. We’ll use splices to tag our identifiers with a special syntax property so we can find them later.
Update our #%module-begin macro to retrieve the identifiers and define them before the program runs.
Implement the let and input statements with macros in the expander, and patch our sum statement to handle a special case.
Our goal will be to run this sample program:
1 2 3 4 5 6 | #lang basic-demo-2 10 let x = "foo" 20 y = 42 30 let z = x 40 input i 50 print z ; y + y + 10 ; x ; i |
And get this result:
1 | foo94foojoyce
|
Where joyce represents whatever value is entered at the user prompt.
In "lexer.rkt", we make two changes:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 | #lang br (require brag/support) (define-lex-abbrev digits (:+ (char-set "0123456789"))) (define-lex-abbrev reserved-terms (:or "print" "goto" "end" "+" ":" ";" "let" "=" "input")) (define basic-lexer (lexer-srcloc ["\n" (token 'NEWLINE lexeme)] [whitespace (token lexeme #:skip? #t)] [(from/stop-before "rem" "\n") (token 'REM lexeme)] [reserved-terms (token lexeme lexeme)] [(:seq alphabetic (:* (:or alphabetic numeric "$"))) (token 'ID (string->symbol lexeme))] [digits (token 'INTEGER (string->number lexeme))] [(:or (:seq (:? digits) "." digits) (:seq digits ".")) (token 'DECIMAL (string->number lexeme))] [(:or (from/to "\"" "\"") (from/to "'" "'")) (token 'STRING (substring lexeme 1 (sub1 (string-length lexeme))))])) (provide basic-lexer) |
We introduce a new lexer abbreviation called reserved-terms, because we’re going to be introducing a lot more of them. We move our existing terms there, plus let and = and input. We add a corresponding reserved-terms rule below.
We also introduce a rule to match identifiers, and put them into a token of type 'ID as a symbol.
Our rule for identifiers follows our original specification: any sequence of alphabetic or numeric characters, or $, beginning with an alphabetic character, so the full rule is this:
1 | (:seq alphabetic (:* (:or alphabetic numeric "$"))) |
We convert our identifier lexeme to a symbol with string->symbol. Why? Things that exist as symbols in the parse tree (for instance, rule names) are treated as identifiers by the expander, and thus can have bindings attached to them. Whereas strings that reach the expander always remain strings. Obviously, we want our 'ID tokens to be treated as identifiers.
Moving on to "parser.rkt":
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 | #lang brag b-program : [b-line] (/NEWLINE [b-line])* b-line : b-line-num [b-statement] (/":" [b-statement])* [b-rem] @b-line-num : INTEGER @b-statement : b-end | b-print | b-goto | b-let | b-input b-rem : REM b-end : /"end" b-print : /"print" [b-printable] (/";" [b-printable])* @b-printable : STRING | b-expr b-goto : /"goto" b-expr b-let : [/"let"] b-id /"=" (b-expr | STRING) b-input : /"input" b-id @b-id : ID b-expr : b-sum b-sum : b-number (/"+" b-number)* @b-number : INTEGER | DECIMAL | b-id |
We add three rules: b-let, b-input, and b-id. We add b-let and b-input as possible matches on the right side of b-statement. We also add b-id to the b-number rule, so variables can be used in expressions (that’s a little casual, but we’ll do better when we get to expressions). The b-id rule itself matches tokens of type ID.
As we have before, we use cuts on the right side of the b-let and b-input rules to omit syntactic elements that we won’t need later.
We also apply a splice to the b-id rule. This means that everywhere b-id appears in the grammar, the identifier will be spliced into the node. But it also means that these identifiers will have their b-id syntax property set. Next, we use this syntax property to locate the identifiers in the parse tree.
We need to update our #%module-begin macro, and then implement the b-let and b-input statements.
In our "expander.rkt" module, we make two changes:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 | #lang br/quicklang (require "struct.rkt" "run.rkt" "elements.rkt") (provide (rename-out [b-module-begin #%module-begin]) (all-from-out "elements.rkt")) (define-macro (b-module-begin (b-program LINE ...)) (with-pattern ([((b-line NUM STMT ...) ...) #'(LINE ...)] [(LINE-FUNC ...) (prefix-id "line-" #'(NUM ...))] [(VAR-ID ...) (find-unique-var-ids #'(LINE ...))]) #'(#%module-begin (define VAR-ID 0) ... LINE ... (define line-table (apply hasheqv (append (list NUM LINE-FUNC) ...))) (void (run line-table))))) (begin-for-syntax (require racket/list) (define (find-unique-var-ids line-stxs) (remove-duplicates (for/list ([stx (in-list (stx-flatten line-stxs))] #:when (syntax-property stx 'b-id)) stx) #:key syntax->datum))) |
In b-module-begin, we add a (VAR-ID ...) pattern that matches the identifiers in our parse tree, which we extract with the helper function find-unique-var-ids (about which, more below). We also add (define VAR-ID 0) ... to the body of the macro, which will create the variables with their default value of 0.
Then the helper function. Though a macro generates code that is evaluated at run time (also known as phase 0), the code that implements the macro itself is evaluated at compile time (also known as phase 1). Racket is strict about keeping these phases separate. So when we want to make a helper function for the macro that we call from outside the syntax template, we have to introduce that function at phase 1.
Within the expander, an easy way to do this is wrap the function in begin-for-syntax, which is like begin, but shifts its contents into phase 1.
The find-unique-var-ids function itself works as follows. We pass the argument #'(LINE ...), which is a syntax object containing all the b-line nodes from the parse tree. We use stx-flatten to turn it into a flat list of syntax objects. (In so doing, we demolish the structure of the parse tree, but that doesn’t affect our search—it just makes the data a little easier to handle.) We then iterate over these small syntax objects to find the ones with a syntax-property of 'b-id. These are the droids we’re looking for. We collect them into a list.
Because an identifier can show up any number of times in a program, we use remove-duplicates to winnow our list to distinct identifiers. Strictly speaking, no syntax object containing an identifier will be a “duplicate” of another, because, among other things, they’ll all have unique source locations. So we’d rather disregard these differences. To do this, we use the #:key argument, which is a function that’s applied to the elements before comparing them. We use syntax->datum for this argument so we can compare our syntax objects just on the basis of their underlying datums.
Again, this isn’t the only way to find identifiers in a parse tree. In this case, because we know we can find the identifiers with the 'b-id syntax property, we can do a smash-and-grab.
Penultimately, we add two macros to "misc.rkt" to handle our b-let and b-input statements:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 | #lang br (require "struct.rkt") (provide b-rem b-print b-let b-input) (define (b-rem val) (void)) (define (b-print . vals) (displayln (string-append* (map ~a vals)))) (define-macro (b-let ID VAL) #'(set! ID VAL)) (define-macro (b-input ID) #'(b-let ID (let* ([str (read-line)] [num (string->number (string-trim str))]) (or num str)))) |
As promised, b-let simply uses set! to mutate the value of ID. b-input is phrased in terms of b-let. It uses the Racket library function read-line to get input from a user prompt. By default, the input from read-line is a string. We attempt to convert it to a number with string->number. But if that returns #f, we return the raw str value.
Finally, we adjust b-sum to handle an edge case. A consequence of our new grammar is that when an identifier appears on the right side of a let, it ends up wrapped in a b-expr. For instance, this program:
1 2 3 | #lang basic 10 let x = "foo" 30 let z = x |
Gets parsed like this:
1 2 3 | (b-program (b-line 10 (b-let x "foo")) (b-line 30 (b-let z (b-expr (b-sum x))))) |
The problem is that x holds the string "foo", which gets passed to b-sum. This will cause an error, because a string can’t be an argument to +.
To get around this, we’ll introduce a cute trick that we’ll reuse when we improve our expressions. We rewrite b-sum so that if we only get one argument, we pass it through untouched. Otherwise, we apply + as usual. This way, our string value can pass through without error. Moreover, we’ll still get the right answer if we have a single numerical argument (because (+ number) is just number):
Let’s try our sample program:
1 2 3 4 5 6 | #lang basic 10 let x = "foo" 20 y = 42 30 let z = x 40 input i 50 print z ; y + y + 10 ; x ; i |
When we run it, the program should stop at line 40 and put up a prompt at the REPL. We put in a value and type enter:
1 | > joyce |
And then the program should finish:
1 | foo94foojoyce
|
Our persistence has paid off. By choosing to model our BASIC variables as Racket variables—rather than a hash table—we took a slightly steeper route. But in so doing, we made things simpler for ourselves as we continue to extend the language.
We’ve shown we can assign, print, and add variables. But if we want more elaborate operations, we’ll have to upgrade our handling of expressions and values. That’s next.