Beautiful Racket: Into the rapids: more basic

Beautiful Racket / tutorials


In BASIC, all variables are global. There’s no way to create a “locally scoped” variable (e.g., one that is valid only inside a certain line or loop). So whenever any line in the program refers to variable x, it always means the same x.

A variable can be declared explicitly with let, or by simply assigning it a value with =. This program (using basic-demo-2 for a moment, because our own implementation isn’t working yet) prints 66:

#lang basic-demo-2
10 let x = 42
20 y = 24
30 print x + y

Variables don’t have to be declared in advance. If a variable is used before its value has been set, it defaults to a value of 0. So this program is valid, and prints 000:

#lang basic-demo-2
10 print x ; y ; z


We could think of BASIC variables as a run-time concept. Meaning, we could start with an empty data structure—say, a hash table—and treat variable references as keys into the hash table, reading or writing as appropriate.
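For comparison, here is a minimal sketch of that run-time approach in plain Racket. It is not the tutorial’s implementation. It assumes a single mutable hash table keyed by variable names (as symbols), where reading a never-assigned variable yields 0. The helper names var-ref and var-set! are hypothetical.

```racket
#lang racket
;; Run-time model (hypothetical sketch): every BASIC variable is a key
;; in one global, mutable hash table.
(define vars (make-hasheq))

;; Reading a variable that was never assigned returns BASIC's default, 0.
(define (var-ref id) (hash-ref vars id 0))

;; Assignment (with or without `let`) just writes the key.
(define (var-set! id val) (hash-set! vars id val))

;; 10 let x = 42
(var-set! 'x 42)
;; 20 y = 24
(var-set! 'y 24)
;; 30 print x + y
(displayln (+ (var-ref 'x) (var-ref 'y))) ; prints 66

;; An unset variable, like z, reads as 0
(displayln (var-ref 'z)) ; prints 0
```

Notice that x and y are only symbols handed to a data structure here. Racket itself knows nothing about them as variables, which is where the extra housekeeping comes from.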

The good news is that this approach would be easier to implement, and would do most of what we need.

The bad news is that this approach is an evolutionary dead end. Once we make variables into hash keys, how will we use them on the REPL? (Insert annoying housekeeping here.) What if we want to provide these values outside the file? (Insert more housekeeping.)

Our shortcut starts to look like a false economy. Modeling our BASIC variables as hash keys might save us some setup effort. But before long, it’ll cost us.

Alternatively, we could think of our BASIC variables as a compile-time concept. Meaning, we could model them as actual Racket variables.

The good news is that this approach would make a lot of other things in our implementation easier. For instance, when we write macros, we could use BASIC variables in the same locations as regular Racket variables.

Well, almost. The first time we use a Racket variable, we introduce it into our program with define. In BASIC, because the program lines can be executed in any order, we wouldn’t be able to say at compile time which reference to a certain variable came “first”. No problem—we could just implement every assignment to a BASIC variable as a mutation (using set!).

Which brings us to the bad news: to make this work, we’d need to somehow define all the variables the program uses in advance. How?
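To see what the compile-time approach asks of us, here is a hand-written sketch in plain #lang racket of roughly what the three-line program from earlier might turn into. It is not output from our expander. Each variable is defined once up front with the default value 0, each numbered line becomes a function that assigns with set!, and the functions are stored in a table keyed by line number. The run-in-order helper is a hypothetical stand-in for a real run function and ignores goto.

```racket
#lang racket
;; Every variable the program uses, defined in advance with BASIC's
;; default value. Generating these definitions automatically is the hard part.
(define x 0)
(define y 0)

;; Each numbered line becomes a function; assignments are mutations.
(define (line-10) (set! x 42))
(define (line-20) (set! y 24))
(define (line-30) (displayln (+ x y)))

;; Map line numbers to line functions.
(define line-table (hasheqv 10 line-10 20 line-20 30 line-30))

;; Hypothetical stand-in for `run`: call the lines in ascending order.
(define (run-in-order table)
  (for ([num (in-list (sort (hash-keys table) <))])
    ((hash-ref table num))))

(run-in-order line-table) ; prints 66
```

If the two define forms were removed, the module would fail to compile, because set! can only mutate a variable that is already bound. That is exactly why every variable has to be found and defined before the program runs.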

basic/expander.rkt

#lang br/quicklang
(require "struct.rkt" "run.rkt" "elements.rkt")
(provide (rename-out [b-module-begin #%module-begin])
         (all-from-out "elements.rkt"))

(define-macro (b-module-begin (b-program LINE ...))
  (with-pattern
      ([((b-line NUM STMT ...) ...) #'(LINE ...)]
       [(LINE-FUNC ...) (prefix-id "line-" #'(NUM ...))])
    #'(#%module-begin
       LINE ...
       (define line-table
         (apply hasheqv (append (list NUM LINE-FUNC) ...)))
       (void (run line-table)))))



A syntax property is a key–value field attached to a syntax object. The key is usually a symbol; the value can be anything. Syntax properties are read and written using syntax-property. When writing a property, syntax-property takes three arguments (a syntax object, a key, and a value) and returns a new syntax object with the updated field; the original is left unchanged. When reading, it takes a syntax object and a key and returns the matching value. If the syntax object doesn’t have a syntax property with that key, the result is #f.
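A quick demonstration in plain #lang racket. The identifier x is a throwaway, and the key 'b-id is chosen only to match the property we’ll search for later.

```racket
#lang racket
;; A bare identifier with no properties attached.
(define plain-stx #'x)

;; Writing: returns a new syntax object carrying the property.
;; plain-stx itself is not modified.
(define tagged-stx (syntax-property plain-stx 'b-id #t))

;; Reading: key present vs. key absent.
(syntax-property tagged-stx 'b-id) ; → #t
(syntax-property plain-stx 'b-id)  ; → #f
(syntax-property tagged-stx 'rule) ; → #f
```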

Clicking the nested syntax object 10 in the syntax browser updates the right side of the window with information specific to it. As we can see, under Known properties (a list of the syntax properties attached to the object) are two properties, 'rule and 'b-line-num. We can use the 'rule property to discover the name of the original production rule that enclosed our 10, namely 'b-line-num. In turn, the 'b-line-num property holds a syntax object representing the original b-line-num rule name, which the splice omitted from the parse tree.


Update our lexer to recognize identifiers, as well as let, =, and input.

Update our parser with rules that recognize let and input statements, and our identifiers. We’ll use splices to tag our identifiers with a special syntax property so we can find them later.

Update our #%module-begin macro to retrieve the identifiers and define them before the program runs.

Implement the let and input statements with macros in the expander, and patch our sum statement to handle a special case.

basic/lexer.rkt

#lang br
(require brag/support)

(define-lex-abbrev digits (:+ (char-set "0123456789")))

;; Keywords and punctuation. Each match becomes a token whose
;; type and value are both the matched string.
(define-lex-abbrev reserved-terms (:or "print" "goto" "end" "+"
                                       ":" ";" "let" "=" "input"))

(define basic-lexer
  (lexer-srcloc
   ["\n" (token 'NEWLINE lexeme)]
   [whitespace (token lexeme #:skip? #t)]
   [(from/stop-before "rem" "\n") (token 'REM lexeme)]
   ;; Listed before the ID rule, so an exact keyword like "let" wins the tie
   [reserved-terms (token lexeme lexeme)]
   ;; Identifiers: a letter, then letters, numerals, or $; value is a symbol
   [(:seq alphabetic (:* (:or alphabetic numeric "$")))
    (token 'ID (string->symbol lexeme))]
   [digits (token 'INTEGER (string->number lexeme))]
   [(:or (:seq (:? digits) "." digits)
         (:seq digits "."))
    (token 'DECIMAL (string->number lexeme))]
   [(:or (from/to "\"" "\"") (from/to "'" "'"))
    (token 'STRING
           (substring lexeme
                      1 (sub1 (string-length lexeme))))]))

(provide basic-lexer)


We introduce a new lexer abbreviation called reserved-terms, because we’re going to be adding a lot more reserved terms. We move our existing terms there, plus let, =, and input. Then we add a corresponding reserved-terms rule to basic-lexer, which emits each match as a token whose type and value are both the matched string.

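We also introduce a rule that matches identifiers (a letter followed by any combination of letters, numerals, and $) and puts each one into a token of type 'ID, with the identifier converted to a symbol. Because the reserved-terms rule comes first, a word that both rules match at the same length, such as let, is lexed as a reserved term rather than an identifier.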
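To watch the reserved-terms and identifier rules interact without installing brag, here is a hedged stand-in built on parser-tools/lex, the lexer library that ships with Racket. It is not our basic-lexer: brag’s lexer-srcloc and token are left out, so tokens are plain (type . value) pairs, and the newline, rem, decimal, and string rules are omitted for brevity. The names mini-lexer and lex-all are hypothetical.

```racket
#lang racket
(require parser-tools/lex
         (prefix-in : parser-tools/lex-sre))

(define-lex-abbrev digits (:+ (char-set "0123456789")))

(define-lex-abbrev reserved-terms
  (:or "print" "goto" "end" "+" ":" ";" "let" "=" "input"))

;; Returns one token per call as a (type . value) pair, or 'EOF.
(define mini-lexer
  (lexer
   [whitespace (mini-lexer input-port)]   ; skip whitespace, lex the next token
   [reserved-terms (cons lexeme lexeme)]  ; comes before the ID rule
   [(:seq alphabetic (:* (:or alphabetic numeric "$")))
    (cons 'ID (string->symbol lexeme))]
   [digits (cons 'INTEGER (string->number lexeme))]
   [(eof) 'EOF]))

;; Lex a whole string into a list of tokens.
(define (lex-all str)
  (define in (open-input-string str))
  (let loop ([acc '()])
    (define tok (mini-lexer in))
    (if (eq? tok 'EOF)
        (reverse acc)
        (loop (cons tok acc)))))

(lex-all "10 let x = 42")
(lex-all "input name$")
```

The lexer always prefers the longest match and breaks ties by rule order, so let and input come out as reserved terms, while x and name$ come out as 'ID tokens holding symbols.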

basic/parser.rkt

#lang brag
b-program : [b-line] (/NEWLINE [b-line])*
b-line : b-line-num [b-statement] (/":" [b-statement])* [b-rem]
@b-line-num : INTEGER
@b-statement : b-end | b-print | b-goto | b-let | b-input
b-rem : REM
b-end : /"end"
b-print : /"print" [b-printable] (/";" [b-printable])*
@b-printable : STRING | b-expr
b-goto : /"goto" b-expr
b-let : [/"let"] b-id /"=" (b-expr | STRING)
b-input : /"input" b-id
@b-id : ID
b-expr : b-sum
b-sum : b-number (/"+" b-number)*
@b-number : INTEGER | DECIMAL | b-id

basic/expander.rkt

#lang br/quicklang
(require "struct.rkt" "run.rkt" "elements.rkt")
(provide (rename-out [b-module-begin #%module-begin])
         (all-from-out "elements.rkt"))

(define-macro (b-module-begin (b-program LINE ...))
  (with-pattern
      ([((b-line NUM STMT ...) ...) #'(LINE ...)]
       [(LINE-FUNC ...) (prefix-id "line-" #'(NUM ...))]
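       ;; Every distinct identifier that carries a 'b-id syntax property
       [(VAR-ID ...) (find-unique-var-ids #'(LINE ...))])
    #'(#%module-begin
       ;; Define each BASIC variable up front with the default value 0,
       ;; so that assignments can be implemented with set!
       (define VAR-ID 0) ...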
       LINE ...
       (define line-table
         (apply hasheqv (append (list NUM LINE-FUNC) ...)))
       (void (run line-table)))))

(begin-for-syntax
  (require racket/list)
  (define (find-unique-var-ids line-stxs)
    (remove-duplicates
     (for/list ([stx (in-list (stx-flatten line-stxs))]
                #:when (syntax-property stx 'b-id))
       stx)
     #:key syntax->datum)))


The find-unique-var-ids function itself works as follows. We pass it the argument #'(LINE ...), a syntax object containing all the b-line nodes from the parse tree. We use stx-flatten to turn it into a flat list of syntax objects. (In so doing, we demolish the structure of the parse tree, but that doesn’t affect our search; it just makes the data a little easier to handle.) We then iterate over these small syntax objects to find the ones that carry a 'b-id syntax property. These are the droids we’re looking for. We collect them into a list.

Because an identifier can show up any number of times in a program, we use remove-duplicates to winnow our list to distinct identifiers. Strictly speaking, no syntax object containing an identifier will be a “duplicate” of another, because, among other things, each has its own source location. But we want to disregard these differences. To do this, we pass remove-duplicates a #:key argument: a function that’s applied to each element before the elements are compared. We use syntax->datum for this argument so that our syntax objects are compared only on the basis of their underlying datums.
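The same search can be tried at run time in plain #lang racket. In this sketch the parse tree for the three-line program from earlier is built by hand, and each identifier is tagged with a 'b-id property (the value #t is an assumption; the search only checks that the property is present). The helpers tagged-id and stx-leaves are hypothetical, with stx-leaves standing in for the stx-flatten used above.

```racket
#lang racket
;; Hypothetical helper: an identifier tagged the way a spliced b-id is tagged.
(define (tagged-id sym)
  (syntax-property (datum->syntax #f sym) 'b-id #t))

;; Hand-built stand-in for the parse tree of:
;;   10 let x = 42 / 20 y = 24 / 30 print x + y
;; datum->syntax leaves the already-tagged syntax objects as they are.
(define lines-stx
  (datum->syntax
   #f
   (list (list 'b-line 10 (list 'b-let (tagged-id 'x) (list 'b-expr (list 'b-sum 42))))
         (list 'b-line 20 (list 'b-let (tagged-id 'y) (list 'b-expr (list 'b-sum 24))))
         (list 'b-line 30 (list 'b-print
                                (list 'b-expr
                                      (list 'b-sum (tagged-id 'x) (tagged-id 'y))))))))

;; Hypothetical stand-in for stx-flatten: collect the non-list syntax leaves.
(define (stx-leaves stx)
  (define kids (syntax->list stx))
  (if kids
      (append-map stx-leaves kids)
      (list stx)))

(define (find-unique-var-ids line-stxs)
  (remove-duplicates
   (for/list ([stx (in-list (stx-leaves line-stxs))]
              #:when (syntax-property stx 'b-id))
     stx)
   #:key syntax->datum))

(define var-ids (find-unique-var-ids lines-stx))
(map syntax->datum var-ids) ; → '(x y)

;; The definitions that (define VAR-ID 0) ... would stand for:
(for/list ([id (in-list var-ids)])
  `(define ,(syntax->datum id) 0))
```

Without #:key syntax->datum, the result would contain four syntax objects rather than two, because the two x syntax objects (and the two y syntax objects) are distinct values.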

basic/misc.rkt

#lang br
(require "struct.rkt")
(provide b-rem b-print b-let b-input)

(define (b-rem val) (void))

(define (b-print . vals)
  (displayln (string-append* (map ~a vals))))

(define-macro (b-let ID VAL) #'(set! ID VAL))

(define-macro (b-input ID)
  #'(b-let ID (let* ([str (read-line)]
                     [num (string->number (string-trim str))])
                (or num str))))
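b-input reads one line and keeps it as a number if it parses as one, otherwise as a string; b-let then stores the result with set!. Here is a hedged, plain-Racket approximation of that behavior, with input supplied from string ports instead of the keyboard. The names read-basic-value, my-let, and my-input are hypothetical.

```racket
#lang racket
;; Hypothetical phase-0 version of the value-reading logic inside b-input:
;; a number if the trimmed line parses as one, otherwise the raw string.
(define (read-basic-value [in (current-input-port)])
  (define str (read-line in))
  (define num (string->number (string-trim str)))
  (or num str))

;; Hypothetical stand-ins for the b-let and b-input macros.
(define-syntax-rule (my-let id val) (set! id val))
(define-syntax-rule (my-input id) (my-let id (read-basic-value)))

;; As in the generated module, each variable must already be defined.
(define name 0)
(define age 0)

;; Feed the "keyboard" from string ports.
(parameterize ([current-input-port (open-input-string "Ada\n")])
  (my-input name))
(parameterize ([current-input-port (open-input-string "  36 \n")])
  (my-input age))

name ; → "Ada"
age  ; → 36
```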

basic/expr.rkt
