Though it’s essentially true that the parser takes a string of source code as input, we glossed over a detail. It’s more precise to say that the parser takes as input a sequence of tokens. A token is the smallest meaningful chunk of a string of source code. A source string is converted to tokens with a function called a tokenizer that sits between the source string and the parser.
Strictly speaking, a tokenizer is optional. If we don’t use a tokenizer, then every character that appears in our source code counts as a token, and thus also has to appear in our grammar.
For that reason, a tokenizer is often convenient: it reduces the number of distinct tokens we have to handle in our grammar. In this sense, the tokenizer and parser aren’t fully independent. Rather, the tokenizer gives us options for allocating the labor between the two.
For instance, some tasks are more easily handled with a tokenizer (each is sketched in the preview after this list):
Strings in the source code that are meaningless—e.g., those that represent comments—can be removed.
Strings that represent a type of value—e.g., those that represent numbers—can be labeled with a generic token type, like NUMBER. This simplifies the grammar, as we can just use NUMBER to mean any number string, rather than having to make grammar rules that cover every possible number pattern, e.g., 23.8, 3/4, 42+3i.
Strings that should be handled literally—e.g., a single character like < representing an operation—can just pass through.
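As a preview of how these three tasks can play out, here’s a minimal sketch (not tutorial code) using helpers from the brag/support library we’ll meet properly below; the comment delimiter and token names are illustrative:

#lang racket
(require brag/support)

;; A sketch, assuming brag/support: one lexer rule per task above.
(define demo-lexer
  (lexer
   ;; task 1: remove comments (here, ";" to end of line) by
   ;; skipping ahead to the next token
   [(from/to ";" "\n") (demo-lexer input-port)]
   ;; task 2: label any digit string with a generic token type
   [(:+ (char-set "0123456789")) (token 'NUMBER lexeme)]
   ;; task 3: pass a literal operator character straight through
   ["<" lexeme]))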
Recall our zip-code grammar:
zip-code : digit digit digit digit digit
digit : "0" | "1" | "2" | "3" | "4" | "5" | "6" | "7" | "8" | "9"
The tokenizer gets a string as input (or a port pointing at a string), so any function that we can use with a string can be used in the tokenizer. Here, we could use a regular expression to match the strings "0" through "9" and convert each of these to the token type DIGIT-TOKEN. We could then rewrite our grammar like this:
zip-code : digit digit digit digit digit
digit : DIGIT-TOKEN
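Such a DIGIT-TOKEN tokenizer might look like the following sketch (the function name is ours; token comes from the brag/support library that appears later in this tutorial):

#lang racket
(require brag/support)

;; A sketch, not tutorial code: use a regular expression to match
;; the next digit in the port, and label it DIGIT-TOKEN.
;; regexp-match on a port returns byte strings, hence the conversion.
(define (next-digit-token port)
  (define m (regexp-match #rx"[0-9]" port))
  (if m
      (token 'DIGIT-TOKEN (bytes->string/utf-8 (car m)))
      eof))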
The parse tree for a zip code like 01234 will look the same:
'(zip-code
  (digit "0")
  (digit "1")
  (digit "2")
  (digit "3")
  (digit "4"))
In that way, the tokenizer can hide details about the source string that the grammar doesn’t need to care about.
What’s the downside of a tokenizer?
Substrings from the source code that are removed (like comments) are then completely invisible to the parser.
Tokens are indivisible. Once we fuse a substring into a token, it can’t be decomposed further by the parser.
For instance, we could put a regular expression inside the tokenizer to match five sequential digits, and then package them into the token type FIVE-DIGIT-TOKEN. Then we could update our grammar to look like this:
zip-code : FIVE-DIGIT-TOKEN
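A tokenizer rule for that token type might look like this sketch (again leaning on brag/support, with := as its exact-repetition operator):

#lang racket
(require brag/support)

;; A sketch: greedily match exactly five digits and fuse them
;; into a single FIVE-DIGIT-TOKEN.
(define five-digit-lexer
  (lexer
   [(:= 5 (char-set "0123456789"))
    (token 'FIVE-DIGIT-TOKEN lexeme)]))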
This isn’t wrong. But our parse tree would now look like this, which might be less detail than we need:
'(zip-code "01234")
Bigger tokens can be convenient, because they reduce the complexity of the grammar. But they also reduce its flexibility. Suppose we wanted to upgrade our zip-code grammar to cover nine-digit zip codes (= five digits, optionally followed by a hyphen and four more digits). We wouldn’t be able to do it with FIVE-DIGIT-TOKEN, because it’s not granular enough. But we could do it with DIGIT-TOKEN:
zip-code : five-digits [("-" four-digits)]
five-digits : digit digit digit digit digit
four-digits : digit digit digit digit
digit : DIGIT-TOKEN
All that said, the division of labor between parser and tokenizer is more art than science. We can use the tokenizer in whatever way makes the most sense for a given language.
Let’s recall our bf grammar:
#lang brag
bf-program : (bf-op | bf-loop)*
bf-op : ">" | "<" | "+" | "-" | "." | ","
bf-loop : "[" (bf-op | bf-loop)* "]"
This captures the structure of any bf program. But it omits one detail about bf source files: any characters aside from the eight used in the grammar, including whitespace, should be ignored.
So our tokenizer for bf will be simple: we’ll pass through the eight meaningful characters intact, and toss out everything else.
We can start assembling the reader for bf. Let’s create a file called "reader.rkt" in the same folder as the existing "parser.rkt". We need to require our "parser.rkt" module (to get access to parse) and add a read-syntax function:
#lang br/quicklang
(require "parser.rkt")

(define (read-syntax path port)
  (define parse-tree (parse path (make-tokenizer port)))
  (define module-datum `(module bf-mod "expander.rkt"
                          ,parse-tree))
  (datum->syntax #f module-datum))
(provide read-syntax)
Here’s the plan:
Just like the read-syntax in stacker—and every read-syntax we’ll ever write—this read-syntax will take as input a source path and input port.
But this time, instead of manually reading strings of code from port, we pass the port to make-tokenizer, which returns a function that reads characters from the port and generates tokens.
In turn, we make these tokens available to parse, which uses our grammar to produce our parse-tree.
As we did in stacker, we create a module-datum representing the code for a module, and put our parse tree inside it.
Finally, we use datum->syntax to package this code as a syntax object.
In stacker, we put the reader and expander in the same source file. This time we’re putting our expander in its own file. Within our module datum, we designate "expander.rkt" as the path to our expander (though we still need to create that file).
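To make the plan concrete, here’s roughly what module-datum would contain for a two-character bf program like +. (a sketch, based on the parse-tree shapes we’ll see shortly):

'(module bf-mod "expander.rkt"
   (bf-program (bf-op "+") (bf-op ".")))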
Next, we add our new make-tokenizer function. We’re passing the input port—which points at the source string—to make-tokenizer. Rather than return one big pile of tokens, make-tokenizer creates and returns a function called next-token that the parser will call repeatedly to retrieve new tokens. This “incremental tokenizing” approach is consistent with the idea of using a port to read from a source file incrementally:
#lang br/quicklang
(require "parser.rkt")

(define (read-syntax path port)
  (define parse-tree (parse path (make-tokenizer port)))
  (define module-datum `(module bf-mod "expander.rkt"
                          ,parse-tree))
  (datum->syntax #f module-datum))
(provide read-syntax)

(define (make-tokenizer port)
  (define (next-token)
    ···)
  next-token)
Finally, the tokenizing rules. This isn’t the moment for us to go deep on tokenizer-rule notation—we’ll save that for a later tutorial. In short, the tokenizer relies on a helper function called a lexer. Each branch of the lexer represents a rule. On the left side of the branch is a pattern that works like a regular expression. On the right side is a token-creating expression. Each time next-token is called, bf-lexer will read as many characters from the port as it can while still matching a rule pattern (aka “greedy” matching). The right side of the rule will convert the matched characters into a token, and this token will be returned as the result.
#lang br/quicklang
(require "parser.rkt")

(define (read-syntax path port)
  (define parse-tree (parse path (make-tokenizer port)))
  (define module-datum `(module bf-mod "expander.rkt"
                          ,parse-tree))
  (datum->syntax #f module-datum))
(provide read-syntax)

(require brag/support)

(define (make-tokenizer port)
  (define (next-token)
    (define bf-lexer
      (lexer
       [(char-set "><-.,+[]") lexeme]
       [any-char (next-token)]))
    (bf-lexer port))
  next-token)
We require brag/support so we can get lexer, which manages the pattern matching.
The first rule uses the lexer helper char-set to match one of our eight special bf characters ><-.,+[]. We pass these through directly with the special lexer variable lexeme (= needlessly obscure word for “that thing we just matched”).
The other rule uses the lexer helper any-char, which matches any other character—we can think of it like an else branch. In bf, these characters should all be ignored. We can accomplish this by calling (next-token) again, which has the effect of skipping to the next available token.
When a port reaches the end of a file, it emits the special eof signal. Thus, a lexer always needs to handle eof. We could write an explicit rule. But we can also just rely on the lexer’s default eof behavior: the lexer will emit an eof token, which in turn will stop the parser. For bf, that suffices.
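If we ever did want to be explicit, the rule might look like this sketch (inside next-token; it just reproduces the default behavior):

(define bf-lexer
  (lexer
   [(char-set "><-.,+[]") lexeme]
   ;; explicit eof rule: emit eof so the parser knows to stop
   ;; (the same thing the lexer does by default)
   [(eof) eof]
   [any-char (next-token)]))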
That’s it. Our tokenizer is done. Therefore, so is our reader. We’ll keep in mind that if we ever want to extend our bf interpreter—for instance, to support new characters with other meanings—we’ll first have to adjust our make-tokenizer function to let them through, and then update our grammar to handle them.
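Before testing the full pipeline, we can poke at the tokenizer by hand with a string port (a hypothetical interaction, not part of the tutorial files):

;; Hypothetical check: only the eight bf characters come through.
(define next-token (make-tokenizer (open-input-string "hi [+] bye")))
(next-token) ; => "["
(next-token) ; => "+"
(next-token) ; => "]"
(next-token) ; => #<eof>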
We already did a quick test of our parser to make sure a snippet of bf source code could be converted into a parse tree. Now that our reader is assembled, let’s make sure we get the same result from a full source file.
We’ll take the code for our demo program and make it into a source file that relies on our new bf reader. As before, we’ll use the #lang reader path bootstrapping command. Save this file as "atsign.rkt" in the same directory as "reader.rkt":
#lang reader "reader.rkt"
Greatest language ever!
++++-+++-++-++[>++++-+++-++-++<-]>.
When we run this file, we’ll get an error like this:
open-input-file: cannot open module file
  module path: #<path:/my/dir/expander.rkt>
  path: /my/dir/expander.rkt
···
That’s a good sign. It means we’re successfully invoking "reader.rkt", which in turn is looking for "expander.rkt", but can’t find it. Because we haven’t made it.
So let’s stub it out. As usual, our expander needs to start with a macro called #%module-begin. As we did in stacker, we’ll avoid a conflict with the #%module-begin already defined by br/quicklang by calling our new macro bf-module-begin, and then using rename-out to change the exported name.
#lang br/quicklang

(define-macro (bf-module-begin PARSE-TREE)
  #'(#%module-begin
     'PARSE-TREE))
(provide (rename-out [bf-module-begin #%module-begin]))
Look carefully—the only work that bf-module-begin is doing is adding a ' prefix to the PARSE-TREE that’s arriving from the reader as our input. We did this in stacker too—it’s a quick way to print and debug the result we’re getting from the reader.
Now when we run "atsign.rkt", we’ll reveal the parse tree:
'(bf-program
  (bf-op "+")
  (bf-op "+")
  (bf-op "+")
  (bf-op "+")
  (bf-op "-")
  (bf-op "+")
  (bf-op "+")
  (bf-op "+")
  (bf-op "-")
  (bf-op "+")
  (bf-op "+")
  (bf-op "-")
  (bf-op "+")
  (bf-op "+")
  (bf-loop
   "["
   (bf-op ">")
   (bf-op "+")
   (bf-op "+")
   (bf-op "+")
   (bf-op "+")
   (bf-op "-")
   (bf-op "+")
   (bf-op "+")
   (bf-op "+")
   (bf-op "-")
   (bf-op "+")
   (bf-op "+")
   (bf-op "-")
   (bf-op "+")
   (bf-op "+")
   (bf-op "<")
   (bf-op "-")
   "]")
  (bf-op ">")
  (bf-op "."))
What are we seeing here? Two things:
We’re seeing the effect of our tokenizer, which discarded the characters in the line Greatest language ever!. They never reached the parser, so they don’t appear in the parse tree.
We’re otherwise seeing the same parse tree that we got before—as we should. Every meaningful character in our source code appears in a node of the parse tree, and these nodes correspond to the names and patterns of the production rules.
While we’re here, let’s see what happens when we run a poorly formed bf program. We know that brackets have to come in pairs. So let’s deliberately add an unmatched left bracket to the end of the program:
#lang reader "reader.rkt"
Greatest language ever!
++++-+++-++-++[>++++-+++-++-++<-]>.[
This time, when we run the program, we won’t see a parse tree—instead we’ll get an error like this:
Encountered parsing error near "[" (token '|[|) while parsing #<path:atsign.rkt> [line=#f, column=#f, offset=#f]
This is the right result. Our parser can’t find a way to match this defective code string to its grammar. Thus, when it reaches the unmatched bracket—"["—it gives up and raises an error.
Before we continue, let’s remove the spurious left bracket from the end of "atsign.rkt", and also the ' prefix from PARSE-TREE, so our three files ("atsign.rkt", "expander.rkt", and "reader.rkt") look like this:
#lang reader "reader.rkt"
Greatest language ever!
++++-+++-++-++[>++++-+++-++-++<-]>.
#lang br/quicklang

(define-macro (bf-module-begin PARSE-TREE)
  #'(#%module-begin
     PARSE-TREE))
(provide (rename-out [bf-module-begin #%module-begin]))
#lang br/quicklang
(require "parser.rkt")

(define (read-syntax path port)
  (define parse-tree (parse path (make-tokenizer port)))
  (define module-datum `(module bf-mod "expander.rkt"
                          ,parse-tree))
  (datum->syntax #f module-datum))
(provide read-syntax)

(require brag/support)

(define (make-tokenizer port)
  (define (next-token)
    (define bf-lexer
      (lexer
       [(char-set "><-.,+[]") lexeme]
       [any-char (next-token)]))
    (bf-lexer port))
  next-token)
Now we can finish the expander.