Beautiful Racket: Day 2

Beautiful Racket / racket school 2019


A grammar describes the structure of every syntactically correct program in a language. It does not describe every semantically correct program.

Question (hard): is it possible to do that?

A grammar is a sequence of production rules consisting of names (on the left) and patterns (on the right). By default, the parse starts with the first rule.

The patterns are made of tokens (aka terminals) or names of production rules (aka nonterminals). These pattern elements can be further modified with quantifiers, groups, or choice patterns.

zip-code : digit{5} ["-" digit{4}]
digit : "0" | "1" | "2" | "3" | "4"
      | "5" | "6" | "7" | "8" | "9"

zip-code : digit{5} ["-" digit{4}]
digit : "0" | "1" | "2" | "3" | "4"
      | "5" | "6" | "7" | "8" | "9"

zip-code : beginning [middle] end
beginning : digit digit
middle : end "-" digit
end : beginning digit
digit : number | zero
number : odd | even
odd : "1" | "3" | "5" | "7" | "9"
even : "2" | "4" | "6" | "8"
zero : "0"

zip-code : beginning [middle] end
beginning : digit digit
middle : end "-" digit
end : beginning digit
digit : number | zero
number : odd | even
odd : "1" | "3" | "5" | "7" | "9"
even : "2" | "4" | "6" | "8"
zero : "0"


A grammar can be used as source code for #lang brag to generate a parser.
#lang brag
zip-code : digit{5} ["-" digit{4}]
digit : "0" | "1" | "2" | "3" | "4"
| "5" | "6" | "7" | "8" | "9"
1 2 3 4
#lang brag zip-code : digit{5} ["-" digit{4}] digit : "0" | "1" | "2" | "3" | "4" | "5" | "6" | "7" | "8" | "9"
(parse-to-datum "01234-5678")
1
(parse-to-datum "01234-5678")

The result of a parse is a parse tree. Each leaf corresponds to the result of a production rule: it contains the name of the rule followed by the terminal elements matched to its pattern.
'(zip-code (digit "0") (digit "1") (digit "2") (digit "3") (digit "4") "-" (digit "5") (digit "6") (digit "7") (digit "8"))
1
'(zip-code (digit "0") (digit "1") (digit "2") (digit "3") (digit "4") "-" (digit "5") (digit "6") (digit "7") (digit "8"))

Question: how do we know the parse tree only contains terminals?

Question: what makes this output format exceptionally convenient for makers of languages?

A grammar in brag can be further annotated with cuts and splices to remove or merge elements.
#lang brag
zip-code : @digit{5} [/"-" digit{4}] ; splice and cut
digit : "0" | "1" | "2" | "3" | "4"
| "5" | "6" | "7" | "8" | "9"
1 2 3 4
#lang brag zip-code : @digit{5} [/"-" digit{4}] ; splice and cut digit : "0" | "1" | "2" | "3" | "4" | "5" | "6" | "7" | "8" | "9"
(parse-to-datum "01234-5678")
1
(parse-to-datum "01234-5678")
'(zip-code "0" "1" "2" "3" "4" (digit "5") (digit "6") (digit "7") (digit "8"))
1
'(zip-code "0" "1" "2" "3" "4" (digit "5") (digit "6") (digit "7") (digit "8"))

A brag module exports two functions. parse applies the grammar to a list of tokens (or a token-producing thunk) and produces a syntax object. parse-to-datum does the same, but returns the parse tree as a raw datum rather than syntax object.

Question: why is this grammar faulty?
#lang brag
foo : "x" bar
bar : foo
1 2 3
#lang brag foo : "x" bar bar : foo

Warning: brag won’t complain about ambiguous grammars.
#lang brag
top : foo | bar
foo : "x"
bar : "x"
1 2 3 4
#lang brag top : foo | bar foo : "x" bar : "x"

Question: is this a bug or a feature?

It’s possible to start a parse with something other than the first rule. This is a technique we can use to adjust how the REPL works for our language.


16-bit binary numbers, like 0100101101101101

US phone numbers, like (213) 555-6890

CSS hex strings representing grays, like "#ccc" and "#999999"

CSS hex strings representing any color, like "#f9c" and "#036a7e"

Binary numbers that are palindromes, like 11 and 10010100100101001

Binary numbers with an equal number of 0 and 1 digits, like 01101010000111

Math expressions involving addition and multiplication (don’t forget about operator precedence), like 1 + 2 * 3 + 4

tacogram/test.rkt

#lang tacogram
##$%#$%#$#$#$$##$%#$%#$#$#$$##$%#$#$#$%#$$##$#$#$%#$%%$#%#$%#$#$%%$##$#$%%#$%%$##$#$%%#$%%$#%%%%#$%%$##$#$#$#$#$%#$$#%%%#$%%%$#%%%%#$%%$##$%#$#$%%%$##$#$%%#$%%$##$#$%#$#$%%$##$%#$#$#$%#$$##$%#$%#$#$#$$##$#$#$%#$%#$$#%%#$%#$%#$$##$#$#$#$#$%#$$#%#$#$#$%%#$$##$#$#$#$#$%#$$##$#$#$%#$%#$$##$%#$%#$%#$$##$#$#$#$#$%#$$##$%#$#$%%#$$##$#$#$#$#$%#$$##$#$#$%#$%#$$#%#$%%#$%#$$##$#$#$#$#$%#$$##$#$#$%%%%$#%#$#$%#$%#$$#%#$#$%#$%#$$#%#$#$%#$%#$$
#lang tacogram
##$%#$%#$#$#$$##$%#$%#$#$#$$##$%#$#$#$%#$$##$#$#$%#$%%$#%#$%#$#$%%$##$#$%%#$%%$##$#$%%#$%%$#%%%%#$%%$##$#$#$#$#$%#$$#%%%#$%%%$#%%%%#$%%$##$%#$#$%%%$##$#$%%#$%%$##$#$%#$#$%%$##$%#$#$#$%#$$##$%#$%#$#$#$$##$#$#$%#$%#$$#%%#$%#$%#$$##$#$#$#$#$%#$$#%#$#$#$%%#$$##$#$#$#$#$%#$$##$#$#$%#$%#$$##$%#$%#$%#$$##$#$#$#$#$%#$$##$%#$#$%%#$$##$#$#$#$#$%#$$##$#$#$%#$%#$$#%#$%%#$%#$$##$#$#$#$#$%#$$##$#$#$%%%%$#%#$#$%#$%#$$#%#$#$%#$%#$$#%#$#$%#$%#$$

tacogram/main.rkt

#lang br/quicklang
(require "grammar.rkt")

(module+ reader
  (provide read-syntax))

(define (tokenize ip)
  (for/list ([tok (in-port read-char ip)])
    tok))

(define (parse src toks)
  (define parse-tree-datum (parse-to-datum toks))
  (for/list ([leaf (in-list (cdr parse-tree-datum))])
    (integer->char
     (for/sum ([val (in-list (cdr leaf))]
               [power (in-naturals)]
               #:when (equal? val '(taco)))
       (expt 2 power)))))

(define (read-syntax src ip)
  (define toks (tokenize ip))
  (define parse-tree (parse src toks))
  (strip-context
   (with-syntax ([PT parse-tree])
     #'(module tacogram-mod tacogram
         PT))))

(define-macro (mb PT)
  #'(#%module-begin
     (display (list->string 'PT))))
(provide (rename-out [mb #%module-begin]))
#lang br/quicklang
(require "grammar.rkt")

(module+ reader
  (provide read-syntax))

(define (tokenize ip)
  (for/list ([tok (in-port read-char ip)])
    tok))

(define (parse src toks)
  (define parse-tree-datum (parse-to-datum toks))
  (for/list ([leaf (in-list (cdr parse-tree-datum))])
    (integer->char
     (for/sum ([val (in-list (cdr leaf))]
               [power (in-naturals)]
               #:when (equal? val '(taco)))
       (expt 2 power)))))

(define (read-syntax src ip)
  (define toks (tokenize ip))
  (define parse-tree (parse src toks))
  (strip-context
   (with-syntax ([PT parse-tree])
     #'(module tacogram-mod tacogram
         PT))))

(define-macro (mb PT)
  #'(#%module-begin
     (display (list->string 'PT))))
(provide (rename-out [mb #%module-begin]))

tacogram/grammar.rkt

#lang brag

;; ···
#lang brag

;; ···

Beautiful Racket / racket school 2019

Day 2

Powerful points

Training mission

Mission

Summary

Beau­tiful Racket / racket school 2019

Day 2

Powerful points

Training mission

Mission

Summary

Beautiful Racket / racket school 2019