Beautiful Racket: Follow the grammar: bf


A grammar consists of a series of production rules, written one per line. On the left of each rule is the name of a structural element of the language. The name is like a variable: it can be anything we want (though as usual, short & meaningful names are wise). A colon goes in the middle of the rule. On the right is the pattern that the element has to match. This notation style is known as Extended Backus–Naur form or EBNF, so this kind of grammar is also known as an EBNF grammar. The zip-code grammar used in this section has production rules for two elements: zip-code and digit.


The parser decomposes the source code into things that can’t be decomposed further, called terminals. In the zip-code grammar, the digit strings "0" through "9" would be terminals. The parser returns its result: a parse tree describing the structure of the program. Corollary: the “leaves” of a parse tree are always terminals in the grammar.

If the parser can’t find any way to decompose the source code into terminals according to the rules of the grammar, the parse fails.


Starting with the top rule, the parser would see that to make a zip-code, it would need to match five digits in a row.

It would match "1" "2" "3" as digits.

But when it reached "ABC", it would fail, because neither "A" nor "AB" nor "ABC" matches a possible pattern for digit.


This time, every character would be matched as a digit.

But when the parser went to match a fifth digit, it would get stuck again, because it would be out of characters.


This time, the parser would look for five digits, and would find them: each character of "01234" would be matched as a digit, with nothing left over.

'(zip-code
  (digit "0")
  (digit "1")
  (digit "2")
  (digit "3")
  (digit "4"))
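
The three parsing attempts above can be mimicked with a small hand-written matcher. This is only a sketch, written in Python rather than Racket, and it assumes the zip-code rule is exactly five digits in a row, as the walkthrough describes. The names parse_digit and parse_zip_code are hypothetical, nested tuples stand in for the quoted parse tree, and the four-digit input "0123" is an invented example of a string that runs out of characters.

```python
DIGITS = set("0123456789")  # the terminals of the digit rule

def parse_digit(source, pos):
    """Match one digit at pos; return (node, next_pos), or None on failure."""
    if pos < len(source) and source[pos] in DIGITS:
        return ("digit", source[pos]), pos + 1
    return None

def parse_zip_code(source):
    """Match exactly five digits with nothing left over, else return None."""
    pos = 0
    children = []
    for _ in range(5):
        result = parse_digit(source, pos)
        if result is None:
            return None  # hit a non-digit, or ran out of characters
        node, pos = result
        children.append(node)
    if pos != len(source):
        return None  # leftover characters, so the parse fails
    return ("zip-code", *children)

print(parse_zip_code("123ABC"))  # None: "A" is not a digit
print(parse_zip_code("0123"))    # None: there is no fifth character
print(parse_zip_code("01234"))   # a zip-code node holding five digit nodes
```

A real parser generator derives this matching logic from the grammar itself. Writing it by hand just makes the steps visible.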

stacker-program : "\n"* instruction ("\n"+ instruction)*
instruction     : integer | func
integer         : ["-"] digit+
digit           : "0" | "1" | "2" | "3" | "4"
                | "5" | "6" | "7" | "8" | "9"
func            : "+" | "*"



As usual, the top-level element of our parse tree, stacker-program, is the name of the first production rule.

The pattern for this rule starts with "\n"*. The * quantifier should be familiar from regular expressions—it means “zero or more of the preceding item”. Taken together, "\n"* means “zero or more newlines”.

The next element in the pattern is instruction. There’s no quantifier on this element, which means every stacker-program needs to have at least one instruction.

Parentheses create subsequences. So the parenthesized expression ("\n"+ instruction) means “match the sequence "\n"+ followed by instruction”.

The + quantifier means “match one or more of the preceding item” (again, analogous to its meaning in regular expressions). So "\n"+ means “one or more newlines”. This guarantees that multiple instructions are separated by one or more line breaks.

The * quantifier on the whole parenthesized subsequence once again means “zero or more of the preceding item”. So a stacker-program may or may not have multiple instruction elements separated by newlines.
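
Because the quantifiers behave like their regular-expression counterparts, the stacker-program rule can be transcribed almost symbol for symbol into a regular expression. The sketch below uses Python's re module; the constant names and the helper is_stacker_program are hypothetical. It only recognizes whether a string fits the grammar and builds no parse tree, so it is an analogy, not a replacement for the parser.

```python
import re

DIGIT = r"[0-9]"                        # digit : "0" | "1" | ... | "9"
INTEGER = rf"-?{DIGIT}+"                # integer : ["-"] digit+
FUNC = r"[+*]"                          # func : "+" | "*"
INSTRUCTION = rf"(?:{INTEGER}|{FUNC})"  # instruction : integer | func

# stacker-program : "\n"* instruction ("\n"+ instruction)*
PROGRAM = re.compile(rf"\n*{INSTRUCTION}(?:\n+{INSTRUCTION})*")

def is_stacker_program(source):
    """Return True if the whole string fits the stacker-program rule."""
    return PROGRAM.fullmatch(source) is not None

print(is_stacker_program("\n\n4\n8\n\n+"))  # True: blank lines are allowed
print(is_stacker_program("4 8"))            # False: a space is not a separator
print(is_stacker_program(""))               # False: one instruction is required
```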

Our instruction rule uses the | operator to indicate that an instruction can be either an integer or a func.

Square brackets mean “zero or one of the enclosed item”, aka an optional match. In our integer rule, the ["-"] in front of digit+ means that an integer may or may not have a "-" prefix. digit+ itself means “match one or more digits”.

As before, the digit and func rules use the | operator to list out the possibilities.
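
To see how the rules cooperate, here is a minimal hand-written parser for the stacker grammar, again sketched in Python. Every helper name is hypothetical, and the nested-tuple output only imitates the shape of the parse trees in this tutorial (rule name first, then the matched elements). It is not the parser that a grammar tool would generate.

```python
DIGITS = "0123456789"

def parse_stacker(source):
    """Return a nested-tuple parse tree for source, or None if the parse fails."""
    pos = 0

    def peek():
        # Current character, or None once the input is used up.
        return source[pos] if pos < len(source) else None

    def newlines():
        # Consume a run of newline characters (possibly empty).
        nonlocal pos
        found = []
        while peek() == "\n":
            found.append("\n")
            pos += 1
        return found

    def integer():
        # integer : ["-"] digit+
        nonlocal pos
        start = pos
        children = []
        if peek() == "-":
            children.append("-")
            pos += 1
        while peek() is not None and peek() in DIGITS:
            children.append(("digit", peek()))
            pos += 1
        if not children or children[-1] == "-":
            pos = start  # no digits matched, so undo and fail
            return None
        return ("integer", *children)

    def instruction():
        # instruction : integer | func
        nonlocal pos
        if peek() is not None and peek() in "+*":
            node = ("func", peek())
            pos += 1
        else:
            node = integer()
        return ("instruction", node) if node else None

    # stacker-program : "\n"* instruction ("\n"+ instruction)*
    children = newlines()
    first = instruction()
    if first is None:
        return None
    children.append(first)
    while True:
        mark = pos
        separators = newlines()
        following = instruction() if separators else None
        if following is None:
            pos = mark  # the group did not match, so back up
            break
        children.extend(separators)
        children.append(following)
    return ("stacker-program", *children) if pos == len(source) else None

print(parse_stacker("4\n-8\n+"))
```

Note how each function mirrors one production rule, and how the optional and repeated parts of a pattern turn into an if and a while loop.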


Each parenthesized node in the parse tree corresponds to a production rule, starting with the name of the rule, and followed by the elements that matched the pattern for that rule.

Rules that rely on other rules lead to deeper nesting. For instance, an integer node will always contain at least one digit node.

Every character that appeared in the original source string also appears in the parse tree.
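
That property can be checked mechanically: collecting the leaves of a parse tree from left to right and joining them should reproduce the source string. The sketch below does this in Python for the nested m-expr parse tree from this tutorial, transcribed as tuples. The helpers leaves and num are hypothetical names introduced only for this illustration.

```python
def leaves(node):
    """Yield the terminal strings of a parse tree, left to right."""
    if isinstance(node, str):
        yield node
    else:
        # node[0] is the rule name; the rest are the matched elements
        for child in node[1:]:
            yield from leaves(child)

def num(d):
    """Shorthand for the subtree (m-expr (integer (digit d)))."""
    return ("m-expr", ("integer", ("digit", d)))

inner = ("m-expr",
         ("m-list", "(", ("func", "+"), " ", num("3"), " ", num("4"), ")"))
middle = ("m-expr",
          ("m-list", "(", ("func", "*"),
           " ", num("2"), " ", inner, " ", num("5"), ")"))
tree = ("m-expr",
        ("m-list", "(", ("func", "+"),
         " ", num("1"), " ", middle, " ", num("6"), ")"))

source = "".join(leaves(tree))
print(source)  # (+ 1 (* 2 (+ 3 4) 5) 6)
```

Every leaf that comes back is a plain string, which matches the earlier corollary that the leaves of a parse tree are always terminals.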

'(m-expr
  (m-list "(" (func "+")
   " " (m-expr (integer (digit "1")))
   " " (m-expr
    (m-list "(" (func "*")
     " " (m-expr (integer (digit "2")))
     " " (m-expr
          (m-list "(" (func "+")
           " " (m-expr (integer (digit "3")))
           " " (m-expr (integer (digit "4"))) ")"))
     " " (m-expr (integer (digit "5")))
     ")"))
   " " (m-expr (integer (digit "6")))
   ")"))
