Take your taco-compiler result and run it under #lang tacopocalypse-prep:
#lang tacopocalypse-prep
(() taco () taco () () ())
(() taco () taco () () ())
(() taco () () () taco ())
(() () () taco () taco taco)
(taco () taco () () taco taco)
(() () taco taco () taco taco)
(() () taco taco () taco taco)
(taco taco taco taco () taco taco)
(() () () () () taco ())
(taco taco taco () taco taco taco)
(taco taco taco taco () taco taco)
(() taco () () taco taco taco)
(() () taco taco () taco taco)
(() () taco () () taco taco)
(() taco () () () taco ())
(() taco () taco () () ())
(() () () taco () taco ())
(taco taco () taco () taco ())
(() () () () () taco ())
(taco () () () taco taco ())
(() () () () () taco ())
(() () () taco () taco ())
(() taco () taco () taco ())
(() () () () () taco ())
(() taco () () taco taco ())
(() () () () () taco ())
(() () () taco () taco ())
(taco () taco taco () taco ())
(() () () () () taco ())
(() () () taco taco taco taco)
(taco () () taco () taco ())
(taco () () taco () taco ())
(taco () () taco () taco ())
You’ll get something like:
##$%#$%#$#$#$$##$%#$%#$#$#$$##$%#$#$#$%#$$##$#$#$%#$%%$#%#$%#$#$%%$##$#$%%#$%%$##$#$%%#$%%$#%%%%#$%%$##$#$#$#$#$%#$$#%%%#$%%%$#%%%%#$%%$##$%#$#$%%%$##$#$%%#$%%$##$#$%#$#$%%$##$%#$#$#$%#$$##$%#$%#$#$#$$##$#$#$%#$%#$$#%%#$%#$%#$$##$#$#$#$#$%#$$#%#$#$#$%%#$$##$#$#$#$#$%#$$##$#$#$%#$%#$$##$%#$%#$%#$$##$#$#$#$#$%#$$##$%#$#$%%#$$##$#$#$#$#$%#$$##$#$#$%#$%#$$#%#$%%#$%#$$##$#$#$#$#$%#$$##$#$#$%%%%$#%#$#$%#$%#$$#%#$#$%#$%#$$#%#$#$%#$%#$$
Hooray!
A lexer is the power tool for tokenizing. It consists of a series of clauses, each with a regex-like pattern on the left and a result expression on the right. The lexer matches the longest pattern, not the first pattern (ties go to the earlier clause), and returns the value of the corresponding expression.
Two special identifiers are available within lexer rules: `lexeme` (the matched string) and `input-port` (the port currently being consumed).
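A minimal sketch of the longest-match rule (the names `demo-lex`, `if-kw`, and `id` are made up for illustration):

```racket
#lang racket
(require brag/support) ; imports `lexer`, `lexeme`, and abbreviations like `alphabetic`

;; On "iffy", (:+ alphabetic) matches all 4 characters, beating the
;; 2-character "if" pattern, so 'id is returned (longest match wins).
;; On "if ", both patterns match 2 characters; the tie goes to the
;; earlier clause, so 'if-kw is returned.
(define demo-lex
  (lexer
   ["if" 'if-kw]
   [(:+ alphabetic) 'id]))

(demo-lex (open-input-string "iffy")) ; → 'id
(demo-lex (open-input-string "if "))  ; → 'if-kw
```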
Question: without running the code, what will be the result of each tokenization?
(require brag/support) ; imports `lexer`

(define (apply-port-proc proc str)
  (define ip (open-input-string str))
  (for/list ([tok (in-port proc ip)])
    tok))

(define lex
  (lexer
   ["fo" lexeme]
   [(:: "f" (:+ "o")) 42]
   [any-char (lex input-port)]))

(apply-port-proc read "foobar")
(apply-port-proc read-char "foobar")
(apply-port-proc lex "foobar")
(apply-port-proc lex "fobar")
Question (hard): why not just tokenize using cond and regular expressions?
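To weigh the question, it helps to see what the alternative looks like. Here is a hand-rolled tokenizer built from `cond` and regexps (the helper `regexp-tokenize` is made up for illustration; note that you must manage the "longest match" and "skip a character" logic yourself):

```racket
#lang racket

;; Collect every run of "f" followed by one or more "o"s,
;; dropping any character that doesn't start a match.
(define (regexp-tokenize str)
  (let loop ([s str] [toks null])
    (cond
      [(zero? (string-length s)) (reverse toks)]
      [(regexp-match #px"^fo+" s)
       => (λ (m) (loop (substring s (string-length (car m)))
                       (cons (car m) toks)))]
      [else (loop (substring s 1) toks)])))

(regexp-tokenize "foobar") ; → '("foo")
```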
Lexers are also used for source-code coloring in DrRacket.
Make a language called tacopocalypse that takes the output you generated above and converts it back into the original Racket source code.
Example:
#lang tacopocalypse
##$%#$%#$#$#$$##$%#$%#$#$#$$##$%#$#$#$%#$$##$#$#$%#$%%$#%#$%#$#$%%$##$#$%%#$%%$##$#$%%#$%%$#%%%%#$%%$##$#$#$#$#$%#$$#%%%#$%%%$#%%%%#$%%$##$%#$#$%%%$##$#$%%#$%%$##$#$%#$#$%%$##$%#$#$#$%#$$##$%#$%#$#$#$$##$#$#$%#$%#$$#%%#$%#$%#$$##$#$#$#$#$%#$$#%#$#$#$%%#$$##$#$#$#$#$%#$$##$#$#$%#$%#$$##$%#$%#$%#$$##$#$#$#$#$%#$$##$%#$#$%%#$$##$#$#$#$#$%#$$##$#$#$%#$%#$$#%#$%%#$%#$$##$#$#$#$#$%#$$##$#$#$%%%%$#%#$#$%#$%#$$#%#$#$%#$%#$$#%#$#$%#$%#$$
Result (as before, the blank lines are deliberate):
"hello world"
(+ 1 (* 2 (- x)))
Hint: Here’s some starter code:
#lang br/quicklang
(require brag/support) ; imports `lexer`

(module+ reader
  (provide read-syntax))

(define lex
  (lexer
   ;; ···
   ))

(define (tokenize ip)
  (for/list ([tok (in-port lex ip)])
    tok))

(define (parse src toks)
  ;; ···
  )

(define (read-syntax src ip)
  (define toks (tokenize ip))
  (define parse-tree (parse src toks))
  (strip-bindings
   (with-syntax ([PT parse-tree])
     #'(module taco-mod tacopocalypse
         PT))))

(define-macro (my-module-begin PT)
  #'(#%module-begin
     (display (list->string 'PT))))
(provide (rename-out [my-module-begin #%module-begin]))
Hint: you need at least three lexing rules. You don’t necessarily have to lex every character. Perhaps some are redundant?
Hint: in `parse`, you might find it useful to deal with sublists of tokens. The functions `take` and `drop` could be helpful.
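For instance, `take` and `drop` combine naturally into a list-splitting helper (the name `chunks` and the chunk size are made up; pick whatever suits your notation):

```racket
#lang racket

;; Split a flat list into consecutive sublists of n elements each,
;; taking n from the front and recurring on the rest.
(define (chunks lst n)
  (if (null? lst)
      null
      (cons (take lst n) (chunks (drop lst n) n))))

(chunks '(a b c d e f) 2) ; → '((a b) (c d) (e f))
```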
Hint: no, I didn’t tell you how the notation works. There’s already a DSL mentioned on this page that you can use to figure it out.
Don’t panic.