read-syntax must accept two arguments: a source path and an input port. It must return a module expression as a syntax object without bindings. This module expression must include a reference to the expander that will provide the initial set of bindings when the module code is evaluated.
Often, a reader uses a tokenizer to convert the source code into tokens. A token is a contiguous sequence of characters that affects the meaning of the program.
Question: how many tokens are in this trigram program?
12 & AB
How about this math expression?
12+97
How about this Racket program?
;; hello world!
(* 62 719)
How about this Python program? (Count carefully.)
if True:
    print "hooray"
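To make the idea of a token concrete, here is a minimal sketch of a tokenizer for a made-up arithmetic notation (deliberately not one of the examples above). The name tokenize-arith is hypothetical, and so is the rule it uses: a token is either a run of digits or a single operator or parenthesis. A real language would choose its own token rules.

```racket
#lang racket/base
;; Hypothetical tokenizer for a tiny arithmetic notation (an illustration,
;; not code from the workshop).
;; Assumption: a token is a run of digits, or one operator or parenthesis.
;; Anything else (here, spaces) separates tokens but is not a token.
(define (tokenize-arith str)
  (regexp-match* #px"[0-9]+|[-+*/()]" str))

(tokenize-arith "7 * (30 + 5)")
;; → '("7" "*" "(" "30" "+" "5" ")")
```

Under this rule the spaces disappear: they separate tokens without being tokens themselves. Not every language makes that choice.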
Often, a reader uses a parser to arrange these tokens into a hierarchical structure, called a parse tree.
Python if conditional:
if y > 0:
    x / y
else:
    print("nope")
As a chart: [parse-tree diagram not reproduced here]
As an S-expression:
'(if (> y 0)
     :
     (/ x y)
     else:
     (print "nope"))
Racket if conditional:
(if (> y 0)
    (/ x y)
    (print "nope"))
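As a rough sketch of what "arranging tokens into a hierarchy" can mean, the hand-rolled function below nests a flat list of string tokens wherever it meets parentheses. The name parse-tokens and the sample token list are hypothetical, and this is not how a parser generator works internally. It is just about the smallest parser that turns a flat sequence into a tree.

```racket
#lang racket/base
;; Hypothetical mini-parser: turns a flat list of string tokens into a
;; nested list, opening a new level at "(" and closing it at ")".
;; Assumption: parentheses are balanced; no error checking is done.
(define (parse-tokens toks)
  ;; parse-seq returns two values: the tree built so far and the unread tokens.
  (define (parse-seq toks acc)
    (cond
      [(null? toks) (values (reverse acc) '())]
      [(equal? (car toks) ")") (values (reverse acc) (cdr toks))]
      [(equal? (car toks) "(")
       (define-values (sub rest) (parse-seq (cdr toks) '()))
       (parse-seq rest (cons sub acc))]
      [else (parse-seq (cdr toks) (cons (car toks) acc))]))
  (define-values (tree rest) (parse-seq toks '()))
  tree)

(parse-tokens '("7" "*" "(" "30" "+" "5" ")"))
;; → '("7" "*" ("30" "+" "5"))
```

The parentheses themselves vanish from the result. Their only job was to say where the nesting goes, and the tree now records that directly.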
The parser consumes a sequence of tokens, and the resulting parse tree becomes the program embedded in the module expression. Pseudocodishly:
(define (read-syntax src ip)
  (define toks (tokenize-all ip))
  (define parse-tree (parse src toks))
  (strip-bindings
   (with-syntax ([PT parse-tree])
     #'(module mod-name expander-mod
         PT))))
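For a version that actually runs, here is a minimal sketch built on three assumptions: every character counts as a token, the "parse tree" is simply the quoted list of those characters, and racket/base stands in for the expander. It also swaps strip-bindings for datum->syntax with #f as the lexical context, which likewise produces a syntax object without bindings.

```racket
#lang racket/base
;; Sketch of a complete read-syntax, under the assumptions stated above.
(define (read-syntax src ip)
  ;; "Tokenize": one token per character.
  (define toks (for/list ([c (in-port read-char ip)]) c))
  ;; "Parse": nothing to arrange here, so the tree is the quoted token list.
  (define parse-tree `(quote ,toks))
  ;; #f as the first argument means the result carries no bindings.
  ;; racket/base is a stand-in for a real expander module.
  (datum->syntax #f `(module mod-name racket/base ,parse-tree)))

(syntax->datum (read-syntax "example" (open-input-string "ab")))
;; → '(module mod-name racket/base '(#\a #\b))
```

The src argument goes unused in this sketch. A fuller reader would typically pass it along so that error messages can point back at the source file.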
If you use a parser generator like brag to make your parse function (and we will, later today), parse can also accept a thunk, that is, a zero-arity function that produces one token per call. The parser calls the thunk repeatedly until it returns eof. Pseudocodishly:
(define (read-syntax src ip)
  (define token-thunk (λ () (tokenize-one ip)))
  (define parse-tree (parse src token-thunk))
  (strip-bindings
   (with-syntax ([PT parse-tree])
     #'(module mod-name expander-mod
         PT))))
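To see what "called repeatedly, until eof" looks like from the consumer's side, here is a small sketch. The helper drain is hypothetical and only imitates the calling pattern. It is not brag's parser. As before, one character stands in for one token.

```racket
#lang racket/base
;; Assumption for illustration: each character is one token.
(define (tokenize-one ip)
  (read-char ip))

;; Hypothetical stand-in for a parser: keeps calling the thunk and
;; collects what it returns, stopping at the first eof.
(define (drain thunk)
  (let loop ([acc '()])
    (define tok (thunk))
    (if (eof-object? tok)
        (reverse acc)
        (loop (cons tok acc)))))

(define ip (open-input-string "abc"))
(define token-thunk (λ () (tokenize-one ip)))

(drain token-thunk)
;; → '(#\a #\b #\c)
```

Each call to the thunk reads only as much of the port as one token needs, so nothing is tokenized before the consumer asks for it.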
Question: In what situations would the second approach potentially be wiser?
module+ is a variant of module that lets us create a submodule that automatically includes the surrounding bindings, and possibly adds new ones.
;; with module
(module reader br
  (provide read-syntax)
  (define (read-syntax src ip)
    ···))

;; with module+
(module+ reader
  (provide read-syntax))
(define (read-syntax src ip)
  ···)
Question: How might this be useful?
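One small illustration of the "surrounding bindings" part, using names that are not from the workshop code: the function double below is defined in the enclosing module, and the main submodule uses it without requiring anything. This is only a sketch of the mechanism, not a full answer to the question.

```racket
#lang racket/base
;; Defined in the enclosing module.
(define (double x) (* 2 x))

;; A module+ submodule sees double automatically.
;; A submodule named main runs when this file is run directly.
(module+ main
  (displayln (double 21)))
```

With a plain module form instead of module+, the submodule body would not see double, because a module submodule starts from only the bindings of its declared language.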
Make a language called taco-compiler that takes any #lang racket program made of ASCII characters as input and prints it—the program, not its result—in notation that only uses the identifier taco and parentheses. If you need math operations, look them up in the Racket docs.
Easier option: implement the sample notation shown below.
Harder option: devise your own.
Example:
#lang taco-compiler

"hello world"
(+ 1 (* 2 (- x)))
One possible result:
(() taco () taco () () ())
(() taco () taco () () ())
(() taco () () () taco ())
(() () () taco () taco taco)
(taco () taco () () taco taco)
(() () taco taco () taco taco)
(() () taco taco () taco taco)
(taco taco taco taco () taco taco)
(() () () () () taco ())
(taco taco taco () taco taco taco)
(taco taco taco taco () taco taco)
(() taco () () taco taco taco)
(() () taco taco () taco taco)
(() () taco () () taco taco)
(() taco () () () taco ())
(() taco () taco () () ())
(() () () taco () taco ())
(taco taco () taco () taco ())
(() () () () () taco ())
(taco () () () taco taco ())
(() () () () () taco ())
(() () () taco () taco ())
(() taco () taco () taco ())
(() () () () () taco ())
(() taco () () taco taco ())
(() () () () () taco ())
(() () () taco () taco ())
(taco () taco taco () taco ())
(() () () () () taco ())
(() () () taco taco taco taco)
(taco () () taco () taco ())
(taco () () taco () taco ())
(taco () () taco () taco ())
Hint: A char is a Racket datatype that represents one Unicode character.
Hint: read-char takes an input port as an argument and returns the next character in the port. So tokenize, in this case, returns a list of characters. "hello" would become (list #\h #\e #\l #\l #\o).
Hint: You can use char->integer and integer->char to convert to and from an integer value.
Hint: The sample notation relies on a certain fact about the numerical representation of ASCII characters.
Hint: Here’s some code to get you started:
#lang br/quicklang
(module+ reader
  (provide read-syntax))

(define (tokenize ip)
  (for/list ([tok (in-port read-char ip)])
    tok))

(define (parse src toks)
  ;; ···
  )

(define (read-syntax src ip)
  (define toks (tokenize ip))
  (define parse-tree (parse src toks))
  (strip-bindings
   (with-syntax ([PT parse-tree])
     #'(module tacofied taco-compiler
         PT))))

(define-macro (mb PT)
  #'(#%module-begin
     (for-each displayln 'PT)))
(provide (rename-out [mb #%module-begin]))
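Before filling in parse, it can help to try the char hints on their own. The sketch below uses plain racket/base rather than br/quicklang, and the names port->chars, codes, and round-trip are made up for the example. It stops short of any taco notation; that part is the exercise.

```racket
#lang racket/base
;; Collect every character from a port, the same way the starter
;; tokenize function does.
(define (port->chars ip)
  (for/list ([c (in-port read-char ip)]) c))

(define chars (port->chars (open-input-string "hi!")))
;; chars → '(#\h #\i #\!)

;; Each char converts to an integer...
(define codes (map char->integer chars))
;; codes → '(104 105 33)

;; ...and back again without loss.
(define round-trip (list->string (map integer->char codes)))
;; round-trip → "hi!"
```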
It’s tacos all the way down.