Hygiene is the organizing policy of Racket’s macro system. Understanding hygiene is the key to understanding how macros work, and by extension, how to write good macros of your own. Conversely, not understanding hygiene can make Racket macros seem confusing and mysterious. But they are not.
Hygiene is the answer to a simple question: a macro generates code that gets deposited elsewhere. When that code is evaluated, how should we determine the bindings of the identifiers inside? (We can't skip the question, because a Racket program can't run unless every identifier has a binding. See identifiers.)
We have two choices:
Determine the bindings according to what the identifiers mean at the place where the macro was defined (aka the definition site). This is Racket’s default policy, and part of what is implied by hygiene.
Determine the bindings according to what the identifiers mean at the place where the macro was invoked (aka the calling site). Sometimes this is the behavior we want, so Racket lets us “break hygiene” when we need to.
To determine the binding of an identifier, we look inside the lexical context attached to the code, which holds all the available bindings. So hygiene can be seen as a way of managing lexical contexts. For example, suppose the calling site defines an x, and a macro defines and prints its own x. There are now two identifiers named x, but thanks to hygiene they live in separate lexical contexts, so they don't collide. (In Racket, begin groups expressions without creating a new lexical context, unlike let.)
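A minimal sketch of that situation. It assumes plain `#lang racket` and uses the standard `define-syntax-rule` in place of `define-macro`; the macro name `mac` is just illustrative:

```racket
#lang racket
;; Sketch: a macro-defined x and a module-level x coexist.
;; define-syntax-rule stands in for define-macro here.

(define x 42)                ; x at the calling site

(define-syntax-rule (mac)
  (begin
    (define x 84)            ; x introduced by the macro: separate lexical context
    (println x)))            ; refers to the macro's own x

(mac)                        ; prints 84
(println x)                  ; prints 42: the outer x is untouched
```

No "duplicate definition" error occurs, because the macro's x and the outer x carry different lexical contexts.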
Without hygiene, every identifier would be treated as if it lived in the lexical context of the calling site. While this has an appealing simplicity, it quickly runs aground on the shoals of practicality.
Let’s convert the example above into an unhygienic macro, so it invades the lexical context of the calling site. It won’t work, because the two identifiers named x now collide:
```racket
(define x 42)
(define-unhygienic-macro (mac)
  #'(begin
      (define x 84)
      (println x)))
(mac) ; error: duplicate definition
```
“No problem—just change the macro to use an identifier with a distinct name.” But how? The macro doesn’t know anything about where it will be called, or what identifiers are already being used there. So there’s no way to guarantee that any particular identifier will be distinct. Hygiene, by contrast, keeps lexical contexts separate, thereby eliminating the possibility of inadvertent name collisions (also known as identifier capture).
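The classic illustration of identifier capture is a macro that needs a temporary variable. In this sketch, the hypothetical `my-or` macro binds a temporary named `t`, and the caller happens to use `t` too. Under an unhygienic expansion, `(my-or #f t)` would become `(let ([t #f]) (if t t t))` and return `#f`. Hygiene keeps the two `t`s apart:

```racket
#lang racket
;; Sketch of a capture hazard avoided by hygiene (my-or is hypothetical).

(define-syntax-rule (my-or a b)
  (let ([t a])               ; macro-introduced t
    (if t t b)))             ; these two t's are the macro's t; b is the caller's expression

(define t 5)                 ; the caller's own t
(my-or #f t)                 ; => 5, because the caller's t still means 5
```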
The problem can run in the opposite direction as well. In this example, our mac and u-mac macros implicitly rely on println having its usual meaning. But it might not:
```racket
(define-macro (mac) #'(println "mac runs"))
(define-unhygienic-macro (u-mac) #'(println "u-mac runs"))
(mac)
(u-mac)
(module+ main
  (define (println thing) (error 'zombie-apocalypse-started))
  (mac)
  (u-mac))
```
```
"mac runs"
"u-mac runs"
"mac runs"
error: zombie-apocalypse-started
```
Within the main submodule, println has been redefined. At that point, the hygienic macro still behaves consistently: thanks to hygiene, it retains the original binding of println. The unhygienic macro, by contrast, picks up the redefined println at the calling site and raises the error.
More broadly, hygiene can be seen as a way of making macros more composable, which is a general design goal of functional languages. Macros can’t know anything about the bindings used at the calling site. Therefore, the best way to ensure consistent behavior is to make macro-generated code as self-contained as possible.
For the macro writer, there are four golden rules of hygiene:
Code produced by a macro adopts the lexical context of the macro-definition site. Therefore, this code can only rely on identifiers that have bindings at the definition site. Below, the mac macro can create code that refers to x, because x has a binding at the macro-definition site (albeit outside the macro):
```racket
(define x 42)
(define-macro (mac)
  #'(println x))
(mac) ; 42
```
Within code produced by the macro, new bindings can shadow existing bindings at the definition site (similar to how let works). For example, this updated mac macro defines its own x, which overrides the x defined outside the macro, so the result is now 84:
```racket
(define x 42)
(define-macro (mac)
  #'(begin
      (define x 84)
      (println x)))
(mac) ; 84
```
Bindings introduced by a macro are only visible to other code produced by that macro. Suppose one x is defined outside the mac macro and another is defined inside it. The (println x) generated by the macro refers to the x defined by the macro, while a (println x) outside the macro refers to the x defined outside the macro. So we end up with two identifiers called x with different values.
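Here is a sketch of this rule, again assuming plain `#lang racket` with `define-syntax-rule` standing in for `define-macro`. To observe the macro's x from outside, the macro also defines a function whose name (the illustrative `inner-x`) is supplied by the caller; the body of that function is macro-generated code, so its x is the macro's x:

```racket
#lang racket
;; Sketch: a macro-introduced x is visible only to code the same macro produces.

(define x 42)                ; x defined outside the macro

(define-syntax-rule (mac getter)
  (begin
    (define x 84)            ; x defined inside the macro
    (define (getter) x)))    ; getter's name comes from the caller; this x is the macro's x

(mac inner-x)
(println (inner-x))          ; prints 84
(println x)                  ; prints 42
```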
The corollary to this rule is that bindings introduced by a macro are not visible outside the macro. Every early-stage Racketeer tries to write a variation of the define-x macro below, and is flummoxed by the error:
```racket
(define-macro (define-x)
  #'(define x 42))
(define-x)
(println x) ; error: unbound identifier
```
But it's not surprising: the x defined inside the macro lives inside a separate lexical context, so the (println x) outside the macro can't see it. If this still annoys you, consider an analogous example with let, which won't surprise anyone: an x defined inside (let () (define x 42)) is likewise invisible to a (println x) placed after the let.
Every identifier retains its binding from its original lexical context. (This is accomplished by using syntax objects throughout the macro system.) For instance, suppose we pass the outer x to our macro as an argument and assign it to the pattern variable OUTER-X. When OUTER-X appears in our macro code, it still refers to the x defined outside the macro, while the other x refers to the one defined inside the macro. So if the macro defines its own x as 84 and then prints x followed by OUTER-X, we get:
```
84
42
```
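A sketch of a macro that produces that output, assuming plain `#lang racket`. Because `define-syntax-rule` has no capitalization convention for pattern variables, the pattern variable is spelled `outer-x` here rather than OUTER-X:

```racket
#lang racket
;; Sketch: an identifier passed in from the calling site keeps its original binding.

(define x 42)

(define-syntax-rule (mac outer-x)
  (begin
    (define x 84)            ; the macro's own x
    (println x)              ; the macro's x: 84
    (println outer-x)))      ; the caller's x, passed in as an argument: 42

(mac x)                      ; prints 84, then 42
```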
Sometimes a macro needs to introduce identifiers into the lexical context of the calling site. These identifiers are called unhygienic. Despite the shameful-sounding name, there’s nothing wrong with unhygienic identifiers. On the contrary, sometimes they’re the right tool for the job. But it’s wise to reserve them until needed.
Macros rely on hygiene by default. Therefore, breaking hygiene requires us to explicitly inject our new identifier into the target lexical context.
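As a sketch of what that explicit injection looks like, here is the define-x macro from earlier rewritten with the standard `define-syntax`, `with-syntax`, and `datum->syntax` (used in place of `define-macro`, so it runs under plain `#lang racket`). Giving the new x the lexical context of the whole macro call, `stx`, places the definition in the calling site's context:

```racket
#lang racket
;; Sketch: define-x with hygiene deliberately broken.

(define-syntax (define-x stx)
  ;; datum->syntax borrows stx's lexical context (the calling site) for the new x
  (with-syntax ([x (datum->syntax stx 'x)])
    #'(define x 42)))

(define-x)
(println x)                  ; prints 42 instead of raising an unbound-identifier error
```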
A common use for unhygienic identifiers is in macros that extend define by modifying the given identifier name. (For instance, struct unhygienically creates getter identifiers, and setter identifiers for mutable fields. See data structures.) As an example, let's make define-$, which works like define for functions except that it appends $ to the name:
```racket
(define-macro (define-$ (ID ARG ...) BODY ...)
  (define id$-datum (format-datum '~a$ (syntax->datum #'ID)))
  (with-pattern ([ID$ (datum->syntax #'ID id$-datum)])
    #'(define (ID$ ARG ...) BODY ...)))

(define-$ (f x) (* x x))
(f$ 5) ; 25
```
The macro starts with a syntax pattern that matches our ID, ARGs, and BODY expressions. We extract the datum from #'ID with syntax->datum. We use format-datum to make a new datum with a $ appended, called id$-datum.
The unhygienic identifier is created next, with datum->syntax. The first argument of datum->syntax is the lexical context for the identifier being created. In this case, we use #'ID because it came from the calling site, and therefore has the lexical context we want to borrow. The second argument is our datum. We match this new identifier to the pattern variable ID$, so we can use it in the syntax template below. Within the template, we use ID$ in the name position of a standard define form.
As a result, we can call f$ as if we had defined it directly. (The identifier f remains unbound.)
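For comparison, here is a sketch of the same steps using only standard `#lang racket` forms (`syntax-case`, `syntax->datum`, `datum->syntax`, `with-syntax`) in place of the `define-macro`, `format-datum`, and `with-pattern` helpers. It is an equivalent under that assumption, not the original implementation:

```racket
#lang racket
;; Sketch: define-$ written with standard syntax-case machinery.

(define-syntax (define-$ stx)
  (syntax-case stx ()
    [(_ (id arg ...) body ...)
     (let* ([id$-datum (string->symbol (format "~a$" (syntax->datum #'id)))] ; f -> f$
            [new-id (datum->syntax #'id id$-datum)])                         ; borrow #'id's context
       (with-syntax ([id$ new-id])
         #'(define (id$ arg ...) body ...)))]))

(define-$ (f x) (* x x))
(f$ 5)                       ; => 25
```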
There's no single right way to create an unhygienic identifier, though the underlying mechanics are always the same. For instance, if the syntax->datum / datum->syntax fandango gets tiresome, you can use the helper functions prefix-id and suffix-id to simplify the creation of unhygienic identifiers. Each creates a new identifier derived from the name and lexical context of an existing identifier:
```racket
(define-macro (define-$ (ID ARG ...) BODY ...)
  (with-pattern ([ID$ (suffix-id #'ID '$)])
    #'(define (ID$ ARG ...) BODY ...)))

(define-$ (f x) (* x x))
(f$ 5) ; 25
```
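In plain `#lang racket`, a comparable shortcut is `format-id` from `racket/syntax`, which formats a new name and gives it the lexical context of its first argument. A sketch, assuming `format-id` as the stand-in for `suffix-id`:

```racket
#lang racket
;; Sketch: define-$ using format-id, a standard helper similar in role to suffix-id.
(require (for-syntax racket/syntax))  ; makes format-id available at compile time

(define-syntax (define-$ stx)
  (syntax-case stx ()
    [(_ (id arg ...) body ...)
     ;; format-id builds the name id$ and gives it #'id's lexical context
     (with-syntax ([id$ (format-id #'id "~a$" #'id)])
       #'(define (id$ arg ...) body ...))]))

(define-$ (f x) (* x x))
(f$ 5)                       ; => 25
```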
Lexical scope in the Racket Guide
Macro-introduced bindings in the Racket Reference