A syntax pattern is a tool for matching elements within a syntax object. Syntax patterns are used extensively in macros, especially for separating the input into named pieces so they can be rearranged. In that way, a syntax pattern does for a syntax object what a regular expression does for a string.
In #lang br, define-macro and define-macro-cases rely on syntax patterns to define calling patterns. In the macro m, the only matching input is the literal identifier foo:
(define-macro (m foo) #'"match")
(m foo) ; "match"
(m bar) ; error: no matching case for pattern
Whereas m2, defined with define-macro-cases, matches patterns of zero arguments, a literal foo identifier, or a literal foo followed by anything:
(define-macro-cases m2
  [(m2) "first"]
  [(m2 foo) "second"]
  [(m2 foo ARG) #'ARG]
  [else "no match"])

(m2) ; "first"
(m2 foo) ; "second"
(m2 foo "bar") ; "bar"
(m2 bar) ; "no match"
Syntax patterns cooperate closely with syntax templates. A syntax template is an expression that creates a syntax object. In a syntax template, internal references to pattern variables created by earlier syntax patterns are automatically replaced with their underlying matched value. Within a macro definition or with-pattern expression, any datum wrapped in a syntax expression (or, equivalently, prefixed with #') is treated as a syntax template.
Syntax patterns are often used throughout a macro to destructure syntax objects and syntax templates. For instance, this m3 macro contains three syntax patterns and three syntax templates:
(define-macro (m3 MID ... LAST)
  (with-pattern ([(ONE TWO THREE) (syntax LAST)]
                 [(ARG ...) #'(MID ...)])
    #'(list ARG ... THREE TWO ONE)))

(m3 25 42 ("foo" "bar" "zam")) ; '(25 42 "zam" "bar" "foo")
(m3 MID ... LAST) is a syntax pattern that defines the possible input arguments to the macro, and matches them to pattern variables.
(syntax LAST) is a syntax template containing only the element matched by LAST. We could also write this as #'LAST. These elements are matched to another syntax pattern, (ONE TWO THREE).
#'(MID ...) is a syntax template containing the elements matched by MID .... We could also write this as (syntax (MID ...)). These elements are matched to the syntax pattern (ARG ...).
#'(list ARG ... THREE TWO ONE) is a syntax template containing the matched elements inside a list.
Of course, for a syntax pattern to produce a match, the input syntax has to conform to the pattern. For instance, the syntax pattern (ONE TWO THREE) needs LAST to be a list of three elements. If it isn’t, an error arises:
(m3 25 42 ("foo" "bar"))
with-pattern: unable to match pattern (ONE TWO THREE) in: ("foo" "bar")
A syntax pattern can have five possible ingredients:
A literal, which only matches itself. Numbers, strings, and symbols are always literals. In define-macro, define-macro-cases, and with-pattern, identifiers that are not in UPPERCASE are treated as literals. (In standard Racket, you need to list out literals separately; see, e.g., syntax-case. This is a chore that the br syntax functions handle automatically.)
(define-macro-cases num
  [(num 42) "match"]
  [else "nope"])
(num 42) ; "match"
(num 24) ; "nope"
(num "foo") ; "nope"

(define-macro-cases str
  [(str "foo") "match"]
  [else "nope"])
(str "foo") ; "match"
(str foo) ; "nope"
(str "bar") ; "nope"

(define-macro-cases sym
  [(sym 'foo) "match"]
  [else "nope"])
(sym 'foo) ; "match"
(sym 'bar) ; "nope"
(sym "foo") ; "nope"

(define-macro-cases id
  [(id foo) "match"]
  [else "nope"])
(id foo) ; "match"
(id bar) ; "nope"
(id "foo") ; "nope"
When matching literal identifiers, a trap awaits the unwary. A literal identifier in a syntax pattern is matched on the basis of its name but also its binding. (Racket jocks might know this as equality in the sense of free-identifier=?.) In the submodule below, mac looks like it will match the literal identifier zeta. But outside the submodule, when we import zeta from math and then pass it as input to mac, we don’t get a match:
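A minimal sketch of the situation just described, assuming the beautiful-racket package is installed and that zeta comes from the math library. The submodule name mod is a placeholder chosen for illustration, and the results use the explicit #' form:

```racket
#lang br
;; Placeholder submodule `mod`: zeta has no binding in here,
;; so the literal zeta in the pattern is an unbound identifier.
(module mod br
  (provide mac)
  (define-macro-cases mac
    [(mac zeta) #'"match"]
    [else #'"nope"]))

;; Out here, zeta is bound by the import from math.
(require (submod "." mod) math)

(mac zeta) ; "nope"
```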
Why not? Because the identifier zeta has no binding at the macro-definition site. (This, in turn, is because of macro hygiene: mac lives in a separate lexical context, and can’t see the zeta binding at the calling site.) Even though the names of the literal identifiers are the same, their bindings are not. Therefore, the pattern doesn’t match, and the result is "nope".
We can make the two zeta identifiers match if we also make the same binding available at the macro-definition site, by importing math:
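A sketch of that fix, assuming the beautiful-racket package is installed and that zeta comes from the math library (the submodule name mod is a placeholder). The only change from the non-matching setup is that math is also imported inside the submodule, so both zeta identifiers refer to the same binding:

```racket
#lang br
;; Placeholder submodule `mod`: importing math here gives the literal
;; zeta in the pattern the same binding it has at the calling site.
(module mod br
  (require math)
  (provide mac)
  (define-macro-cases mac
    [(mac zeta) #'"match"]
    [else #'"nope"]))

(require (submod "." mod) math)

(mac zeta) ; "match"
```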
A pattern variable (or wildcard), which can match anything (including a list of things) and assigns the matched item a name. Once a pattern variable is defined, all appearances of the pattern variable within a syntax template are replaced with the matched value.
(define-macro (self ARG) #'ARG)
(self "foo") ; "foo"
(self (list 1 2 3)) ; '(1 2 3)

(define-macro (add-three ARG) #'(+ ARG ARG ARG))
(add-three 42) ; 126
The special wildcard _ also matches anything, but it can be used any number of times in a syntax pattern, and it cannot appear in a syntax template. It’s useful for signaling that an element of the syntax datum is being ignored.
(define-macro (odds FIRST _ THIRD _ FIFTH)
  #'(list FIRST THIRD FIFTH))
(odds 1 2 3 4 5) ; '(1 3 5)
A sublist pattern, which will only match elements arranged with the same parenthesization. If you know a certain element will be a list, a sublist pattern can be used to immediately match elements inside that list. Sublist patterns can be nested to any depth.
(define-macro (m NUMS)
  (with-pattern ([(FIRST SECOND THIRD) #'NUMS])
    #'(list THIRD SECOND FIRST)))
(m (1 2 3)) ; '(3 2 1)

(define-macro (m2 (FIRST SECOND THIRD))
  #'(list THIRD SECOND FIRST))
(m2 (1 2 3)) ; '(3 2 1)
By the way, a sublist pattern cannot create a sublist where none exists in the input:
(define-macro (m2 (FIRST SECOND THIRD))
  #'(list THIRD SECOND FIRST))
(m2 1 2 3) ; error: no match, because no sublist
An ellipsis, which has to follow a pattern variable, and matches as many items as it can (similar to the “greedy” * operator in regular expressions).
(define-macro (ellip ARG ...)
  #'(list ARG ...))

(ellip 1 2 3) ; '(1 2 3)
(ellip "a" "b") ; '("a" "b")
(ellip) ; '()
Even though an ordinary pattern variable will match exactly one item, a pattern variable with an ellipsis can match zero items.
If a pattern variable has an ellipsis in the pattern, then wherever that variable appears in a syntax template, the ellipsis must appear too (otherwise it’s an error):
(define-macro (bad-ellip ARG ...)
  #'(list ARG))
(bad-ellip 1 2 3) ; error: missing ellipsis
You can only have one ellipsis in each sublist of the pattern, including the top level. (But see syntax/parse, an advanced macro-creation system that supports a richer pattern vocabulary.)
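To illustrate the one-ellipsis-per-sublist rule, here is a small sketch, assuming the beautiful-racket package is installed. The macro name flatten-two is hypothetical. The pattern uses one ellipsis inside a sublist and one at the top level, which stays within the rule:

```racket
#lang br
;; One ellipsis inside the sublist (INNER ...) and one at the top
;; level (OUTER ...): each level of the pattern has a single ellipsis.
(define-macro (flatten-two (INNER ...) OUTER ...)
  #'(list INNER ... OUTER ...))

(flatten-two (1 2) 3 4) ; '(1 2 3 4)
(flatten-two () 3)      ; '(3)

;; By contrast, a pattern like (bad A ... B ...) puts two ellipses
;; at the same level, which the rule above does not allow.
```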
A dot, which has to precede a pattern variable at the end of a list, and matches all remaining items in the list starting at the dot. The resulting pattern variable can either be used in a syntax template alone (in which case it’s treated as a list of items) or with another dot (in which case it’s spliced into the result). This explanation is more complicated than the example:
(define-macro (m . ARGS) #'ARGS)
(m + 1 2 3) ; means #'(+ 1 2 3) = 6

(define-macro (m2 . ARGS) #'(list . ARGS))
(m2 1 2 3) ; means #'(list 1 2 3) = '(1 2 3)
“Isn’t a dot just a less flexible way of writing an ellipsis?” Pretty much. The above macros could be written with ellipses like so:
(define-macro (m ARG ...) #'(ARG ...))
(m + 1 2 3) ; means #'(+ 1 2 3) = 6

(define-macro (m2 ARG ...) #'(list ARG ...))
(m2 1 2 3) ; means #'(list 1 2 3) = '(1 2 3)
You can probably go your whole career as a macro writer without using the dot. But if you come across it in someone else’s source code, you’ll know what it means. (BTW, you can’t use a dot and ellipsis in the same level of a pattern.)
As a final exam, let’s combine all our vocabulary elements into a single pattern. Hopefully the result is not surprising:
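One possible combination, offered as a sketch rather than a canonical answer, and assuming the beautiful-racket package is installed. The macro name exam and the literal foo are hypothetical choices. The pattern uses a literal (foo), a pattern variable (FIRST), a sublist with an ellipsis ((SUB ...)), and a dot (. REST), with the dot and the ellipsis kept on different levels:

```racket
#lang br
;; foo        literal identifier: must appear verbatim in the input
;; FIRST      pattern variable: matches exactly one item
;; (SUB ...)  sublist pattern whose ellipsis matches zero or more items
;; . REST     dot: matches all remaining top-level items
(define-macro (exam foo FIRST (SUB ...) . REST)
  #'(list FIRST (+ SUB ...) . REST))

(exam foo 1 (2 3) 4 5) ; means #'(list 1 (+ 2 3) 4 5) = '(1 5 4 5)
```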
As with regular expressions, you can often choose among multiple syntax patterns to match a single syntax object. Which you choose is a question of how strict you want to be about the match, and how you want to manipulate the pieces thereafter.
For instance, these are all valid ways to match (and then reassemble) the list (+ 1 2):
(define-macro (m1 ARGS) #'ARGS)
(m1 (+ 1 2)) ; 3
(define-macro (m2 (ARG ...)) #'(ARG ...))
(m2 (+ 1 2)) ; 3
(define-macro (m3 (1ST 2ND 3RD)) #'(1ST 2ND 3RD))
(m3 (+ 1 2)) ; 3
(define-macro (m4 (1ST REST ...)) #'(1ST REST ...))
(m4 (+ 1 2)) ; 3
(define-macro (m5 (1ST . TAIL)) #'(1ST . TAIL))
(m5 (+ 1 2)) ; 3
You cannot make a syntax pattern that captures arguments two at a time (or three at a time, etc.).
You cannot take two ellipsized pattern variables and interleave their values within a syntax template.
You cannot make matches optional, in the sense of “match zero or one occurrence of this item”. An ellipsis can approximate an optional argument, because it will match zero occurrences, but it will also match more than one:
(define-macro-cases m
  [(m REQ OPT ...) #'(list REQ OPT ...)]
  [else "nope"])

(m) ; "nope"
(m 1) ; '(1)
(m 1 2) ; '(1 2)
(m 1 2 3) ; '(1 2 3)
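If the goal is strictly zero or one optional argument, one workaround is to spell out each arity as its own case in define-macro-cases instead of relying on an ellipsis. The sketch below assumes the beautiful-racket package is installed; the macro name opt is hypothetical:

```racket
#lang br
;; Each allowed arity gets its own case, so a third argument
;; falls through to the else branch instead of being absorbed.
(define-macro-cases opt
  [(opt REQ) #'(list REQ)]
  [(opt REQ OPT) #'(list REQ OPT)]
  [else #'"nope"])

(opt 1)     ; '(1)
(opt 1 2)   ; '(1 2)
(opt 1 2 3) ; "nope"
```

The trade-off is verbosity: the pattern itself is still not optional, so every supported shape has to be listed by hand.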
If these shortcomings bum you out, the syntax/parse library supports a richer vocabulary of syntax patterns.
Syntax patterns in the Racket Reference