We’ll end this tutorial by adding support for command-line arguments. Let’s be clear: no one will ever write a shell script in BASIC. But it’ll give us a look at how Racket handles command-line arguments. We’ll also learn a few things about macro hygiene.
Best of all, we only have to update one module. Really.
In exports, we learned about the parameter called current-output-port, which controls where printed output goes; in the REPL, we learned about the current-read-interaction parameter that controls how the REPL parses its input.
Similarly, Racket makes its command-line arguments available through a parameter called current-command-line-arguments, which holds a vector of the command-line arguments (as strings) that were passed during the current session. For instance, let’s make a new "args.rkt" module:
    #lang br
    (current-command-line-arguments)
If we run this file in DrRacket, the vector of command-line arguments is empty—as we’d expect, since we’re not running it from the command line:
    '#()
But if we go to the command line, we can ask racket to start with a certain module by using the -l flag and the module name. Subsequent arguments are treated as command-line arguments for that module:
    > racket -l basic/args "foo" 42 --bar "zam"
This time, "args.rkt" prints the command-line arguments:
    '#("foo" "42" "--bar" "zam")
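Because current-command-line-arguments is an ordinary parameter, we can also simulate a command-line invocation without leaving DrRacket. The following is a minimal sketch rather than part of the tutorial's code: the name fake-args is hypothetical, and parameterize is used to install the vector temporarily. Notice that every argument arrives as a string, including 42.

```racket
#lang racket/base

;; Hypothetical arguments, mirroring the shell example above.
(define fake-args (vector "foo" "42" "--bar" "zam"))

;; Inside the parameterize body, the parameter holds our vector.
(define seen-args
  (parameterize ([current-command-line-arguments fake-args])
    (current-command-line-arguments)))

;; Every element is a string, even the one that looks like a number.
(vector->list seen-args)
```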
Beyond that, what we do with these arguments is up to us. Racket’s command-line function, for instance, provides a high-level interface for parsing command-line arguments, including flags.
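To give a flavor of that higher-level interface, here is a small sketch using command-line from racket/cmdline. The program name, the --bar flag, and the parse-args helper are all invented for illustration. We pass the arguments explicitly with #:argv so the example can run anywhere. One thing to keep in mind: command-line stops looking for flags at the first non-flag argument, so in this sketch the flag comes first.

```racket
#lang racket/base
(require racket/cmdline)

;; Hypothetical setting that the --bar flag fills in.
(define bar-value (make-parameter #f))

;; Parse a vector of arguments; return the positional arguments as a list.
(define (parse-args argv)
  (command-line
   #:program "args-demo"
   #:argv argv
   #:once-each
   [("--bar") val "Set the bar value" (bar-value val)]
   #:args positional
   positional))

(define positional-args (parse-args (vector "--bar" "zam" "foo" "42")))
positional-args ; the positional arguments, still as strings
(bar-value)     ; the value given to --bar
```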
For BASIC, we won’t get that fancy. Instead, we’ll just make the first 10 command-line arguments available within our BASIC program under the special names arg0, arg1, ... arg9. That means we have two tasks:
Create the arg0, arg1, ... arg9 variables.
Set the value of each of these variables to the corresponding value read from the command line. (If there are fewer than 10 command-line arguments, we’ll set the remaining variables to 0.)
One small limitation is that we can’t match the number of arg* variables to the number of command-line arguments. We’ll be relying on syntax transformation to create the variables at compile time. But the actual command-line arguments won’t be available until run time. So the number of variables has to be fixed in advance. That said, 10 is arbitrary—we could make 25 or 100. The code would work the same way.
Once we do this, we should be able to use these variables like any other. If we make a sample "report-args.rkt" module (using basic-demo-3 just for illustration):
    #lang basic-demo-3
    10 print "arg0 is " ; arg0
    20 print "arg1 + arg1 is " ; arg1 + arg1
    40 print "arg3 is " ; arg3
    50 print "arg4 is " ; arg4
And invoke it from the command line like so:
    > racket -l basic/report-args "foo" 42 --bar "zam"
We omit arg2 just to show that we don’t have to use every variable. Since no arg4 is passed to the program, it behaves like an ordinary BASIC variable initialized to 0. We get this report of the command-line arguments:
    arg0 is foo
    arg1 + arg1 is 84
    arg3 is zam
    arg4 is 0
Seems straightforward. But it puts us on a collision course with an issue that confounds every writer of Racket macros at least once—how to circumvent macro hygiene.
So before we get to the code, a small detour.
Hygiene is the organizing policy of Racket’s macro system. It holds that identifiers introduced by a macro should get their bindings from the place where the macro was defined, not the place where the macro is used. The effect is to keep macro-created identifiers in a namespace separate from that of others. In Racket, these namespaces are called lexical contexts.
“If hygiene is so important, then why haven’t we been hearing more about it?” There’s been no need. The nice thing about hygiene is that it usually conforms to our intuitions about how macro-created code should work, and cuts down on housekeeping. (Hygiene is so named because of the “cleanliness” it enforces between lexical contexts.)
For instance, in our macro adventures so far, we haven’t had to worry about macro-introduced identifiers conflicting with others. For this, we can thank hygiene. In this example, we have a begin block that defines a variable x, and then a make-x macro that produces the same begin block with a different value for x. When we run both of these blocks in the same module, do the two x definitions conflict?
    (begin (define x 42) x)

    (define-macro (make-x)
      #'(begin (define x 'macro-id) x))

    (make-x)
No, they do not:
    42
    'macro-id
What’s happening here? When Racket creates an identifier during compile time, it associates it with a particular lexical context. Later, when Racket needs to resolve the binding of that identifier, it relies on its lexical context. To Racket, the two x variables in the above example are completely different.
Under the hood, Racket attaches the lexical context to the syntax object that contains the identifier. Just like the other metadata attached to the syntax object (e.g., source location or syntax properties), lexical context is preserved as the identifier travels through the various macro transformations that happen during compile time.
Having multiple variables with the same name shouldn’t be alarming. Consider an analogous example: with let, we’re accustomed to introducing local bindings that are only valid within the body of the expression, that may shadow existing bindings. In this example, we have two definitions of a variable called z, but we know the local binding of z won’t “escape” its surrounding let and collide with the binding on the outside:
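A minimal sketch of this situation, with illustrative values (the name inner is ours, added only to capture the result of the let):

```racket
#lang racket/base

;; The outer binding of z.
(define z 42)

;; The let introduces a local z that shadows the outer one,
;; but only within the body of the let.
(define inner
  (let ([z 'local-id])
    z))

inner ; 'local-id
z     ; 42, because the outer binding is untouched
```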
Hygiene is a different mechanism, but has a similar effect—creating lexical separation between variables.
So what happens if we remove hygiene? We can find out by changing our define-macro to define-unhygienic-macro, which will make our macro write its code into the existing lexical context, rather than keeping its code separate.

(“You’re doing it wrong” is the lowest form of programming advice. But please don’t use define-unhygienic-macro in your own code. It exists only for demonstration purposes. Otherwise, it is a bringer of darkness and despair.)
    (begin (define x 42) x)

    (define-unhygienic-macro (make-x)
      #'(begin (define x 'macro-id) x))

    (make-x)
This time, our two x variables will occupy the same lexical context, so the result is different:
    module: duplicate definition for identifier in: x
This illustrates why hygiene is an essential ingredient in a composable—meaning, “plays nicely with others”—macro system. Once we make a macro, we want to be able to call it from any code, and have it behave the same way (and not break things). Without hygiene, we’d always have the risk that an identifier in the macro could collide with one already defined at the calling site. Hygiene guarantees this will never happen.
In most cases, hygiene makes writing macros easier. But occasionally it conflicts with legitimate goals, like using a macro as a substitute for define.
Here’s an example that trips up every Racketeer at least once (and usually more). Suppose a program repeatedly defines three variables:
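As a sketch, the repeated pattern might look like this (the values 1, 2, and 3 are assumed here to match the later examples):

```racket
#lang racket/base

;; Three definitions that the program keeps writing out by hand.
(define a 1)
(define b 2)
(define c 3)

(+ a b c) ; 6
```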
Before long, a bright idea arises. “I know—I’ll make a macro!”
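A sketch of that first attempt, assuming the same define-macro form from #lang br used elsewhere in this tutorial. The macro itself compiles and can be invoked. The trouble only appears when code at the calling site refers to a, so that line is left commented out here:

```racket
#lang br

;; First attempt: the macro introduces a, b, and c on its own.
(define-macro (define-three-vars)
  #'(begin
      (define a 1)
      (define b 2)
      (define c 3)))

(define-three-vars)

;; Uncommenting the next line should make the module fail to compile,
;; because the a, b, and c defined above belong to the macro's
;; lexical context, not to the calling site:
;; (+ a b c)
```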
But then disappointment ensues:
    a: unbound identifier in module in: a

What’s the problem here? Same as the problem with make-x: hygiene guarantees that the identifiers introduced by the macro remain in a separate lexical context. So a, b, and c are not accessible at the calling site after the macro is invoked.
In a situation like this, we have two choices:
We can rewrite the macro hygienically by passing the identifiers as arguments:
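A sketch of the hygienic version, again assuming define-macro from #lang br. The pattern variables ID1, ID2, and ID3 follow the naming used in the unhygienic example in this section:

```racket
#lang br

;; The identifiers now arrive as arguments, so they keep the
;; lexical context of the place where the macro is called.
(define-macro (define-three-vars ID1 ID2 ID3)
  #'(begin
      (define ID1 1)
      (define ID2 2)
      (define ID3 3)))

(define-three-vars a b c)
(+ a b c) ; 6
```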
When we rewrite the macro this way, the identifiers a, b, and c are no longer being introduced by the macro. Rather, they’re being introduced on the outside, and passed to the macro as arguments. Therefore, rather than being siloed in the lexical context of the macro, they retain their lexical context from the calling site.
This is why we didn’t encounter any hygiene-related difficulties when we added variables to BASIC. All the identifiers for our variables came from the original source code. Even though our expander picked them up and moved them around, they retained their original lexical context.
Or, if we don’t want to pass the identifiers as arguments, we can manually create identifiers connected to the lexical context of the calling site. Because they circumvent hygiene, these are known as unhygienic identifiers:
    (define-macro (define-three-vars)
      (with-pattern ([ID1 (datum->syntax caller-stx 'a)]
                     [ID2 (datum->syntax caller-stx 'b)]
                     [ID3 (datum->syntax caller-stx 'c)])
        #'(begin
            (define ID1 1)
            (define ID2 2)
            (define ID3 3))))
    (define-three-vars)
    (+ a b c) ; 6
Here, we make each identifier with datum->syntax, which takes a lexical context as its first argument and a datum as its second. The result is a new syntax object associated with the given lexical context.

(We first used datum->syntax when we made the reader for stacker. Back then, we used #f as the lexical context, because we deliberately wanted the resulting syntax object to have no bindings.)
For the lexical-context argument, we pass a syntax object with the lexical context we want to “borrow”. In this case, it’s the macro variable caller-stx, which is the syntax object representing the original call to the macro, and thus carries the lexical context of the calling site.
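For readers more familiar with plain Racket, the same idea can be sketched without define-macro. This is an assumed equivalent, not code from the tutorial: the transformer's own argument stx plays the role of caller-stx, and with-syntax plays the role of with-pattern.

```racket
#lang racket/base
(require (for-syntax racket/base))

;; stx is the syntax object for the macro call itself, so it carries
;; the lexical context of the calling site.
(define-syntax (define-three-vars stx)
  (with-syntax ([id1 (datum->syntax stx 'a)]
                [id2 (datum->syntax stx 'b)]
                [id3 (datum->syntax stx 'c)])
    #'(begin
        (define id1 1)
        (define id2 2)
        (define id3 3))))

(define-three-vars)
(+ a b c) ; 6
```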
In terms of Racket idiom, unhygienic identifiers occupy the same space as mutation or parameters: we avoid them when we can, because they go against the natural grain of the language. But when they’re the right tool for the job, we’re not finicky about using them.
For instance, when we want to create a new structure type, we use struct, which works in part by introducing unhygienic identifiers at the calling site (constructor, predicate, getters, setters):
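A brief sketch of what this looks like in practice. The structure name thing is taken from the error message shown in this section. The field names x and y, and the #:mutable option (which is what makes struct generate setters), are assumptions made for illustration:

```racket
#lang racket/base

;; One struct form introduces many names at the calling site:
;; thing, thing?, thing-x, thing-y, set-thing-x!, set-thing-y!
(struct thing (x y) #:mutable)

(define t (thing 1 2)) ; constructor
(thing? t)             ; predicate, #t
(thing-x t)            ; getter, 1
(set-thing-y! t 42)    ; setter
(thing-y t)            ; 42
```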
For this reason, struct will fail if any of these unhygienic names conflict with existing identifiers at the calling site:
    module: duplicate definition for identifier in: thing?
By the way, datum->syntax is just one option for creating unhygienic identifiers. Helper functions like prefix-id, suffix-id, and format-id can also create identifiers in a different lexical context. Furthermore, though it’s most common for unhygienic identifiers to be inserted in the lexical context of the macro calling site, we can put them—or any other syntax items—inside any lexical context.
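As one illustration, here is a sketch using format-id, which is provided by Racket's racket/syntax library. The define-getter macro and the get- naming scheme are hypothetical. The first argument to format-id is the syntax object whose lexical context we want to borrow. Here that is the name supplied by the caller, so the new identifier is visible at the calling site.

```racket
#lang racket/base
(require (for-syntax racket/base racket/syntax))

;; Hypothetical macro: given a name, define a zero-argument
;; function called get-<name> that returns a value.
(define-syntax (define-getter stx)
  (syntax-case stx ()
    [(_ name val)
     (with-syntax ([getter (format-id #'name "get-~a" #'name)])
       #'(define (getter) val))]))

(define-getter answer 42)
(get-answer) ; 42
```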
Having completed our detour about hygiene, we’re ready to implement command-line arguments in BASIC.
Our task is to get the arguments out of current-command-line-arguments and assign them to the indexed variables arg0 through arg9.
We only need to update "expander.rkt", but in a few places.
In our begin-for-syntax block, we add make-shell-ids-and-idxs, a helper function for making our indexed identifiers:

    (define (make-shell-ids-and-idxs ctxt)
      (define arg-count 10)
      (for/list ([idx (in-range arg-count)])
        (list (suffix-id #'arg idx #:context ctxt) idx)))

This function takes one argument, ctxt, which is the lexical context where we want to place the identifiers. It returns a list of two-element lists, each pairing an identifier with the index of the command-line argument it corresponds to.

(We could change the arg-count value here to generate more than 10 arg* identifiers.)
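To see the shape of the data this helper produces, we can sketch a variant that runs as ordinary code outside the expander. This is an approximation under stated assumptions: it substitutes format-id from racket/syntax for suffix-id, makes arg-count an optional argument, and uses #'here as a stand-in for the calling-site context.

```racket
#lang racket/base
(require racket/syntax)

;; Variant of the helper for experimentation: returns a list of
;; (identifier index) pairs, with each identifier placed in the
;; lexical context of ctxt.
(define (make-shell-ids-and-idxs ctxt [arg-count 10])
  (for/list ([idx (in-range arg-count)])
    (list (format-id ctxt "arg~a" idx) idx)))

;; Inspect the result by stripping each identifier down to a symbol.
(define demo
  (for/list ([pair (in-list (make-shell-ids-and-idxs #'here 3))])
    (list (syntax->datum (car pair)) (cadr pair))))

demo ; '((arg0 0) (arg1 1) (arg2 2))
```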
For later use, we refactor our find-property function to extract a helper function called unique-ids, which removes duplicate identifiers from a list:
    (define (unique-ids stxs)
      (remove-duplicates stxs #:key syntax->datum))

    (define (find-property which line-stxs)
      (unique-ids
       (for/list ([stx (in-list (stx-flatten line-stxs))]
                  #:when (syntax-property stx which))
         stx)))
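A quick way to see what unique-ids does is to run it on a hand-made list of identifiers. The sketch below copies the function into an ordinary module (the sample identifiers are made up). Because the key is syntax->datum, two identifiers count as duplicates whenever they have the same name, regardless of lexical context, and the first occurrence is the one that survives.

```racket
#lang racket/base
(require racket/list)

(define (unique-ids stxs)
  (remove-duplicates stxs #:key syntax->datum))

;; Sample identifiers, with arg0 and x each appearing twice.
(define sample (list #'arg0 #'x #'arg0 #'y #'x))

(define names (map syntax->datum (unique-ids sample)))
names ; '(arg0 x y)
```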
In our b-module-begin macro, we make a list of SHELL-ID and SHELL-IDX values by calling make-shell-ids-and-idxs. We pass caller-stx as the lexical-context argument, since we want these new identifiers to behave as if they came from the calling site.
    [((SHELL-ID SHELL-IDX) ...)
     (make-shell-ids-and-idxs caller-stx)]
If any of our indexed identifiers arg0 through arg9 already appear in the code, they’ll already be part of the VAR-ID ... list. We can’t define any variable twice. So we need to generate a list of unique identifiers. We do this by concatenating VAR-ID ... and SHELL-ID ... into one list and running this list through unique-ids.
    [(UNIQUE-ID ...)
     (unique-ids
      (syntax->list #'(VAR-ID ... SHELL-ID ...)))]
After that, we just need to define each UNIQUE-ID:

    (define UNIQUE-ID 0) ...
We also set! every SHELL-ID to its corresponding command-line argument, using SHELL-IDX as the index into the vector:
    (let ([clargs (current-command-line-arguments)])
      (set! SHELL-ID (get-clarg clargs SHELL-IDX)) ...)
Because we prefer to minimize the amount of code inside the body of our macro, we move the get-clarg helper function to the outside:
    (define (get-clarg clargs idx)
      (if (<= (vector-length clargs) idx)
          0
          (let ([val (vector-ref clargs idx)])
            (or (string->number val) val))))

The helper function takes two arguments: the vector of command-line arguments, and a vector index. If idx is at or beyond the end of clargs, we return 0. Otherwise, the command-line argument is a string, which may or may not represent a number. We try converting it with string->number; if that returns #f, we return the original string.
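We can check this behavior by calling get-clarg directly on a sample vector. The sketch below repeats the function so it runs on its own, and the sample vector (a hypothetical name) mirrors the shell invocation used throughout this section:

```racket
#lang racket/base

(define (get-clarg clargs idx)
  (if (<= (vector-length clargs) idx)
      0
      (let ([val (vector-ref clargs idx)])
        (or (string->number val) val))))

;; Hypothetical argument vector, as the shell would deliver it.
(define sample (vector "foo" "42" "--bar" "zam"))

(get-clarg sample 0) ; "foo", not numeric, so the string comes back
(get-clarg sample 1) ; 42, converted by string->number
(get-clarg sample 4) ; 0, because there is no fifth argument
```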
When we put all the pieces together, our "expander.rkt" looks like this:
    #lang br/quicklang
    (require "struct.rkt" "run.rkt" "elements.rkt" "setup.rkt")
    (provide (rename-out [b-module-begin #%module-begin])
             (all-from-out "elements.rkt"))

    (define-macro (b-module-begin (b-program LINE ...))
      (with-pattern
          ([((b-line NUM STMT ...) ...) #'(LINE ...)]
           [(LINE-FUNC ...) (prefix-id "line-" #'(NUM ...))]
           [(VAR-ID ...) (find-property 'b-id #'(LINE ...))]
           [(IMPORT-NAME ...)
            (find-property 'b-import-name #'(LINE ...))]
           [(EXPORT-NAME ...)
            (find-property 'b-export-name #'(LINE ...))]
           [((SHELL-ID SHELL-IDX) ...)
            (make-shell-ids-and-idxs caller-stx)]
           [(UNIQUE-ID ...)
            (unique-ids
             (syntax->list #'(VAR-ID ... SHELL-ID ...)))])
        #'(#%module-begin
           (module configure-runtime br
             (require basic/setup)
             (do-setup!))
           (require IMPORT-NAME) ...
           (provide EXPORT-NAME ...)
           (define UNIQUE-ID 0) ...
           (let ([clargs (current-command-line-arguments)])
             (set! SHELL-ID (get-clarg clargs SHELL-IDX)) ...)
           LINE ...
           (define line-table
             (apply hasheqv (append (list NUM LINE-FUNC) ...)))
           (parameterize
               ([current-output-port (basic-output-port)])
             (void (run line-table))))))

    (define (get-clarg clargs idx)
      (if (<= (vector-length clargs) idx)
          0
          (let ([val (vector-ref clargs idx)])
            (or (string->number val) val))))

    (begin-for-syntax
      (require racket/list)

      (define (unique-ids stxs)
        (remove-duplicates stxs #:key syntax->datum))

      (define (find-property which line-stxs)
        (unique-ids
         (for/list ([stx (in-list (stx-flatten line-stxs))]
                    #:when (syntax-property stx which))
           stx)))

      (define (make-shell-ids-and-idxs ctxt)
        (define arg-count 10)
        (for/list ([idx (in-range arg-count)])
          (list (suffix-id #'arg idx #:context ctxt) idx))))
We should now be able to run our "report-args.rkt" module under #lang basic:
    #lang basic
    10 print "arg0 is " ; arg0
    20 print "arg1 + arg1 is " ; arg1 + arg1
    40 print "arg3 is " ; arg3
    50 print "arg4 is " ; arg4
And invoke it from the command line like so:
    > racket -l basic/report-args "foo" 42 --bar "zam"
We should see this report of the command-line arguments:
    arg0 is foo
    arg1 + arg1 is 84
    arg3 is zam
    arg4 is 0
This shows that all the pieces are working. In particular, it shows that we’ve successfully created unhygienic identifiers that behave as if they were defined at the calling site (that is, in the BASIC source).
To appreciate the importance of breaking hygiene, we can try removing the #:context ctxt argument from the call to suffix-id. If we try invoking the same shell command:
    > racket -l basic/report-args "foo" 42 --bar "zam"
This time, we get an error:
    set!: unbound identifier in module
      in: arg0
      context...:
       standard-module-name-resolver

This error arises because there are two mismatched arg0 identifiers in the code. One is attached to the lexical context of the BASIC source, and appears in the VAR-ID ... list. This one is used for the define. The other is attached to the lexical context of make-shell-ids-and-idxs, and appears in the SHELL-ID ... list. This one is used for the set!. Hence the error: the macro-introduced arg0 used with set! has not been defined.

One gotcha about working with unhygienic identifiers is that error messages can be ambiguous. For instance, the error above says that arg0 is unbound. But it doesn’t tell us which arg0 is the problem. In this case, we’re using set! only with SHELL-ID, so we can deduce the culprit. But in general, it’s another reason to use unhygienic identifiers sparingly.