In the functional version of the bf expander, we have two goals:
Avoid keeping state—that is, variables that maintain a record of what’s happened in the program.
Avoid mutation—that is, using functions to change the value of state variables from afar.
What makes this tricky is that bf is, by its nature, an imperative language. Recall from how bf works that the language relies on two state values: a memory array of 30,000 bytes, and a pointer into that array.
In funstacker, we learned that we could approximate the behavior of state variables by turning them into accumulators within a for/fold loop. That’s a good start. But unlike funstacker, it won’t be enough. We need to step back and adjust our conceptual model of our bf implementation.
Before, we modeled our bf operations as triggering functions that mutated existing state variables. But these functions took no arguments, and returned no values.
This time, consistent with the functional-programming idiom, we’ll model our bf operations as functions that take the current array & pointer values as input, operate on them, and then return new array & pointer values as output. This output, in turn, becomes the input to the next function in line. In other words, rather than storing state values outside our functions, we’ll let the values travel through the functions.
By the way, is this approach guaranteed to give us a bf expander that’s shorter than the imperative expander? No. Faster? No. Morally superior? No. But the functional approach is more idiomatic to Racket. It will become a more vital skill as we advance through the tutorials, even if it’s not a huge benefit for bf.
First, let’s strip our "expander.rkt" module back to the bf-module-begin macro, which will be the same:
1 2 3 4 5 6 | #lang br/quicklang (define-macro (bf-module-begin PARSE-TREE) #'(#%module-begin PARSE-TREE)) (provide (rename-out [bf-module-begin #%module-begin])) |
As before, we’ll need to write macros to handle bf-program, bf-op, and bf-loop nodes in our parse tree. But before we get to those, we’ll make a helper function that will be the engine of our interpreter.
Our goal is to allow our two bf state values—a memory array of 30,000 bytes, and a pointer into that array—to be passed from one bf operation to the next.
To do that, we’ll model these new bf operations as functions that take two input arguments—an array and a pointer—and return a new array and a new pointer. It’s possible for Racket functions to return multiple values (see functions). But we’ll keep things simple and package the new array and pointer inside a list, which will behave as a single return value. So the new functions implementing our bf operations will follow this pattern:
Next question: we want the return value of a bf-func to become the input arguments of the next-bf-func. But we seem to have an impedance mismatch: a bf-func only returns one argument, but next-bf-func needs two arguments as input.
1 2 | (define bf-func-result-list (bf-func arr ptr)) (next-bf-func bf-func-result-list) ; this won't work |
We can cure this mismatch with apply. The idea behind apply is simple: it takes a function and a list of values as input, and calls the function while using those values as input arguments. So these two expressions are always equivalent:
Our example stipulates that func is a run-time function—not a macro. In general, a macro can’t be passed as an argument to any higher-order function, including apply. For instance, and is a macro, not a function, so it doesn’t work with apply:
To go back to our pseudocode example, since bf-func-result-list is a list of arguments, we can use apply to pass them to the next-bf-func:
Now that we have apply, we can add a key helper function to our expander, called fold-funcs, that will be the engine of our bf interpreter:
1 2 3 4 5 6 7 8 9 10 11 | #lang br/quicklang (define-macro (bf-module-begin PARSE-TREE) #'(#%module-begin PARSE-TREE)) (provide (rename-out [bf-module-begin #%module-begin])) (define (fold-funcs apl bf-funcs) (for/fold ([current-apl apl]) ([bf-func (in-list bf-funcs)]) (apply bf-func current-apl))) |
Our function relies on for/fold, which we learned about in funstacker as a way of approximating state values with accumulators. Our fold-funcs function takes two input arguments: apl, which is a list of a bf array and pointer—in other words, the return value of a bf-func—and a list of bf-funcs. + apl = abbreviation for “array–pointer list”. Unrelatedly, there’s a fantastic programming language from 1964 called APL that required its own special font. Decades of ASCII doldrums would follow. But Racket supports Unicode throughout, so it’s once again possible to use funny symbols in a programming language.
When the for/fold loop starts, it creates an accumulator called current-apl to hold the current state of the bf program, and initializes it to the apl argument passed as input. Then it iterates over the list of bf-funcs. On each pass through the loop, it uses apply to pass current-apl as arguments to the next bf-func.
As we remember from our previous use of for/fold, the return value of bf-func becomes the new current-apl value for the next pass. Once we run out of bf-funcs, the last value of current-apl becomes the return value of the for/fold loop, and therefore the whole fold-funcs function.
With fold-funcs in hand, we can now rewrite the macros from our imperative expander in a functional style. Nothing has changed in our parse tree. So our macros will be defined using the same syntax patterns we figured out before.
First we have bf-program:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 | #lang br/quicklang (define-macro (bf-module-begin PARSE-TREE) #'(#%module-begin PARSE-TREE)) (provide (rename-out [bf-module-begin #%module-begin])) (define (fold-funcs apl bf-funcs) (for/fold ([current-apl apl]) ([bf-func (in-list bf-funcs)]) (apply bf-func current-apl))) (define-macro (bf-program OP-OR-LOOP-ARG ...) #'(begin (define first-apl (list (make-vector 30000 0) 0)) (void (fold-funcs first-apl (list OP-OR-LOOP-ARG ...))))) (provide bf-program) |
We want to return code for two expressions. But a syntax object can only represent one. In this case—or any time we want to return multiple expressions—we can consolidate them inside a begin. + Unlike let, begin does not create a new scope for variables—any variables defined inside a begin are visible outside as well.
Then we set up our initial program state in first-apl. As before, our bf array is a Racket vector, and the pointer is an integer, and we’ll combine them in a list. We’ll pass this first-apl to fold-funcs along with our list of OP-OR-LOOP-ARG ... as the list of bf-funcs. As before, this macro should not return a value, so we’ll pass the result of fold-funcs to void.
Next we have bf-loop, which was simple in our imperative expander, but needs a little more finesse here. We observe two things:
When a bf-loop arrives at fold-funcs, it will be expected to behave as a bf-func. So the return value of our bf-loop macro has to be a function that follows the pattern we established for every bf-func—two input arguments (an array and a pointer) and one return value (a list of a new array and pointer).
A bf-loop is basically a miniature bf program that runs repeatedly until a certain condition is met. So we can delegate the heavy lifting to fold-funcs.
The macro looks like this:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 | #lang br/quicklang (define-macro (bf-module-begin PARSE-TREE) #'(#%module-begin PARSE-TREE)) (provide (rename-out [bf-module-begin #%module-begin])) (define (fold-funcs apl bf-funcs) (for/fold ([current-apl apl]) ([bf-func (in-list bf-funcs)]) (apply bf-func current-apl))) (define-macro (bf-program OP-OR-LOOP-ARG ...) #'(begin (define first-apl (list (make-vector 30000 0) 0)) (void (fold-funcs first-apl (list OP-OR-LOOP-ARG ...))))) (provide bf-program) (define-macro (bf-loop "[" OP-OR-LOOP-ARG ... "]") #'(lambda (arr ptr) (for/fold ([current-apl (list arr ptr)]) ([i (in-naturals)] #:break (zero? (apply current-byte current-apl))) (fold-funcs current-apl (list OP-OR-LOOP-ARG ...))))) (provide bf-loop) |
We can make an anonymous function with lambda, which works like define, except we omit the function name. + For more about lambda, see functions. A lambda expression can be assigned to a variable, or used anywhere that Racket expects a function:
In this case, our macro will return a lambda that runs another for/fold loop. As with fold-funcs, the loop will use an accumulator called current-apl that takes its initial state from the input arguments (combining arr and ptr).
We remember from how bf works that a bf-loop needs to continue indefinitely until the current byte is zero. So our for/fold loop will iterate using in-naturals, which produces an unending sequence of integers. Then we add a #:break condition, which ends the loop when the condition becomes true, to test whether the current byte is zero. (We haven’t made the current-byte function yet, but we will shortly.)
Finally, we have our new call to fold-funcs, which feeds in our current-apl and list of arguments. fold-funcs will return a list with a modified array and pointer, which becomes the value of current-apl for the next pass of our loop.
When the current byte becomes zero, this for/fold loop will end, and its return value will be the last value of current-apl. This, in turn, becomes the return value of the lambda function.
Duuuude. That was the most super-duper-Rackety macro we’ve written yet. Don’t panic if it takes another read for everything to sink in. The big point is that we’re seeing how we can pass an array / pointer list not just “horizontally” from function to function (= what we did within fold-funcs) but also “vertically” by calling fold-funcs again within a loop. This technique is called recursion. It’s a classic design pattern of functional programming, and ubiquitous in Racket.
In this case, the benefit of recursion is that we can totally avoid relying on external state variables and mutation. The other benefit is that we can now relax—it’s all downhill from here.
Our third macro is bf-op:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 | #lang br/quicklang (define-macro (bf-module-begin PARSE-TREE) #'(#%module-begin PARSE-TREE)) (provide (rename-out [bf-module-begin #%module-begin])) (define (fold-funcs apl bf-funcs) (for/fold ([current-apl apl]) ([bf-func (in-list bf-funcs)]) (apply bf-func current-apl))) (define-macro (bf-program OP-OR-LOOP-ARG ...) #'(begin (define first-apl (list (make-vector 30000 0) 0)) (void (fold-funcs first-apl (list OP-OR-LOOP-ARG ...))))) (provide bf-program) (define-macro (bf-loop "[" OP-OR-LOOP-ARG ... "]") #'(lambda (arr ptr) (for/fold ([current-apl (list arr ptr)]) ([i (in-naturals)] #:break (zero? (apply current-byte current-apl))) (fold-funcs current-apl (list OP-OR-LOOP-ARG ...))))) (provide bf-loop) (define-macro-cases bf-op [(bf-op ">") #'gt] [(bf-op "<") #'lt] [(bf-op "+") #'plus] [(bf-op "-") #'minus] [(bf-op ".") #'period] [(bf-op ",") #'comma]) (provide bf-op) |
It looks similar to our previous bf-op macro, but rather than returning a self-contained function application like #'(gt), this version will return only the name of the corresponding function (so that fold-funcs can apply a list of arguments to it).
Now we’re ready to add the functions that implement the core bf array operations.
Our array-implementation functions will cover the same ground as before.
The current byte is the byte in the array at the location indicated by the pointer. So our current-byte function just gets the value from arr at the position ptr using vector-ref:
1 | (define (current-byte arr ptr) (vector-ref arr ptr)) |
Keeping with the functional idiom, our new set-current-byte will duplicate the input array with vector-copy, update the byte of interest with vector-set!, and then return this new array. In this case, vector-set! doesn’t count as mutation, because it’s just for internal housekeeping. What’s important is that we’re not using it to affect data outside the function. (Consistent with Racket convention, we remove the ! from the end of our function name, to signal that it returns a value rather than relying on mutation):
1 2 3 4 | (define (set-current-byte arr ptr val) (define new-arr (vector-copy arr)) (vector-set! new-arr ptr val) new-arr) |
Then we have the functions that implement specific bf operations. This time, rather than taking no arguments and returning no values, each function will take two arguments—an array and a pointer—and return a list with a possibly modified array and pointer. + Requirements for input and output values can be enforced less casually with contracts, a topic we’ll cover later.
The gt and lt functions—corresponding to the bf operations > and <—increase or decrease the pointer by one.
The plus and minus functions—corresponding to the bf operations + and -—increase or decrease the value of the current byte. We can express these operations in terms of current-byte and set-current-byte:
The period function, corresponding to a . in bf code, writes the current byte to standard output. As before, we use the Racket library function write-byte to help out. Then we just return the unmodified input values:
1 2 3 | (define (period arr ptr) (write-byte (current-byte arr ptr)) (list arr ptr)) |
Finally, the comma function, corresponding to a , in bf code, sets the current byte to one read from standard input. As before, we can do this with the Racket function read-byte:
Our finished expander should look like this, with the new array implementation highlighted:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 | #lang br/quicklang (define-macro (bf-module-begin PARSE-TREE) #'(#%module-begin PARSE-TREE)) (provide (rename-out [bf-module-begin #%module-begin])) (define (fold-funcs apl bf-funcs) (for/fold ([current-apl apl]) ([bf-func (in-list bf-funcs)]) (apply bf-func current-apl))) (define-macro (bf-program OP-OR-LOOP-ARG ...) #'(begin (define first-apl (list (make-vector 30000 0) 0)) (void (fold-funcs first-apl (list OP-OR-LOOP-ARG ...))))) (provide bf-program) (define-macro (bf-loop "[" OP-OR-LOOP-ARG ... "]") #'(lambda (arr ptr) (for/fold ([current-apl (list arr ptr)]) ([i (in-naturals)] #:break (zero? (apply current-byte current-apl))) (fold-funcs current-apl (list OP-OR-LOOP-ARG ...))))) (provide bf-loop) (define-macro-cases bf-op [(bf-op ">") #'gt] [(bf-op "<") #'lt] [(bf-op "+") #'plus] [(bf-op "-") #'minus] [(bf-op ".") #'period] [(bf-op ",") #'comma]) (provide bf-op) (define (current-byte arr ptr) (vector-ref arr ptr)) (define (set-current-byte arr ptr val) (define new-arr (vector-copy arr)) (vector-set! new-arr ptr val) new-arr) (define (gt arr ptr) (list arr (add1 ptr))) (define (lt arr ptr) (list arr (sub1 ptr))) (define (plus arr ptr) (list (set-current-byte arr ptr (add1 (current-byte arr ptr))) ptr)) (define (minus arr ptr) (list (set-current-byte arr ptr (sub1 (current-byte arr ptr))) ptr)) (define (period arr ptr) (write-byte (current-byte arr ptr)) (list arr ptr)) (define (comma arr ptr) (list (set-current-byte arr ptr (read-byte)) ptr)) |
Let’s confirm that our test file "atsign.rkt" still works:
1 2 3 | #lang reader "reader.rkt" Greatest language ever! ++++-+++-++-++[>++++-+++-++-++<-]>. |
1 | @ |
Very good. How about our “Hello World” bf program?
1 2 3 4 5 6 | #lang reader "reader.rkt" ++++++[>++++++++++++<-]>. >++++++++++[>++++++++++<-]>+. +++++++..+++.>++++[>+++++++++++<-]>. <+++[>----<-]>.<<<<<+++[>+++++<-]>. >>.+++.------.--------.>>+. |
1 | Hello, World! |
Excellent. And the factorial generator—
1 2 3 4 5 6 7 | #lang reader "reader.rkt" >++++++++++>>>+>+[>>>+[-[<<<<<[+<<<<<]>>[[-]>[<<+>+>-] <[>+<-]<[>+<-[>+<-[>+<-[>+<-[>+<-[>+<-[>+<-[>+<-[>+<- [>[-]>>>>+>+<<<<<<-[>+<-]]]]]]]]]]]>[<+>-]+>>>>>]<<<<< [<<<<<]>>>>>>>[>>>>>]++[-<<<<<]>>>>>>-]+>>>>>]<[>++<-] <<<<[<[>+<-]<<<<]>>[->[-]++++++[<++++++++>-]>>>>]<<<<< [<[>+>+<<-]>.<<<<<]>.>>>>] |
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 | 1 1 2 6 24 120 720 5040 40320 362880 3628800 39916800 479001600 6227020800 87178291200 1307674368000 ··· |
Everything checks out. We’ve successfully converted our imperative-style expander into a functional-style expander. Again, in the case of bf it’s not necessarily the fastest approach. In fact, we probably noticed that "factorial.rkt" now runs much slower.
The main point was to show that even a language like bf that relies on state and mutation can be implemented without them, by relying on functional-programming techniques. As we move toward languages with more complex interactions, the cleanliness of the functional approach will pay dividends.
This will also be the last time we do an imperative-to-functional conversion. After this, we’ll just do everything in a functional style at the outset, avoiding state and mutation as much as possible (though keeping with our non-dogmatic policy, we won’t avoid it if it becomes necessary).
Alan Perlis once quipped that “A Lisp programmer knows the value of everything, but the cost of nothing.” Meaning, the pursuit of functional purity can negatively affect real-world performance. In this case, we eliminated state and mutation from our bf expander. But it got much slower.
Why? We could point out some small inefficiencies, but the major bugaboo is our implementation of set-current-byte:
1 2 3 4 | (define (set-current-byte arr ptr val) (define new-arr (vector-copy arr)) (vector-set! new-arr ptr val) new-arr) |
The problem here is that memory allocation isn’t free. Our input array arr is a large data structure—30,000 bytes—so vector-copy has to allocate nearly 30K every time it’s called. That takes time. Also, because the old arr is never used again, it gets marked for deletion by Racket’s garbage collector. Garbage collection happens automatically as the program runs. As discarded memory piles up, the garbage collector has to run more often. That takes time too.
What to do? The simplest fix would be to avoid unnecessary memory allocation. We can do that by adjusting set-current-byte to reuse our input data structure arr for the output:
1 2 3 | (define (set-current-byte arr ptr val) (vector-set! arr ptr val) arr) |
If we make this update and try "factorial.rkt" again, it runs much faster.
Can we live with this change? Sure. We’re not here to be rigid about functional programming. The fact that bf relies on a 30,000-byte array is unusual. And throwing out 29,999 bytes every time we change one is, in practical terms, wasteful and slow.
Moreover, because each arr passed to set-current-byte is never reused, this change has no effect on the rest of the program. In all, it’s a tiny compromise with a clear benefit. Ship it.