First, when we say “syntax coloring” it has nothing to do with syntax in the sense of a syntax object or syntax pattern. In this case, “syntax” just refers to the displayed source code. “Source-code coloring” would probably have been the more accurate term. But “syntax coloring” is what Racket calls it.
DrRacket uses a syntax colorer to handle the coloring for a language. Unless told otherwise, DrRacket uses the default Racket colorer. But we can also optionally write a custom colorer for our language, attaching it with the get-info function we just learned about in DrRacket integration.
In addition to coloring, DrRacket also gets information from the syntax colorer about the location of delimiters. It uses this information to support certain GUI conveniences, such as selecting a whole delimited expression when we click at its boundary.
Right now, the syntax coloring for jsonic is mediocre. Here’s our current test file as it appears in DrRacket:
Because DrRacket uses the Racket colorer by default, the embedded Racket expressions are colored accurately. But not the jsonic syntax:
The line comment is not colored as a comment.
The square brackets are colored like Racket parentheses, rather than strings (that are part of the larger JSON string represented by the program).
The @$ and $@ delimiters are not colored as delimiters, but as identifiers.
Not only are the delimiters colored incorrectly, but they also don’t behave as delimiters in the GUI. For instance, if we click in front of the (* 6 7) expression, DrRacket will select the whole expression:
Whereas if we click in front of the jsonic delimiter to the left, nothing happens:
Ideally, this click would highlight the whole embedded expression, from the @$ on the left to the matching $@ on the right.
So let’s do better. We can write a custom syntax colorer for jsonic that will contain more precise instructions for how DrRacket should color jsonic source code.
Our syntax colorer will work much like our tokenizer. With the tokenizer, Racket gave us a port containing the source code, and we used a lexer to identify the smallest meaningful chunks.
In this case, DrRacket will give our coloring function a new port containing the source code (plus a couple more arguments to track the current coloring state). But this time, instead of using a lexer to create tokens, we’ll use a lexer to associate each part of the code with a coloring annotation. DrRacket will rely on these annotations to color the code. (It’s possible to use one lexer for both roles, and we’ll do it that way in a later tutorial. But for now, it’s simpler to use two.)
In most cases, our syntax-coloring annotations will contain five values:
The matched string to be colored (or an eof-object? that signals we’ve reached the end). Because DrRacket relies primarily on the source-location fields for syntax coloring, this field isn’t strictly mandatory: if the source locations are accurate, the syntax will be colored correctly.
A coloring category, which can be one of 'error, 'comment, 'sexp-comment, 'white-space, 'constant, 'string, 'no-color, 'parenthesis, 'hash-colon-keyword, 'symbol, 'eof, or 'other. Rather than selecting a particular color, we identify the code with a category. Then, for each category, the user’s color-theme preference in DrRacket determines the actual color displayed.
A parenthesis shape, which can be a symbol for one of the six characters ( ) [ ] { }, or #f. Though the 'parenthesis category sets the color, DrRacket uses the specific parenthesis shape to handle other conveniences around matching parentheses and coloring whole expressions. More broadly, we can use this field with any pair of delimiters (even if they’re not literally parentheses) when we want them to be treated like parentheses in DrRacket.
The source location of the matched string, specifically its starting position ...
... and its ending position.
In special cases where we want to track the state of the colorer or lexer—as we do with jsonic, because we’re going to be alternating between two syntax colorers—each syntax-coloring annotation contains two supplementary values:
A backup distance, signaling the maximum number of characters to back up and recolor after a nearby edit. When in doubt, we can let this be 0.
A coloring mode. In cases where the colorer can be in multiple states, this value sets the current state (which can be any value). Each time DrRacket calls the coloring function, it passes the coloring mode back to the colorer as an argument.
For more about the colorer interface, see the docs for start-colorer.
By the way, coloring annotations are strictly cosmetic. They don’t change how the language works. If we goof them up, nothing bad will happen.
Our goal for our syntax colorer is to map the syntactic elements of our language onto the coloring categories available in DrRacket. How deep we go is up to us. For instance, it wouldn’t be wrong to make a syntax colorer that just annotates all the code with the 'string coloring category.
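To make the seven-value shape concrete, here is a minimal sketch of that “everything is a 'string” colorer, written in plain racket/base. The name color-everything-as-string is hypothetical (it isn’t part of jsonic), and the sketch assumes that reading one character at a time and asking the port for its position is an acceptable way to get source locations.

```racket
#lang racket/base

;; Hypothetical colorer, not part of jsonic: every character is a 'string.
;; It takes the same three arguments DrRacket passes to a colorer and
;; returns the seven annotation values described above.
(define (color-everything-as-string port offset mode)
  ;; port-next-location reports the 1-based position of the next character
  (define-values (line col start) (port-next-location port))
  (define c (read-char port))
  (cond
    ;; end of input: category 'eof, no paren shape, no source locations
    [(eof-object? c) (values c 'eof #f #f #f 0 mode)]
    [else
     (define-values (line* col* end) (port-next-location port))
     ;; backup distance is 0; the mode is passed through unchanged
     (values (string c) 'string #f start end 0 mode)]))
```

Calling this on a port opened over "[1]" (with a mode of #f) should produce the annotation for the single character "[", spanning positions 1 to 2.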
But that would be boring. So we’ll make a slightly more ambitious coloring plan for jsonic:
Each line comment will be colored as a 'comment.
Each expression delimiter—@$ or $@—will be colored as a 'parenthesis. Furthermore, we’ll set the parenthesis shape of the delimiters to be ( and ) respectively, so that DrRacket treats these delimiters as a matched pair.
The Racket code between the expression delimiters will be colored according to standard Racket rules. We’ll accomplish this by importing Racket’s syntax colorer and switching to that after the opening delimiter, and switching back to our jsonic lexer at the closing delimiter. That’s the fancy sauce.
Everything else will be colored as a 'string.
Our syntax coloring will be handled by the color-jsonic function in "jsonic/colorer". Let’s start that module now:
As we did in the tokenizer, we import brag/support to get lexer and other helper functions. We also import syntax-color/racket-lexer to get racket-lexer, Racket’s default syntax colorer, which we’ll use later.
We have two functions in this module:
jsonic-lexer, which handles the coloring annotations when the colorer isn’t using the Racket lexer. Each of these coloring annotations will contain the five essential coloring-annotation values.
color-jsonic, the main function called by DrRacket, which will toggle between the two lexers. It will also append the two supplementary values for each coloring annotation.
First, let’s write the coloring rules in jsonic-lexer. As before, we always need a rule that handles the eof signal that our port will emit when it reaches the end of the file. Previously, we relied on the lexer’s default behavior to handle eof. This time, because we need a special return value, we write an explicit eof rule:
Each of these lexer rules needs to return a multiple-valued coloring annotation. To do this, we use values, which is a way of returning multiple values from a function, and how DrRacket expects the annotations to be packaged. (For more on functions returning multiple values, see functions.) Following our specification above, the annotation has five values: the lexeme, the coloring category 'eof, and then #f for the parenthesis shape and for both source locations.
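As a quick refresher on multiple return values, the sketch below returns a five-value annotation from a function and receives it with define-values. The helper eof-annotation is a made-up name for illustration; it only mimics the shape of the eof rule and isn’t part of the jsonic code.

```racket
#lang racket/base

;; Illustrative helper (hypothetical name): returns five values at once,
;; in the same order as a basic coloring annotation.
(define (eof-annotation)
  (values eof 'eof #f #f #f))

;; define-values binds one identifier per returned value.
(define-values (str cat paren start end) (eof-annotation))

(eof-object? str) ; #t
cat               ; 'eof
```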
Next we handle our delimiters:
```racket
#lang br
(require brag/support syntax-color/racket-lexer)

(define jsonic-lexer
  (lexer
   [(eof) (values lexeme 'eof #f #f #f)]
   [(:or "@$" "$@")
    (values lexeme 'parenthesis
            (if (equal? lexeme "@$") '|(| '|)|)
            (pos lexeme-start) (pos lexeme-end))]))

(define (color-jsonic port offset racket-coloring-mode?)
  ···)
(provide color-jsonic)
```
We assign both delimiters to the 'parenthesis coloring category. For the opening delimiter, we use a parenthesis shape of '|(|, and for the closing delimiter, '|)|. (Don’t be alarmed by the funny notation: because a parenthesis has a special role in the Racket language, we have to escape it within vertical bars when we use it as a symbol.) As we learned before, our start and end positions can be found in (pos lexeme-start) and (pos lexeme-end).
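The vertical-bar notation can be explored at the REPL. The sketch below shows that '|(| is an ordinary symbol whose name is the one-character string "(". The helper delimiter->shape is hypothetical: it just restates the if expression from the lexer rule as a standalone function.

```racket
#lang racket/base

(symbol? '|(|)                    ; #t
(symbol->string '|(|)             ; "("
(eq? '|)| (string->symbol ")"))   ; #t

;; Hypothetical helper: map a jsonic delimiter to its parenthesis shape,
;; or #f for anything that isn't a delimiter.
(define (delimiter->shape str)
  (cond
    [(equal? str "@$") '|(|]
    [(equal? str "$@") '|)|]
    [else #f]))

(delimiter->shape "@$") ; '|(|
```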
Our line comment is straightforward—we can reuse the comment-matching rule from our tokenizer and assign it to the 'comment category. And as before, we use an any-char rule to catch everything else, assigning it to the 'string category:
```racket
#lang br
(require brag/support syntax-color/racket-lexer)

(define jsonic-lexer
  (lexer
   [(eof) (values lexeme 'eof #f #f #f)]
   [(:or "@$" "$@")
    (values lexeme 'parenthesis
            (if (equal? lexeme "@$") '|(| '|)|)
            (pos lexeme-start) (pos lexeme-end))]
   [(from/to "//" "\n")
    (values lexeme 'comment #f
            (pos lexeme-start) (pos lexeme-end))]
   [any-char
    (values lexeme 'string #f
            (pos lexeme-start) (pos lexeme-end))]))

(define (color-jsonic port offset racket-coloring-mode?)
  ···)
(provide color-jsonic)
```
With the lexer in place, we can write color-jsonic, which keeps track of whether we should be using the racket-lexer or the jsonic-lexer.
color-jsonic gets three arguments from DrRacket: an input port containing the source code, an offset value (that we ignore for now) and whatever value has been set as the current coloring mode, which for us will be a Boolean indicating the racket-coloring-mode?. When the colorer starts, this argument defaults to #f. We use a cond expression to determine which lexer to use:
```racket
#lang br
(require brag/support syntax-color/racket-lexer)

(define jsonic-lexer ···)

(define (color-jsonic port offset racket-coloring-mode?)
  (cond
    [(or (not racket-coloring-mode?)
         (equal? (peek-string 2 0 port) "$@"))
     (define-values (str cat paren start end)
       (jsonic-lexer port))
     (define switch-to-racket-mode (equal? str "@$"))
     (values str cat paren start end 0 switch-to-racket-mode)]
    [else ···]))

(provide color-jsonic)
```
The first branch of the cond invokes the jsonic-lexer in two situations:
When racket-coloring-mode? is #f. Obviously.
When racket-coloring-mode? is #t, but we’re about to encounter a closing delimiter $@, and thus need to exit Racket mode. The problem with relying on the default racket-lexer is that it doesn’t know anything about our special closing delimiter $@. So when it reaches that closing delimiter, it won’t stop lexing.
We fix this problem with peek-string. This lets us add a lookahead condition that checks to see if the next two-character string available from port will be a closing delimiter. If it is, we know that our embedded Racket expression is done, so we should switch back to our jsonic-lexer. But because we’re “peeking” into the port rather than “reading”, the closing delimiter itself is left in the port to be handled on the next pass of the lexer.
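The difference between peeking and reading is easy to see on a small string port. This sketch uses only racket/base; the sample input "$@ ]" is made up for illustration.

```racket
#lang racket/base

(define port (open-input-string "$@ ]"))

(define peeked (peek-string 2 0 port))   ; looks ahead, consumes nothing
(define consumed (read-string 2 port))   ; the delimiter was still there to read
(define after (peek-string 2 0 port))    ; now the port has moved on

(list peeked consumed after) ; '("$@" "$@" " ]")

;; At the end of input, peek-string returns eof rather than a string,
;; so comparing its result with equal? is safe.
(equal? (peek-string 2 0 (open-input-string "")) "$@") ; #f
```

Because the lookahead never consumes input, the colorer can test for the closing delimiter on every call in Racket mode without disturbing what either lexer will see next.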
Once we get the five-value coloring annotation from jsonic-lexer, we append two more values. First, a 0 for the backup distance (because we’re not using backup distance in this project, but we have to pass a default value). Second, a value that indicates whether we’re switching to Racket mode. The test is simple: if the str value returned from jsonic-lexer is our opening delimiter @$, we switch, because that means we’re entering an embedded Racket expression. On the next call to color-jsonic, this value will become the value passed as the racket-coloring-mode? argument.
The Racket-mode branch is simpler:
```racket
#lang br
(require brag/support syntax-color/racket-lexer)

(define jsonic-lexer ···)

(define (color-jsonic port offset racket-coloring-mode?)
  (cond
    [(or (not racket-coloring-mode?)
         (equal? (peek-string 2 0 port) "$@"))
     (define-values (str cat paren start end)
       (jsonic-lexer port))
     (define switch-to-racket-mode (equal? str "@$"))
     (values str cat paren start end 0 switch-to-racket-mode)]
    [else
     (define-values (str cat paren start end)
       (racket-lexer port))
     (values str cat paren start end 0 #t)]))

(provide color-jsonic)
```
Here, we just get the five basic color-annotation values from calling racket-lexer, append another default backup distance of 0, and then #t to indicate that we should stay in Racket coloring mode.
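To get a feel for how a colorer is driven, we can imitate DrRacket’s side of the conversation: call the colorer repeatedly on one port, feeding each returned mode into the next call, until the category is 'eof. The sketch below is an approximation for experimentation, not DrRacket’s actual implementation. The names color-all and racket-colorer are hypothetical; racket-colorer is a stand-in that wraps racket-lexer in the seven-value shape so the example runs without the jsonic module.

```racket
#lang racket/base
(require syntax-color/racket-lexer)

;; Stand-in seven-value colorer (hypothetical): the five values from
;; racket-lexer, plus a backup distance of 0 and an unchanged mode.
(define (racket-colorer port offset mode)
  (define-values (str cat paren start end) (racket-lexer port))
  (values str cat paren start end 0 mode))

;; Hypothetical driver: collect (lexeme category paren start end) lists,
;; passing each call's returned mode to the next call.
(define (color-all colorer str)
  (define port (open-input-string str))
  (port-count-lines! port)
  (let loop ([mode #f] [acc '()])
    (define-values (lexeme cat paren start end backup new-mode)
      (colorer port 0 mode))
    (if (eq? cat 'eof)
        (reverse acc)
        (loop new-mode (cons (list lexeme cat paren start end) acc)))))

;; The coloring categories for a small Racket expression, in order.
(map cadr (color-all racket-colorer "(* 6 7)"))
```

Once color-jsonic is finished, the same driver could be pointed at it instead of the stand-in to inspect how the mode flips at each delimiter.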
A visual check isn’t a substitute for unit tests. But we’ll get to those in a minute. For now, let’s see if our syntax coloring does what we expect. Recall what we started with:
To see our colorer in action, we have to connect it to our get-info function in "main.rkt". We do that by uncommenting the branch we previously commented out:
```racket
#lang br/quicklang
(module reader br
  (require "reader.rkt")
  (provide read-syntax get-info)
  (define (get-info port src-mod src-line src-col src-pos)
    (define (handle-query key default)
      (case key
        [(color-lexer)
         (dynamic-require 'jsonic/colorer 'color-jsonic)]
        #;[(drracket:indentation)
           (dynamic-require 'jsonic/indenter 'indent-jsonic)]
        #;[(drracket:toolbar-buttons)
           (dynamic-require 'jsonic/buttons 'button-list)]
        [else default]))
    handle-query))
```
Now we can reopen our "jsonic-test.rkt" file in DrRacket. For faster performance, DrRacket caches the result of get-info for each language. We have to force a refresh. If we’re using Racket v6.9 or later, we select Racket → Reload #lang Extensions, which reloads our get-info function and our new colorer. If not, we quit and restart DrRacket, which has the same effect. After a moment, the code will look like this:
We can verify that we’ve accomplished the syntax-coloring improvements we wanted:
The line comment is now colored as a comment.
The square brackets are colored as strings.
The @$ and $@ delimiters are colored as delimiters, not as identifiers.
Let’s also check that our jsonic delimiters are recognized as delimiters in the GUI. If we click in front of the (* 6 7) expression again, DrRacket still selects the whole Racket expression:
But now, if we click in front of the jsonic delimiter to the left, DrRacket will select the whole expression, from the @$ on the left to the matching $@ on the right:
DrRacket is just following our instructions from the syntax colorer, where we annotated the @$ and $@ with the 'parenthesis coloring category and the appropriate parenthesis shape.
In the real world, we would’ve fitted our function with contracts and unit tests before we wrote the middle parts. Here in tutorial world, better to cover the new material first.
But we’ve done that. So let’s now write a contract for color-jsonic. For input, we know it takes an input-port?, an offset value, which is an exact-nonnegative-integer?, and a mode value, which for us is a boolean?:
```racket
(input-port? exact-nonnegative-integer? boolean?
 . -> . ···)
```
What about a contract for the return value: the color annotation? Each color annotation has seven values inside. So we can think about this as seven smaller contracts packaged inside a values contract combinator (just as we used values to package the individual values):
```racket
(input-port? exact-nonnegative-integer? boolean?
 . -> . (values ···))
```
As we learned at the top of the page, these seven values are as follows: a lexeme (which is a string? or an eof-object?), a coloring category (symbol?), a parenthesis shape (symbol? or #f), a starting and ending source position (both either exact-positive-integer? or #f), a backup value (another exact-nonnegative-integer?) and a boolean? mode. So the whole contract looks like this:
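```racket
(input-port? exact-nonnegative-integer? boolean?
 . -> . (values
         (or/c string? eof-object?)
         symbol?
         (or/c symbol? #f)
         (or/c exact-positive-integer? #f)
         (or/c exact-positive-integer? #f)
         exact-nonnegative-integer?
         boolean?))
```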
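The values range contract is easier to see on a smaller function. Here is a sketch using a made-up function div-mod that returns two values. It uses define/contract and the prefix form of ->, which means the same thing as the infix . -> . notation used above.

```racket
#lang racket/base
(require racket/contract)

;; Hypothetical two-value function: one contract per argument, and one
;; contract per returned value inside (values ...).
(define/contract (div-mod n d)
  (-> exact-nonnegative-integer? exact-positive-integer?
      (values exact-nonnegative-integer? exact-nonnegative-integer?))
  (values (quotient n d) (remainder n d)))

(div-mod 17 5) ; returns the two values 3 and 2
```

Calling (div-mod 17 0) raises a contract violation before the body runs, because 0 fails exact-positive-integer?. The contract on color-jsonic guards its three arguments and seven results in the same way.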
In our code, we add racket/contract to our imports and add our contract to our provide expression using contract-out, so our module so far looks like this:
```racket
#lang br
(require brag/support syntax-color/racket-lexer racket/contract)

(define jsonic-lexer
  (lexer
   [(eof) (values lexeme 'eof #f #f #f)]
   [(:or "@$" "$@")
    (values lexeme 'parenthesis
            (if (equal? lexeme "@$") '|(| '|)|)
            (pos lexeme-start) (pos lexeme-end))]
   [(from/to "//" "\n")
    (values lexeme 'comment #f
            (pos lexeme-start) (pos lexeme-end))]
   [any-char
    (values lexeme 'string #f
            (pos lexeme-start) (pos lexeme-end))]))

(define (color-jsonic port offset racket-coloring-mode?)
  (cond
    [(or (not racket-coloring-mode?)
         (equal? (peek-string 2 0 port) "$@"))
     (define-values (str cat paren start end)
       (jsonic-lexer port))
     (define switch-to-racket-mode (equal? str "@$"))
     (values str cat paren start end 0 switch-to-racket-mode)]
    [else
     (define-values (str cat paren start end)
       (racket-lexer port))
     (values str cat paren start end 0 #t)]))

(provide
 (contract-out
  [color-jsonic
   (input-port? exact-nonnegative-integer? boolean?
    . -> . (values
            (or/c string? eof-object?)
            symbol?
            (or/c symbol? #f)
            (or/c exact-positive-integer? #f)
            (or/c exact-positive-integer? #f)
            exact-nonnegative-integer?
            boolean?))]))
```
Let’s consider a simple test case, where we use open-input-string to convert the string "x" into an input port that can be passed to color-jsonic, along with a default offset argument of 0 and a default mode argument of #f:
```racket
(color-jsonic (open-input-string "x") 0 #f)
```
What result do we expect? The matched string should be "x", the coloring category should be 'string, the parenthesis shape is #f, the starting and ending positions are 1 and 2, the backup distance is 0, and the mode remains #f. If we enter the above expression at the REPL, we see that this is so:
```
"x"
'string
#f
1
2
0
#f
```
Now we want to encapsulate this in an automated rackunit test. Testing a function that returns multiple values requires a little special handling. We saw how rackunit provides functions like check-equal? to test the return value of an expression. But these functions only take one value as input. They won’t work here, because color-jsonic returns seven values:
```racket
(module+ test
  (require rackunit)
  (check-equal? (color-jsonic (open-input-string "x") 0 #f)
                42))
```
```
result arity mismatch;
 expected number of values not received
  expected: 1
  received: 7
  values...:
```
To fix this, we wrap our test expression in values->list, which repackages our seven return values as a seven-element list (which only counts as one return value). We can then check this list against a list containing our expected results:
```racket
(module+ test
  (require rackunit)
  (check-equal? (values->list
                 (color-jsonic (open-input-string "x") 0 #f))
                (list "x" 'string #f 1 2 0 #f)))
```
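For reference, the same repackaging can be done in plain Racket with call-with-values, which hands all the values produced by a thunk to a receiving function (here, list). The macro values->list* below is a hypothetical stand-in written for this sketch. It is not the values->list helper used above, and whether the two are implemented the same way is an assumption.

```racket
#lang racket/base
(require rackunit)

;; Hypothetical stand-in: wrap the expression in a thunk and let `list`
;; receive however many values it produces.
(define-syntax-rule (values->list* expr)
  (call-with-values (λ () expr) list))

;; Made-up two-value function for the demonstration.
(define (two-values) (values "x" 'string))

(check-equal? (values->list* (two-values)) (list "x" 'string))
```

This sketch has to be a macro rather than a function: as a function argument, the expression would be evaluated in a context that accepts only one value, triggering the same arity error we just saw.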
Our "jsonic/colorer.rkt" ends up like this:
```racket
#lang br
(require brag/support syntax-color/racket-lexer racket/contract)

(define jsonic-lexer
  (lexer
   [(eof) (values lexeme 'eof #f #f #f)]
   [(:or "@$" "$@")
    (values lexeme 'parenthesis
            (if (equal? lexeme "@$") '|(| '|)|)
            (pos lexeme-start) (pos lexeme-end))]
   [(from/to "//" "\n")
    (values lexeme 'comment #f
            (pos lexeme-start) (pos lexeme-end))]
   [any-char
    (values lexeme 'string #f
            (pos lexeme-start) (pos lexeme-end))]))

(define (color-jsonic port offset racket-coloring-mode?)
  (cond
    [(or (not racket-coloring-mode?)
         (equal? (peek-string 2 0 port) "$@"))
     (define-values (str cat paren start end)
       (jsonic-lexer port))
     (define switch-to-racket-mode (equal? str "@$"))
     (values str cat paren start end 0 switch-to-racket-mode)]
    [else
     (define-values (str cat paren start end)
       (racket-lexer port))
     (values str cat paren start end 0 #t)]))

(provide
 (contract-out
  [color-jsonic
   (input-port? exact-nonnegative-integer? boolean?
    . -> . (values
            (or/c string? eof-object?)
            symbol?
            (or/c symbol? #f)
            (or/c exact-positive-integer? #f)
            (or/c exact-positive-integer? #f)
            exact-nonnegative-integer?
            boolean?))]))

(module+ test
  (require rackunit)
  (check-equal? (values->list
                 (color-jsonic (open-input-string "x") 0 #f))
                (list "x" 'string #f 1 2 0 #f)))
```
Of course, in practice we’d want to write (a lot) more than one unit test. But this suffices to introduce the basic pattern for how we’d write tests for color-jsonic.