As we remember from the second jsonic tutorial, a syntax colorer is a lexer called by DrRacket that reads in source code, matches the code to a list of rules, and then returns the matched strings with syntax-coloring annotations.
Each annotation contains five values: the string to be colored, a coloring category, a parenthesis shape, and the starting and ending positions of the coloring. The valid coloring categories are 'error, 'comment, 'sexp-comment, 'white-space, 'constant, 'string, 'no-color, 'parenthesis, 'hash-colon-keyword, 'symbol, 'eof, or 'other. The valid parenthesis shapes are the symbols for the characters ()[]{}, or #f if the token isn't a parenthesis.
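For instance, here's a hypothetical sketch (not code from our colorer) of the five values that would be returned as an annotation for a string token occupying positions 5 through 12:

```racket
; Hypothetical sketch of a single coloring annotation,
; returned as five values.
(values "\"hello\"" ; the matched string
        'string     ; coloring category
        #f          ; parenthesis shape (none here)
        5           ; start position
        12)         ; end position
```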
Right now, our coloring for our sample program looks like this, using the default Racket-language colorer and DrRacket’s default color scheme:
Not everything is wrong: the strings, numbers, and identifiers look right, because they happen to be written the same way in Racket. But the rem comments on lines 30 and 70 aren't colored as comments. The value 'three' on line 60 isn't colored as a string. And on line 10, the part of the line after the ; is formatted as a comment (because it would be a comment if this were Racket code), but in BASIC it just represents extra arguments to print, which should be colored like any other values.
In jsonic, we wrote a separate lexer for the syntax colorer. This time, our colorer will rely on the lexer we already wrote for the main language interpreter:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 | #lang br (require brag/support) (define-lex-abbrev digits (:+ (char-set "0123456789"))) (define basic-lexer (lexer-srcloc ["\n" (token 'NEWLINE lexeme)] [whitespace (token lexeme #:skip? #t)] [(from/stop-before "rem" "\n") (token 'REM lexeme)] [(:or "print" "goto" "end" "+" ":" ";") (token lexeme lexeme)] [digits (token 'INTEGER (string->number lexeme))] [(:or (:seq (:? digits) "." digits) (:seq digits ".")) (token 'DECIMAL (string->number lexeme))] [(:or (from/to "\"" "\"") (from/to "'" "'")) (token 'STRING (substring lexeme 1 (sub1 (string-length lexeme))))])) (provide basic-lexer) |
This is a convenient idea, because later, if we update this main lexer, our colorer will automatically pick up the changes.
Here’s the plan. We’ll read tokens from the lexer, each of which will be an instance of a srcloc-token structure. We’ll return one syntax-coloring annotation for each token. To make each annotation, we’ll read values from inside the token and use them to figure out what coloring category applies and what parenthesis shape we should use (if any). We’ll also read the necessary source-location values from inside the token, and put those into the coloring annotation.
There’s one additional wrinkle. When the lexer gets a character that it can’t match to any lexing rule, it raises an error. When we’re trying to run a program, that’s the right behavior, because nothing can happen until the error is corrected.
But in a syntax colorer, it’s the wrong behavior. Why? Because the syntax colorer is called after every keystroke, not just when the user clicks Run. It’s likely that after any given keystroke, the program can’t be lexed without errors. So we want to handle those errors when they arise, so that the user’s code editing can continue peacefully.
As we did in jsonic, we make our colorer available to DrRacket by adding a get-info function to our "main.rkt" module:
```racket
#lang br/quicklang
(require "parser.rkt" "tokenizer.rkt")

(module+ reader
  (provide read-syntax get-info))

(define (read-syntax path port)
  (define parse-tree (parse path (make-tokenizer port path)))
  (strip-bindings
   #`(module basic-mod basic/expander
       #,parse-tree)))

(define (get-info port src-mod src-line src-col src-pos)
  (define (handle-query key default)
    (case key
      [(color-lexer)
       (dynamic-require 'basic/colorer 'basic-colorer)]
      [else default]))
  handle-query)
```
When get-info gets the color-lexer key, it uses dynamic-require to import the basic-colorer function from the basic/colorer module. We’ll set up this module next.
Let’s start our new "basic/colorer.rkt" module. We import "lexer.rkt" to get our main lexer, and set up a new function called basic-colorer:
DrRacket passes our colorer function a port argument. To start, we just pass this port through to basic-lexer, which will return an instance of srcloc-token that we’ll use to construct our coloring annotation.
We know, however, that basic-lexer might raise an error. We want to catch this error. As we’ve done before, we wrap our call to basic-lexer in a with-handlers expression:
```racket
#lang br
(require "lexer.rkt" brag/support)
(provide basic-colorer)

(define (basic-colorer port)
  (define (handle-lexer-error excn)
    (define excn-srclocs (exn:fail:read-srclocs excn))
    (srcloc-token (token 'ERROR) (car excn-srclocs)))
  (define srcloc-tok
    (with-handlers ([exn:fail:read? handle-lexer-error])
      (basic-lexer port)))
  ···)
```
As we would find out from reading the docs for lexer-srcloc, when it can’t match any rule, it raises an exception of type exn:fail:read. Therefore, our with-handlers expression invokes the related predicate, exn:fail:read?, and passes any matching exception to our helper function handle-lexer-error.
To patch over the error, our helper function needs to return an instance of srcloc-token. The srcloc-token constructor function takes two arguments: a plain token value, and a srcloc structure.
An exception of type exn:fail:read has a srclocs field that contains a list of srcloc structures related to the error. (Different exception types have different fields that are useful in handling errors; the documentation has details for each type.) We use the accessor exn:fail:read-srclocs to store these in excn-srclocs and then use car to get the topmost one.
As for the token value, we just make a token structure with type 'ERROR.
Taken together, our srcloc-tok variable will either get its value from basic-lexer, or if an error occurs, from handle-lexer-error.
The rest of our colorer will read fields out of the srcloc-tok value and use them to create coloring annotations. To make this easy, we’ll use two new functions: match and match-define.
match is one of Racket's secret weapons: an extremely clever feature that few other programming languages offer. Just as regular expressions let us deconstruct strings, and syntax patterns let us deconstruct syntax objects, match lets us deconstruct any Racket value.
The basic match form works like cond: it takes a value as input, plus a series of branches. If the pattern on the left side matches the value, then the value on the right side of the branch is returned. An optional else branch handles everything else:
```racket
(struct thing (x y))

(define (m in)
  (match in
    ["foo" 'got-foo]         ; literal match
    [(? number?) 'got-number] ; predicate match
    [(list a b c) (list b)]   ; list match + assignment
    [(thing i j) (+ i j)]     ; structure match + assignment
    [else 'no-match]))

(m "foo")         ; 'got-foo
(m 42)            ; 'got-number
(m (list 1 2 3))  ; '(2)
(m (thing 25 52)) ; 77
(m "bar")         ; 'no-match
```
In this example, we see a few of the options for left-hand patterns. (Many more can be found in the docs for match.) A literal value like "foo" is matched exactly. A predicate like number? can be matched by wrapping it in the ? operator. When list is used as a match pattern, it not only matches the input but also assigns the pieces to new identifiers, in this case extracting the middle element b into a new list. Similarly, any structure type can be used as a pattern, and its values matched by position.
The match-define form uses the same pattern-matching vocabulary as match to directly create new variables by “reverse engineering” existing values:
```racket
(define xs (list 1 2 3))
(match-define (list a b c) xs)
a ; 1
b ; 2
c ; 3

(struct thing (x y))
(define th (thing 25 52))
(match-define (thing i j) th)
i ; 25
j ; 52
```
In other words, when we use list (or thing) on the left side of a match-define, we’re not creating a list (or instance of the thing structure type) but rather using it as a template for decomposing the value on the right.
We’ll use match and match-define to help finish the colorer.
The rest of the colorer will proceed by matching against srcloc-tok. As usual, we need to create a special rule to handle the eof case:
```racket
#lang br
(require "lexer.rkt" brag/support)
(provide basic-colorer)

(define (basic-colorer port)
  (define (handle-lexer-error excn)
    (define excn-srclocs (exn:fail:read-srclocs excn))
    (srcloc-token (token 'ERROR) (car excn-srclocs)))
  (define srcloc-tok
    (with-handlers ([exn:fail:read? handle-lexer-error])
      (basic-lexer port)))
  (match srcloc-tok
    [(? eof-object?) (values srcloc-tok 'eof #f #f #f)]
    [else ···]))
```
We use eof-object? wrapped in ? as a left-hand match pattern. On the right, we return a color annotation with srcloc-tok as the value and 'eof as the category. The other fields will be ignored.
Then we can move on to the more interesting cases. First we use match-define to decompose our srcloc-tok into fields and assign them to variables:
```racket
#lang br
(require "lexer.rkt" brag/support)
(provide basic-colorer)

(define (basic-colorer port)
  (define (handle-lexer-error excn)
    (define excn-srclocs (exn:fail:read-srclocs excn))
    (srcloc-token (token 'ERROR) (car excn-srclocs)))
  (define srcloc-tok
    (with-handlers ([exn:fail:read? handle-lexer-error])
      (basic-lexer port)))
  (match srcloc-tok
    [(? eof-object?) (values srcloc-tok 'eof #f #f #f)]
    [else
     (match-define
       (srcloc-token
        (token-struct type val _ _ _ _ _)
        (srcloc _ _ _ posn span)) srcloc-tok)
     ···]))
```
We know each srcloc-tok is an instance of srcloc-token with a token-struct as the first value and a srcloc as the second. So our match pattern combines all three, letting us directly define variables to hold the fields we need to make our coloring annotation: type and val from the token, plus posn and span from the source location. (Each _ in the pattern means "ignore this field".)
Now we need to assemble the five values we need for our coloring annotation: the token value, the coloring category, the parenthesis shape, and the start and end locations. We already stored the token value in val. The start location is just posn, and the end location is (+ start span). Then we use another match-define to figure out the cat and paren values:
```racket
#lang br
(require "lexer.rkt" brag/support)
(provide basic-colorer)

(define (basic-colorer port)
  (define (handle-lexer-error excn)
    (define excn-srclocs (exn:fail:read-srclocs excn))
    (srcloc-token (token 'ERROR) (car excn-srclocs)))
  (define srcloc-tok
    (with-handlers ([exn:fail:read? handle-lexer-error])
      (basic-lexer port)))
  (match srcloc-tok
    [(? eof-object?) (values srcloc-tok 'eof #f #f #f)]
    [else
     (match-define
       (srcloc-token
        (token-struct type val _ _ _ _ _)
        (srcloc _ _ _ posn span)) srcloc-tok)
     (define start posn)
     (define end (+ start span))
     (match-define (list cat paren)
       (match type
         ['STRING '(string #f)]
         ['REM '(comment #f)]
         ['ERROR '(error #f)]
         [else (match val
                 [(? number?) '(constant #f)]
                 [(? symbol?) '(symbol #f)]
                 ["(" '(parenthesis |(|)]
                 [")" '(parenthesis |)|)]
                 [else '(no-color #f)])]))
     (values val cat paren start end)]))
```
Each branch of our match-define for (list cat paren) will return a list with a category value and parenthesis shape. This list will be assigned to cat and paren, which can then be inserted into the final values expression that returns the finished coloring annotation, along with the already calculated val, start, and end.
Within the match-define, we use two instances of match. The first matches against the type. A token of type 'STRING gets colored as a 'string, a 'REM token gets colored as a 'comment, and an 'ERROR token gets colored as an 'error. (In all cases, there is no parenthesis shape.) Astute observers may have noticed that we gave our 'ERROR tokens a type but not a value, which defaults to #f. DrRacket primarily relies on the start and end positions for coloring. So as long as those are correct, the colorer will work, despite the missing token value.
If type doesn't match any of those cases, the else branch runs a nested match on val. Tokens that are number? are colored as a 'constant; those that are symbol? are colored as a 'symbol. We use literal "(" and ")" matches to catch any parentheses (this time our coloring annotation has both a 'parenthesis category and the correct parenthesis shape). Because parentheses already serve as list delimiters in Racket, a parenthesis used as a literal symbol has to be escaped with vertical bars: thus '|)| rather than '). Finally, anything else gets a 'no-color annotation.
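The vertical-bar escape works for any symbol whose name contains characters Racket would otherwise treat specially. A quick illustration in the REPL:

```racket
; A symbol whose name is a single close-parenthesis character.
'|)|

; It behaves like any other symbol:
(symbol? '|)|)                   ; #t
(symbol->string '|)|)            ; ")"
(eq? '|)| (string->symbol ")"))  ; #t
```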
And that's the finished colorer. Keep in mind that these coloring decisions are arbitrary. We're just trying to find the most intuitive mapping from things in our language to DrRacket's coloring categories. DrRacket neither knows nor cares whether, for instance, something colored as a 'comment is in fact a comment. Syntax coloring doesn't affect how the language works, just how it looks.
Recall how our sample program looked before we wrote our colorer:
Some elements were colored correctly. But the rem comments on lines 30 and 70 were not colored as comments. The value 'three' on line 60 was not colored as a string. And the part of line 10 after the ; was colored as a comment when it should appear like ordinary code.
Let’s open our "sample.rkt" file in DrRacket. For faster performance, DrRacket caches the result of get-info for each language. We have to force a refresh. If we’re using Racket v6.9 or later, we select Racket → Reload #lang Extensions, which reloads our get-info function and our new colorer. If not, we quit and restart DrRacket, which has the same effect. After a moment, the code will look like this:
Notice that the rem comments are now colored correctly, 'three' is correctly colored as a string value, and line 10 is colored normally.
If we type some nonsense characters on the last line that are unrecognized by the lexer, they’re colored as error characters, but they don’t interrupt our editing:
If we go back and put quote marks around the nonsense, we convert it to a valid string, which is recolored accordingly:
What are the limits of sharing the main language lexer with the syntax colorer? For instance, what if we wanted to do something slick, like apply a different color to the line numbers?
In principle it’s possible. But we can only apply different colors to things that can be differentiated from within the lexer (i.e., with different rules and token types).
For instance, right now a line number is delivered inside a token of type 'INTEGER, so it can't be differentiated from an ordinary number that's also delivered inside an 'INTEGER token. It's true that these can be differentiated by the parser, because a line number occupies a specific position in the grammar, but we can't use a parser here. Why not? Because we can't parse something unless it can first be lexed and tokenized, and as we noticed at the beginning, the code is often in a non-lexable state during editing.
To apply a separate color to line numbers, we’d have to add a rule to the lexer to specially match line numbers and wrap them in a new token type, say 'LINE-NUMBER. Then we’d be able to apply a color to 'LINE-NUMBER tokens that’s different from 'INTEGER tokens.
So what is this new lexer rule? Because every program line has to be on a separate line of the file, we could imagine a lexer rule that captured a sequence of a newline and an integer as a 'LINE-NUMBER. But once we change how newlines are tokenized, we'd also have to adjust our parser grammar, as well as the source location of each 'LINE-NUMBER token to account for the leading newline.
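As a rough sketch only (this rule is hypothetical and not part of our actual lexer; adopting it would require the parser and source-location adjustments just described), such a rule might look like this, reusing the digits abbreviation from "lexer.rkt":

```racket
; Hypothetical rule sketch: capture a newline followed by digits
; as a 'LINE-NUMBER token. NOT part of the real basic-lexer.
[(:seq "\n" digits)
 (token 'LINE-NUMBER
        ; drop the leading newline before converting
        (string->number (string-trim lexeme)))]
```

A colorer branch could then map 'LINE-NUMBER tokens to their own category, distinct from 'INTEGER tokens.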
In short: it could be done. But it might be more trouble than it’s worth.
The alternative is to write a separate lexer for the syntax colorer, as we did for jsonic. This would give us the freedom to lex the source code however we want to produce coloring effects. Of course, because the syntax colorer is strictly cosmetic, this new lexer wouldn't affect the behavior of the language (nor require changes to the parser, and so on).
Both approaches are reasonable. And even if we don't share the main lexer with the syntax colorer, we can still share a set of lexer abbreviations (by exporting them from one lexer and importing them into the other). These abbreviations can hold the core patterns used by the rules of both lexers, making both simpler to write.