Experimenting with testing syntax and composition

2018-12-19 :: racket, testing

Rackunit is Racket’s unit testing library, and it has a pretty cool set of features. It aims to support testing at every stage of development, from simple inline checks to large, programmatically manipulated test suites. Unfortunately, despite the many features rackunit provides to support this goal, I have many-a-time found myself struggling against rackunit’s model for defining new checks. Last weekend, I finally got around to writing a testing library that provides a simpler interface to write flexible, composable test predicates and macros.

I always run into two problems when using rackunit.

Writing macros to provide nice test syntax kills source location reporting.
Checks can’t really be re-used, combined into bigger tests.

To make this concrete, here’s a progression of only-slightly-contrived checks that I might write in the process of testing some code. The progression is annotated with comments of reasoning and outcomes (I also added line numbers for the failure messages).

 1 #lang racket
 2 (require rackunit
 3          (for-syntax syntax/parse))
 4 
 5 ;; Want some test syntax, let's try wrapping a check in a macro...
 6 (define-syntax-rule (check-ands ands ...)
 7   (check-true (and ands ...)))
 8 (define-syntax-rule (check-ors ors ...)
 9   (check-true (or ors ...)))
10 
11 (check-ands #t #t) ;; OK, no message
12 (check-ands #t #t #f) ;; Fails, with pointer to this line
13 (check-ors #t #t #f)
14 (check-ors #f) ;; Fails
15 
16 
17 ;; So far so good.
18 
19 ;; Now let's write some more syntax that combines these tests into a
20 ;; more complex test.
21 
22 ;; e.g. (check-ands:ors #t #t : #f #t #f #t)
23 ;;      (check-ands:ors : #f)
24 (define-syntax (check-ands-and-ors stx)
25   (syntax-parse stx
26     [(_ ands:expr ... (~datum :) ors:expr ...)
27      (syntax/loc stx
28        (begin
29          ;; Can't combine the results of each check explicitly..
30          ;; Luckily, we have an implicit "and" because both checks happen
31          ;; by sequencing them
32          (check-ands ands ...)
33          (check-ors ors ...)))]))
34 
35 
36 (check-ands-and-ors : #t)
37 (check-ands-and-ors #t : #t)
38 (check-ands-and-ors #t : #t #f)
39 (check-ands-and-ors : #f) ;; Fails, but with the wrong location,
40 ;; and name "check-true
41 
42 
43 ;; And what if I want to write this?
44 ;; I can't.
45 #;(define-syntax (check-ands-or-ors stx)
46   (syntax-parse stx
47     [(_ ands:expr ... (~datum :) ors:expr ...)
48      (syntax/loc stx
49        ;; No simple implicit "or" to be had here...
50        #|???|#)]))

And here are the failure messages. Notice how, as noted in the comments, the final failure identifies the wrong line. Furthermore, the check names are woefully unhelpful. Every failure of that test will point to a line within the macro body instead of the top-level test invocation. Note also the last test that I want to be able to write but would need to do some contortions to actually do (e.g. catching failure exceptions).

--------------------
; FAILURE
; .../temp.rkt:12:0
name:       check-true
location:   temp.rkt:12:0
params:     '(#f)
--------------------
--------------------
; FAILURE
; .../temp.rkt:14:0
name:       check-true
location:   temp.rkt:14:0
params:     '(#f)
--------------------
--------------------
; FAILURE
; .../temp.rkt:33:9
name:       check-true
location:   temp.rkt:33:9
params:     '(#f)
--------------------

It turns out that with-check-info and define-check can be used to resolve the source location and check name problems, but getting it to work automatically isn’t pretty. (At least the solution I came up with involves generating macros and custom checks together, and is pretty brittle.) Additionally, this doesn’t solve the problem of composability.

After all this time fiddling with rackunit and trying to understand why I can’t write the tests I want to write, I realized that the primary issue is that rackunit’s model of identifying and recording test failure is internal to every test. Tests themselves determine their source location and register failures. This means that when tests get put in places that they don’t expect – that don’t fit with the basic usage as an assertion within a set of tests – they don’t work. The two pertinent examples of such places are inside a macro or inside another test (as demonstrated above).

The reason that tests managing failure recording/reporting is problematic is that in doing so they make assumptions about the context of their usage. Namely, they assume that they are the test, and thus remove the burden of recording failures from the context. This is usually pretty convenient. But to be able to glue tests together and make new tests, the tests must leave it up to the context to determine what their outcome means. If failure reporting is left up to the context, then failure reports can also always identify the actual source location of the top-level test that fails.

The basic idea of the wrapper is just this inversion of responsibility for interpreting failures. It allows tests to be considered just regular functions, because they don’t have any extra stuff tied to them to manage recording outcomes. This makes tests composable. It also allows consistent identification of the source location of the actual top-level check in failure messages.

Here’s what the equivalent sequence looks like with my interface, which I called ruinit. Many elements are inspired by things that rackunit does, like the fail function available in custom tests.

 1 #lang racket
 2 (require ruinit
 3          (for-syntax syntax/parse))
 4 
 5 ;; For test syntax, can use `define-test-syntax`, which provides
 6 ;; a binding for `fail` to indicate a test failure with a message
 7 (define-test-syntax (check-ands ands ...)
 8   #'(unless (and ands ...)
 9       (fail "Not every value given (~v) was true!" (list ands ...))))
10 
11 ;; Alternatively, just write a regular macro or function that returns
12 ;; #f in the case of failure (`define-test-syntax` could be used here
13 ;; just as well)
14 (define-syntax-rule (check-ors ors ...)
15   (or ors ...))
16 
17 (test-begin
18   (check-ands #t #t) ;; OK, no message
19   (check-ands #t #t #f) ;; Fails, with pointer to this line
20   (check-ors #t #t #f)
21   (check-ors #f)) ;; Fails
22 
23 
24 ;; So far so good.
25 
26 ;; Now let's write some more syntax that combines these tests into a
27 ;; more complex test.
28 
29 ;; e.g. (check-ands:ors #t #t : #f #t #f #t)
30 ;;      (check-ands:ors : #f)
31 (define-test-syntax (check-ands-and-ors ands:expr ... (~datum :) ors:expr ...)
32   #'(and/test (check-ands ands ...)
33               (check-ors ors ...)))
34 
35 
36 (test-begin
37   (check-ands-and-ors #t #f #t : #t) ;; Fails
38   (check-ands-and-ors #t : #t)
39   (check-ands-and-ors #t : #t #f)
40   (check-ands-and-ors #f : #f)) ;; Fails
41 ;; Now the locations point exactly to the failing test, and it has
42 ;; the actual top-level syntax of the test.
43 
44 
45 ;; And writing the inverse is just as easy
46 (define-test-syntax (check-ands-or-ors ands:expr ... (~datum :) ors:expr ...)
47   #'(or/test (check-ands ands ...)
48              (check-ors ors ...)))
49 (test-begin
50   (check-ands-or-ors #t #f #t : #t)
51   (check-ands-or-ors #t : #t)
52   (check-ands-or-ors #t : #t #f)
53   (check-ands-or-ors #f : #f)) ;; Fails

And here are the failure reports. Every failure identifies the real top-level invocation of the test and shows the actual syntax of the failing test.

--------------- FAILURE ---------------
location: temp.rkt:19:2
test:     (check-ands #t #t #f)
message:  Not every value given ('(#t #t #f)) was true!
---------------------------------------
--------------- FAILURE ---------------
location: temp.rkt:21:2
test:     (check-ors #f)
---------------------------------------
--------------- FAILURE ---------------
location: temp.rkt:37:2
test:     (check-ands-and-ors #t #f #t : #t)
message:  Not every value given ('(#t #f #t)) was true!
---------------------------------------
--------------- FAILURE ---------------
location: temp.rkt:40:2
test:     (check-ands-and-ors #f : #f)
message:  Not every value given ('(#f)) was true!
---------------------------------------
--------------- FAILURE ---------------
location: temp.rkt:53:2
test:     (check-ands-or-ors #f : #f)
message:  or/test: all tests failed
---------------------------------------

Update: 2019–07–28

After using ruinit for a couple of months pretty much constantly, I’m pretty happy with it. It hasn’t revolutionized my tests or anything like that, but it has pretty much eliminated the stumbling blocks that used to constantly frustrate me. It has turned out to be a fun and worthwhile experiment.