3-Step Guide For Building Plots Faster With Test Driven Development.

Unit-testing plots is challenging, as images are difficult to describe in code.

We need to rely on approval testing, which is usually used to validate that an output remains the same when code changes. But we don’t have a plot to assert on yet, so how can we use Test Driven Development (TDD) and start by writing tests? Where do we even start?

Let’s see how we can get the most out of TDD and approval testing to rapidly develop a plot.

1. Start with a test

We start in (almost) the same way as we would do with any other piece of software using TDD:

Make sure to do those steps in order, as only then do you unlock the full potential of this approach!

Things to focus on when writing tests

Describe what the plot should look like and make different outcomes separate test cases. Be verbose: use as many words as you need to describe how the plot should behave.

Create a minimal setup to satisfy the requirements — remember about chicken counting: zero, one, many. If you want to test rendering of multiple series, two will probably suffice. Add a separate case to test limits of how much data can be plotted.

Let’s create a simple scatter plot with a regression line.

library(testthat)

describe("scatterplot", {
  it("should create a scatterplot with selected variables", {
    # Arrange
    data <- data.frame(
      X = c(1, 2, 3),
      Y = c(1, 2, 3)
    )

    # Act
    result <- scatterplot(data, x = X, y = Y)

    # Assert
    fail()
  })

  it("should add a regression line if the number of observations is larger than a given threshold", {
    # Arrange
    data <- data.frame(
      X = 1:4,
      Y = c(1, 3, 2, 4)
    )
    threshold <- 3

    # Act
    result <- scatterplot(data, x = X, y = Y, threshold = threshold)

    # Assert
    fail()
  })
})

Notice we use fail() in each test.

We don’t have an automated way of checking results yet, so we should expect tests to always fail 🔴 — our code is not in a releasable state yet! When we have tests in place, we can start adding production code.

Don’t write more code than is needed to satisfy the tests. YAGNI!

library(ggplot2)

scatterplot <- function(data, x, y, threshold = 10) {
  data |>
    ggplot(aes(x = {{ x }}, y = {{ y }})) +
    geom_point() + {
      if (nrow(data) > threshold) {
        geom_smooth(method = "lm", se = FALSE)
      }
    }
}
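The braces after geom_point() work because adding NULL to a ggplot object leaves it unchanged, so an if without an else contributes nothing when the condition is false. A quick way to check this is to count the layers of the returned plot. The snippet below is a sketch: the variable names small and large are only for illustration, and the function is repeated so that the snippet runs on its own.

```r
library(ggplot2)

# Same function as above, repeated so that this snippet is self-contained.
scatterplot <- function(data, x, y, threshold = 10) {
  data |>
    ggplot(aes(x = {{ x }}, y = {{ y }})) +
    geom_point() + {
      if (nrow(data) > threshold) {
        geom_smooth(method = "lm", se = FALSE)
      }
    }
}

# Three rows with threshold = 3: the condition is FALSE, `if` returns NULL,
# and the plot keeps only the point layer.
small <- scatterplot(data.frame(X = 1:3, Y = 1:3), x = X, y = Y, threshold = 3)
length(small$layers)

# Four rows with the same threshold: the regression layer is added on top.
large <- scatterplot(
  data.frame(X = 1:4, Y = c(1, 3, 2, 4)),
  x = X, y = Y, threshold = 3
)
length(large$layers)
```

Counting layers is not a substitute for looking at the plot, but it confirms that the threshold logic is wired up before any image is rendered.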

2. Preview results for iterating quickly

We could add snapshot assertions at this stage, but since we’re still developing the plot:

It would add an overhead of reviewing and accepting new snapshots.

To make development more rapid, we can just add print calls before each fail() and manually inspect the plots after each change. It will speed up each iteration, but:

So don’t linger in this stage for too long!
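One way to keep the print calls described above cheap is a tiny wrapper around print() that only draws in an interactive session, so a forgotten call does nothing in CI. The helper name preview_plot is hypothetical and not part of testthat or ggplot2. It is a minimal sketch, assuming you run the tests from the console while iterating.

```r
library(ggplot2)

# Hypothetical helper (not from any package): draw the plot only when a
# person is at the console, and return the plot unchanged either way.
preview_plot <- function(plot) {
  if (interactive()) {
    print(plot)
  }
  invisible(plot)
}

# Inside a test, this call would sit right after the Act step and
# before fail(). Here it is called directly on a small example plot.
result <- ggplot(data.frame(X = 1:3, Y = 1:3), aes(x = X, y = Y)) +
  geom_point()
preview_plot(result)
```

Because the helper returns its input, it can be removed later without touching the rest of the test.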

3. Add assertions on plot snapshots

Once we get to a stage where most features are stable, we can finally add actual snapshot tests.

We can use the vdiffr package and its expect_doppelganger() function, which takes the snapshot title first and the figure second:

# Assert
vdiffr::expect_doppelganger("plot_1", result)

Or we could use testthat::expect_snapshot_file, but then we’d need to handle saving the plot to a file on our own.

Notice that we don’t assert on the plot object (its representation in the R process). Instead, we assert on the rendered output, that is, the plot as it would be printed to the screen.

Assertion set up in such a way will do the following:

Make sure to encode your system information into the snapshot (e.g., with shinytest2::platform_variant()) — if you use a different system in CI, snapshots generated locally can be different from ones in the CI and the pipeline will fail.
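To illustrate what encoding system information means, the sketch below builds a tag from the operating system name and the R version using base R only. The function platform_tag is hypothetical. It is not how shinytest2::platform_variant() is implemented, only an approximation of the idea. A string like this is what you would pass as the variant of a snapshot expectation so that each platform keeps its own reference files.

```r
# Hypothetical helper: combine the OS name and the R major.minor version
# into a single tag that identifies the platform.
platform_tag <- function() {
  os <- tolower(Sys.info()[["sysname"]])
  # R.version$minor looks like "3.1"; keep only the part before the dot.
  minor <- sub("\\..*$", "", R.version$minor)
  r_version <- paste(R.version$major, minor, sep = ".")
  paste(os, r_version, sep = "-")
}

platform_tag()
```

Snapshots recorded under different tags are stored separately, so a difference in rendering between a laptop and the CI runner does not show up as a failing comparison.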

If we follow this approach, we obtain:

Start using Test Driven Development today and enjoy instant feedback on what your code does and improved safety that your code keeps working as expected!