Testing scientific software

This article is a plea for scientists who write code to test their software.

Let’s get this out of the way – you must write tests.

Tests are just as necessary in scientific software as in any other software. If you’re already convinced, skip this section.

Quote

“Researchers are spending more and more time writing computer software to model biological structures, simulate the early evolution of the Universe and analyse past climate data, among other topics. But programming experts have little faith that most scientists are up to the task.” ― Merali 2010 (doi:10.1038/467775a) (bolding changed)

Here’s what happens: five retractions in one lab due to a bug. Yet because code is often not released and few people bother to reproduce published results, probably more than 90% of scientific software bugs remain uncaught. A lack of deceptive intent doesn’t lessen the damage, and not testing is reckless. Not testing your software is as egregious a fault in scientific integrity as p‐hacking.

On a software development team, you can’t push new code to a main branch without accompanying tests – which must pass. In fact, many good coders write tests first, before any other code.

Professional software developers are the world’s best coders, so the code they write presumably has the lowest per‐line error rate. Yet no decent programmer would trust themselves to write code that works without tests. So why do scientists routinely write, use, and even publish on the basis of code without tests, code that is almost certainly faulty? Let’s not do that. You must write tests.

If you’re still not convinced, let’s try a thought experiment. Count the number of times you’ve done this:

  1. You run your code. It raises an error or gives an unreasonable result.
  2. You trace through to find the error and fix the bug.
  3. Now your code doesn’t error and gives a reasonable result.

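Each of those bugs could just as easily have raised no error and produced a plausible-looking result, in which case you would never have gone looking for it. This process does not make your code correct; it just selects for results that look reasonable.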
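To make the thought experiment concrete, here is a minimal sketch of a bug that never raises an error. The function names and the data are hypothetical, invented only for illustration. The buggy version divides by n where the sample standard deviation needs n - 1, so it returns a number that looks perfectly reasonable. Only a comparison against an answer worked out by hand exposes it.

```python
import math


def sample_std_buggy(values):
    """Hypothetical function meant to return the sample standard deviation."""
    mean = sum(values) / len(values)
    squared_deviations = [(v - mean) ** 2 for v in values]
    # Bug: divides by n instead of n - 1. No error is raised,
    # and the result is only slightly too small.
    return math.sqrt(sum(squared_deviations) / len(values))


def sample_std(values):
    """Corrected version: divides by n - 1 (Bessel's correction)."""
    mean = sum(values) / len(values)
    squared_deviations = [(v - mean) ** 2 for v in values]
    return math.sqrt(sum(squared_deviations) / (len(values) - 1))


def matches_known_answer(func):
    """Check func against a case small enough to verify by hand.

    For [2, 4, 4, 4, 5, 5, 7, 9] the mean is 5 and the squared
    deviations sum to 32, so the sample variance is 32 / 7.
    """
    data = [2, 4, 4, 4, 5, 5, 7, 9]
    expected = math.sqrt(32 / 7)
    return math.isclose(func(data), expected)


if __name__ == "__main__":
    print("buggy version passes:", matches_known_answer(sample_std_buggy))
    print("fixed version passes:", matches_known_answer(sample_std))
```

On this data the buggy version returns 2.0 and the correct value is about 2.14. Nothing about 2.0 looks wrong, which is the point: running the code and eyeballing the output would not have caught it.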