Specialised · Senior & SDET

Mutation Testing

Your test suite has 95% line coverage. Your mutation score is 38%. Those two facts are not contradictory — they are measuring completely different things. This page explains why coverage is the wrong quality metric, what mutation testing actually measures, and what to do about it.

Senior · Senior SDET · ISTQB CTAL-TAE · CTFL v4.0 · ~15 min read · ~40 min with exercise

1 The Hook

A Wellington bank’s QA team presents their quarterly quality report to the CTO. 94% code coverage across all services. The CTO is satisfied.

That same quarter, a boundary condition in the interest calculation is wrong. The <= should be < for daily interest accrual. The bug is in a function with 100% statement coverage. Every line of that function executes in the tests. But the test only asserts calculateInterest(10000) > 0 — it never checks the actual value.

A mutation test would have replaced <= with < and observed the tests still pass. The mutation would have survived. 100% coverage, 0% detection of this specific bug.

The quarterly report was not lying. It was measuring the wrong thing.

2 The Rule

Code coverage measures which lines your tests execute. Mutation score measures whether your tests would detect a bug on those lines. A test that executes code without asserting on its output can reach 100% coverage while adding almost nothing to the mutation score.
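
As a rough sketch of how the second number is derived: Stryker's documentation describes the mutation score as detected mutants divided by valid mutants, where detected counts killed and timed-out mutants and undetected counts survived and uncovered ones. The helper name and the sample counts below are illustrative assumptions, not output from a real run.

```javascript
// Hypothetical helper: computes a mutation score from mutant counts.
// Assumes the definition detected / (detected + undetected) * 100.
function mutationScore({ killed, timeout = 0, survived, noCoverage = 0 }) {
  const detected = killed + timeout;
  const undetected = survived + noCoverage;
  const valid = detected + undetected;
  if (valid === 0) return NaN; // no mutants generated, so no meaningful score
  return (detected / valid) * 100;
}

// Made-up counts for illustration: 38 of 100 valid mutants detected.
const score = mutationScore({ killed: 36, timeout: 2, survived: 60, noCoverage: 2 });
console.log(score.toFixed(1) + '%');
```

Note that line coverage does not appear anywhere in the calculation: a mutant on a fully covered line still counts as undetected if no assertion fails.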

3 The Analogy

Switches and bulbs.

Code coverage is checking that every switch in a building has been flipped. Mutation testing is checking that flipping each switch actually turns the right light on.

You can flip every switch — 100% coverage — while every bulb is burned out — 0% detection. The coverage report tells you the switches were flipped. It tells you nothing about whether the lights came on.

4 Watch Me Do It — Stryker on a NZ Interest Calculation

Here is the function under test: a simple daily interest calculator for a NZ lending product.

// src/interest.js
function calculateDailyInterest(principal, annualRate, days) {
  if (days <= 0) return 0;
  return principal * (annualRate / 365) * days;
}

module.exports = { calculateDailyInterest };

A weak test that gives 100% line coverage:

// src/interest.test.js (assumed location, next to interest.js)
const { calculateDailyInterest } = require('./interest');

test('calculates interest', () => {
  const result = calculateDailyInterest(10000, 0.05, 30);
  expect(result).toBeGreaterThan(0); // ← asserts nothing specific
});

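Every line executes, so line coverage is 100% (the early return shares a line with the if, so the report stays green even though that branch never runs). But look at the kind of mutants a tool such as Stryker creates from this function: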

  • Change <= to < in the boundary check → test still passes (positive days still work, result is still > 0)
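  • Change annualRate / 365 to annualRate / 366 → test still passes (result is still > 0). This constant change is illustrative: StrykerJS's built-in mutators target operators and conditions rather than numeric literals.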
  • Change * to / in the multiplication → test still passes (tiny fraction, but still > 0)

All three mutants survive. Your mutation score for this function: very low.
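
To see why these survive, the three mutations can be applied by hand and run against the same weak check. This is only a sketch of what a mutation tool automates: the mutant functions and the `weakTestPasses` helper are written out manually for illustration and are not Stryker output.

```javascript
// Original function, as defined above.
function calculateDailyInterest(principal, annualRate, days) {
  if (days <= 0) return 0;
  return principal * (annualRate / 365) * days;
}

// Hand-written mutants mirroring the three bullets above (illustrative).
const mutants = {
  'boundary <= to <': (p, r, d) => (d < 0 ? 0 : p * (r / 365) * d),
  'divide by 366': (p, r, d) => (d <= 0 ? 0 : p * (r / 366) * d),
  '* to /': (p, r, d) => (d <= 0 ? 0 : (p * (r / 365)) / d),
};

// The weak test's only check: is the result greater than zero?
const weakTestPasses = (fn) => fn(10000, 0.05, 30) > 0;

console.log('original passes:', weakTestPasses(calculateDailyInterest));
for (const [name, mutant] of Object.entries(mutants)) {
  // A mutant "survives" when the test still passes against it.
  console.log(`${name}: ${weakTestPasses(mutant) ? 'survived' : 'killed'}`);
}
```

Every mutant returns a different positive number, and "positive" is all the test asks for.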

Fix it with a test that actually asserts on the value:

test('calculates 30-day interest on $10,000 at 5% p.a.', () => {
  // 10000 * 0.05 / 365 * 30 = 41.096...
  expect(calculateDailyInterest(10000, 0.05, 30)).toBeCloseTo(41.10, 1);
  expect(calculateDailyInterest(10000, 0.05, 0)).toBe(0);   // boundary
  expect(calculateDailyInterest(10000, 0.05, -1)).toBe(0);  // negative days
});

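Now Stryker kills almost all mutants. The division-by-366 mutant produces 40.98 instead of 41.10, and the toBeCloseTo(41.10, 1) assertion catches it because precision 1 only tolerates a difference below 0.05. The negative-days assertion kills any mutant that removes or disables the guard. The one survivor is the <= to < boundary mutant: with days = 0 the formula multiplies by zero and returns 0 anyway, so the zero-days assertion cannot tell the mutant from the original. For this function that is an equivalent mutant, not an assertion gap.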
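
The stronger assertions can be checked by hand in the same way. This is a sketch, not Stryker output: `strongTestPasses` is a hypothetical stand-in for the three Jest assertions, approximating `toBeCloseTo(41.10, 1)` as an absolute difference under 0.05 and `toBe(0)` as `Object.is`. It is worth watching what happens to the `<=` to `<` mutant, because with `days = 0` the formula multiplies by zero and returns 0 whether or not the guard fires.

```javascript
// Hypothetical stand-in for the three Jest assertions in the stronger test.
function strongTestPasses(fn) {
  const closeTo = (actual, expected) => Math.abs(actual - expected) < 0.05;
  return (
    closeTo(fn(10000, 0.05, 30), 41.10) &&
    Object.is(fn(10000, 0.05, 0), 0) && // zero days
    Object.is(fn(10000, 0.05, -1), 0) // negative days
  );
}

const original = (p, r, d) => (d <= 0 ? 0 : p * (r / 365) * d);

// Hand-written mutants (illustrative, not generated by a tool).
const mutants = {
  'divide by 366': (p, r, d) => (d <= 0 ? 0 : p * (r / 366) * d),
  '* to /': (p, r, d) => (d <= 0 ? 0 : (p * (r / 365)) / d),
  'guard removed': (p, r, d) => p * (r / 365) * d,
  'boundary <= to <': (p, r, d) => (d < 0 ? 0 : p * (r / 365) * d),
};

const results = Object.fromEntries(
  Object.entries(mutants).map(([name, fn]) => [
    name,
    strongTestPasses(fn) ? 'survived' : 'killed',
  ])
);

console.log('original passes:', strongTestPasses(original));
console.log(results);
```

The first three mutants are killed. The boundary mutant survives for finite inputs because its behaviour is identical to the original, which is the usual sign of an equivalent mutant.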

Running Stryker in a JavaScript project:

npm install --save-dev @stryker-mutator/core @stryker-mutator/jest-runner
npx stryker run

Minimal config (stryker.config.json):

{
  "mutate": ["src/**/*.js"],
  "testRunner": "jest",
  "reporters": ["html", "clear-text"],
  "thresholds": { "high": 80, "low": 60, "break": 50 }
}

Stryker generates an HTML report showing each mutant, whether it was killed or survived, and the exact line. Start with the survived mutants — those are your assertion gaps.

Pro tip: The Stryker HTML report groups survived mutants by file. Sort by survival rate to find the functions with the weakest assertions first. Don’t try to kill every mutant at once: equivalent mutants (where the changed code produces identical observable behaviour) can be excluded with a // Stryker disable comment in the source file, after which they appear as “ignored” in the report.
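
As a sketch of how a reviewed equivalent mutant might be excluded, StrykerJS recognises `// Stryker disable` comments in source files. The example assumes the `<=` to `<` mutant on the `days` guard has been judged equivalent (zero days multiplied through the formula also yields 0). The reason text after the colon is free-form and purely illustrative.

```javascript
// src/interest.js
function calculateDailyInterest(principal, annualRate, days) {
  // Stryker disable next-line EqualityOperator: days = 0 yields 0 either way, so "<" is equivalent
  if (days <= 0) return 0;
  return principal * (annualRate / 365) * days;
}
```

Naming the mutator (here `EqualityOperator`) keeps the exclusion narrow: other mutations of the same line, such as replacing the whole condition, are still generated and still need to be killed.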

5 When to Use It

Use mutation testing on:

  • Financial calculations — interest, GST, tax, exchange rates, loan repayments
  • Access control and authorisation logic
  • Validation rules — IRD number format, NZBN check digits, date range logic
  • Before a major release of a payment or compliance module
  • As a periodic audit of a critical library (monthly or per release cycle)

Do not use it as a daily CI gate. Mutation testing is CPU-intensive: in the worst case it re-runs your entire test suite for every mutant. A codebase covered by 500 tests might produce 2,000 mutants. That is up to 2,000 full test runs. Run it on a schedule or targeted at high-risk modules. It is a quality audit tool, not a commit gate.
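
The cost is easy to estimate on the back of an envelope. The helper below is a hypothetical sketch that assumes one full suite run per mutant and ignores optimisations such as running only the tests that cover each mutant. The two-second suite time and the worker count are made-up inputs.

```javascript
// Hypothetical estimate: one full suite run per mutant, spread over workers.
function estimateMutationRunMinutes({ mutants, suiteSeconds, workers = 1 }) {
  const totalSeconds = (mutants * suiteSeconds) / workers;
  return totalSeconds / 60;
}

// 2,000 mutants, an assumed 2-second suite, 4 parallel workers.
const minutes = estimateMutationRunMinutes({ mutants: 2000, suiteSeconds: 2, workers: 4 });
console.log(`${minutes.toFixed(1)} minutes`);
```

Plugging in your own suite time shows quickly whether a full run fits in a pull-request pipeline or belongs in a scheduled job.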

Not suitable for: UI rendering code, generated code, or as a replacement for exploratory and integration testing. Mutation testing only tells you about assertion quality in unit tests. It does not find integration defects or usability issues.

6 Common Mistakes

🚫 High coverage means high quality

I used to think: if the coverage dashboard is green, the tests are doing their job.
Actually: coverage is necessary but not sufficient. A test suite can execute every line while asserting nothing meaningful. Coverage tells you what ran. Mutation score tells you whether what ran would catch a bug. You need both numbers — not just one.

🚫 Run mutation testing on every push

I used to think: if it’s useful, add it to CI on every commit like any other check.
Actually: mutation testing can turn a 2-minute test suite into a 2-hour run. Run it on a schedule — weekly, or before major releases — or scope it to critical modules only. Most teams carve out a separate pipeline stage for it rather than blocking pull requests.
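
One way to scope a scheduled run is a separate config that mutates only the high-risk modules. The sketch below is a JavaScript form of the config shown earlier. The file name `stryker.critical.config.js` and the `src/payments` path are assumptions for illustration. `coverageAnalysis: 'perTest'` asks Stryker to run only the tests that cover each mutant, which usually shortens the run.

```javascript
// stryker.critical.config.js (hypothetical file name)
/** @type {import('@stryker-mutator/api/core').PartialStrykerOptions} */
const config = {
  // Mutate only the modules where a missed bug is expensive (assumed paths).
  mutate: ['src/interest.js', 'src/payments/**/*.js'],
  testRunner: 'jest',
  coverageAnalysis: 'perTest',
  reporters: ['html', 'clear-text'],
  thresholds: { high: 80, low: 60, break: 50 },
};

module.exports = config;
```

A scheduled pipeline stage could then pass the file to the CLI, for example `npx stryker run stryker.critical.config.js`, while pull requests keep using the normal test job.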

🚫 Every surviving mutant means a bad test

I used to think: if a mutant survives, I must write a test to kill it.
Actually: some surviving mutants are equivalent, meaning the changed code produces identical observable behaviour. Changing i++ to i += 1 in an isolated context is equivalent. Exclude these with a // Stryker disable comment in the source file so the report lists them as “ignored”. Review the survived mutant list before acting, because not every survival is a problem.

7 Now You Try

🧪 Prompt Lab — Kill the Surviving Mutants

A NZ GST calculator function takes a pre-tax amount and returns it plus 15%. The current test only checks that calling calculateGST(100) does not throw. Stryker shows 4 surviving mutants: one changes 0.15 to 0.16, one changes + to -, one removes the return statement entirely, and one changes the function to always return 0.

Write the additional tests that would kill all 4 mutants. Be specific about what each assertion checks and why it kills the corresponding mutant.

8 Self-Check

Q1: A function has 100% branch coverage. Why might it still have a low mutation score?

Branch coverage tells you that all branches executed. It says nothing about whether the test asserted anything useful about the output of those branches. A test that takes every path but only checks result !== null will have full branch coverage and a low mutation score, because changes to operators, constants, and logic within those branches go undetected.

Q2: What does a surviving mutant tell you about your test suite?

A surviving mutant tells you that a specific code change — operator replacement, constant change, condition inversion — was not detected by your assertions. The tests ran, they passed, and they should have failed. It is evidence of an assertion gap: the code in that area could be wrong and your tests would not catch it.

Q3: What mutation score threshold would you set for a payments calculation module?

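Typically 80% or higher for financial calculation code, because a bug in that module affects real money. Stryker's thresholds option takes three values: high and low colour the report (a score below high is a warning, a score below low is flagged as danger), and break fails the run when the score drops below it. For payments, a reasonable starting point is break at 70% and high at 80%, so anything under 80% warns and anything under 70% fails the build. Generic application code is often left at the default 60/80 for low/high. The right number depends on the risk of the module: higher stakes, higher threshold.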
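
How the three thresholds interact can be sketched as a small classifier. It mirrors the behaviour described in Stryker's documentation (fine at or above `high`, warning between `low` and `high`, danger below `low`, failed run below `break`). The function name, the status labels, and the payments thresholds are illustrative assumptions.

```javascript
// Hypothetical classifier mirroring how Stryker's thresholds are described.
function classifyScore(score, { high, low, break: breakAt = null }) {
  const status = score >= high ? 'ok' : score >= low ? 'warning' : 'danger';
  // A score below `break` fails the run; null means the run never fails on score.
  const failsBuild = breakAt !== null && score < breakAt;
  return { status, failsBuild };
}

// Assumed thresholds for a payments module.
const payments = { high: 80, low: 70, break: 70 };

console.log(classifyScore(85, payments));
console.log(classifyScore(74, payments));
console.log(classifyScore(65, payments));
```

With these values a score of 74 produces a warning without failing the run, while 65 fails it.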

9 ISTQB Mapping

CTAL-TAE Section 3.4 — Structural test techniques for automation. Mutation testing is treated as an advanced coverage criterion that goes beyond statement, branch, and condition coverage to measure fault-detection effectiveness.

CTFL v4.0 Section 4.3 — White-box test techniques. The syllabus covers statement and branch coverage and cautions that exercising code with a test does not guarantee that defects in it are detected. Mutation testing measures that gap directly: would the tests catch a change?