Mutation Testing
Your test suite has 95% line coverage. Your mutation score is 38%. Those two facts are not contradictory — they are measuring completely different things. This page explains why coverage is the wrong quality metric, what mutation testing actually measures, and what to do about it.
1 The Hook
A Wellington bank’s QA team presents their quarterly quality report to the CTO. 94% code coverage across all services. The CTO is satisfied.
That same quarter, a boundary condition in the interest calculation is wrong. The <= should be < for daily interest accrual. The bug is in a function with 100% statement coverage. Every line of that function executes in the tests. But the test only asserts calculateInterest(10000) > 0 — it never checks the actual value.
A mutation testing tool would have replaced <= with < and observed that the tests still passed. The mutant would have survived. 100% coverage, 0% detection of this specific bug.
The quarterly report was not lying. It was measuring the wrong thing.
2 The Rule
Code coverage measures which lines your tests execute. Mutation score measures whether your tests would detect a bug on those lines. A test that executes code without asserting on its output can reach 100% coverage while detecting almost none of those bugs.
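The mutation score is a ratio. Stryker's documentation defines it over valid mutants. Detected mutants (killed, plus those that time out) are divided by detected plus undetected mutants (survived, plus those no test covers). Mutants that cause compile or runtime errors, and mutants you have marked as ignored, are left out. Other tools may use the simpler killed / total form:

```latex
\text{mutation score} = \frac{\text{killed} + \text{timeout}}{\text{killed} + \text{timeout} + \text{survived} + \text{no coverage}} \times 100\%
```

For example, 3 killed mutants and 1 survivor give 3 / 4 = 75%.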
3 The Analogy
Switches and bulbs.
Code coverage is checking that every switch in a building has been flipped. Mutation testing is checking that flipping each switch actually turns the right light on.
You can flip every switch — 100% coverage — while every bulb is burned out — 0% detection. The coverage report tells you the switches were flipped. It tells you nothing about whether the lights came on.
4 Watch Me Do It — Stryker on a NZ Interest Calculation
Here is the function under test: a simple daily interest calculator for a NZ lending product.
// src/interest.js
function calculateDailyInterest(principal, annualRate, days) {
  if (days <= 0) return 0;
  return principal * (annualRate / 365) * days;
}

module.exports = { calculateDailyInterest };
A weak test that gives 100% line coverage:
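// test/interest.test.js (assumed location)
const { calculateDailyInterest } = require('../src/interest');

test('calculates interest', () => {
  const result = calculateDailyInterest(10000, 0.05, 30);
  expect(result).toBeGreaterThan(0); // ← asserts nothing specific
});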
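Every line executes, so line coverage is 100%, although statement and branch coverage would be lower because the return 0 path never runs. But look at three mutants of this function. The two operator changes come from Stryker's standard mutators. StrykerJS does not mutate numeric literals, so the 365 → 366 change stands in for the kind of constant mutant other tools generate: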
- Change <= to < in the boundary check → test still passes (positive days still work, result is still > 0)
- Change annualRate / 365 to annualRate / 366 → test still passes (result is still > 0)
- Change * to / in the final multiplication, so the result is divided by days → test still passes (tiny fraction, but still > 0)
All three mutants survive. Your mutation score for this function: very low.
Fix it with a test that actually asserts on the value:
test('calculates 30-day interest on $10,000 at 5% p.a.', () => {
  // 10000 * 0.05 / 365 * 30 = 41.096...
  expect(calculateDailyInterest(10000, 0.05, 30)).toBeCloseTo(41.10, 1);
  expect(calculateDailyInterest(10000, 0.05, 0)).toBe(0); // boundary
  expect(calculateDailyInterest(10000, 0.05, -1)).toBe(0); // negative days
});
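Now almost all mutants are killed. The 365 → 366 mutant produces 40.98 instead of 41.10, and toBeCloseTo(41.10, 1) only accepts values within 0.05 of 41.10, so it fails. The * to / mutant returns roughly 0.046 and fails the same assertion. The negative-days assertion kills mutants that disable the guard, such as replacing the condition with false. The <= to < mutant, however, still survives. At zero days the formula also returns 0, so the mutated code behaves identically for every input. That is an equivalent mutant, not an assertion gap.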
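Stryker does all of this automatically, but the mechanics fit in a few lines of plain Node.js. The sketch below is not Stryker's implementation. The mutants are hand-written copies of the function, and each "test" is a function that returns true when it passes. The closeTo helper mimics Jest's toBeCloseTo rule (a difference under 10^-digits / 2). The sketch runs the weak and the strong test from this section against the three mutants listed above.

```javascript
// Hand-rolled illustration of a mutation run: no Stryker, no Jest.
const original = (principal, annualRate, days) => {
  if (days <= 0) return 0;
  return principal * (annualRate / 365) * days;
};

// The three mutants from the list above, written out by hand.
const mutants = {
  "<= to <": (principal, annualRate, days) => {
    if (days < 0) return 0;
    return principal * (annualRate / 365) * days;
  },
  "365 to 366": (principal, annualRate, days) => {
    if (days <= 0) return 0;
    return principal * (annualRate / 366) * days;
  },
  "* to /": (principal, annualRate, days) => {
    if (days <= 0) return 0;
    return principal * (annualRate / 365) / days;
  },
};

// Mimics Jest's toBeCloseTo(expected, numDigits): |actual - expected| < 10^-numDigits / 2.
const closeTo = (actual, expected, numDigits) =>
  Math.abs(actual - expected) < Math.pow(10, -numDigits) / 2;

// Each "test" returns true when it passes.
const weakTest = (fn) => fn(10000, 0.05, 30) > 0;
const strongTest = (fn) =>
  closeTo(fn(10000, 0.05, 30), 41.10, 1) &&
  fn(10000, 0.05, 0) === 0 &&
  fn(10000, 0.05, -1) === 0;

// A mutant survives when a test that passes on the original also passes on the mutant.
function survivors(test) {
  if (!test(original)) throw new Error("the test must pass on the original code first");
  return Object.keys(mutants).filter((name) => test(mutants[name]));
}

console.log("weak test, survivors:", survivors(weakTest));
console.log("strong test, survivors:", survivors(strongTest));
```

The weak test lets all three mutants survive. The strong test kills the 366 and * to / mutants but not <= to <. With either condition, zero or negative days produce 0, so no test can tell the two versions apart.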
Running Stryker in a JavaScript project:
npm install --save-dev @stryker-mutator/core @stryker-mutator/jest-runner
npx stryker run
Minimal config (stryker.config.json):
{
  "mutate": ["src/**/*.js"],
  "testRunner": "jest",
  "reporters": ["html", "clear-text"],
  "thresholds": { "high": 80, "low": 60, "break": 50 }
}
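The three numbers under thresholds are easy to misread. Stryker's documentation describes them like this. A score at or above high is good, and a score between low and high is a warning. A score below low is danger, and a score below break makes Stryker exit with code 1, failing the build. The helper classifyScore below is a hypothetical sketch that encodes those rules; it is not part of Stryker's API.

```javascript
// Hypothetical helper, not a Stryker API. Encodes the documented threshold rules:
//   score >= high          -> "high"
//   low <= score < high    -> "warning"
//   score < low            -> "danger"
//   score < break          -> build fails (Stryker exits with code 1)
function classifyScore(score, { high, low, break: breakAt }) {
  let level;
  if (score >= high) level = "high";
  else if (score >= low) level = "warning";
  else level = "danger";
  // break may be null (Stryker's default), in which case the score never fails the build.
  const failsBuild = breakAt != null && score < breakAt;
  return { level, failsBuild };
}

// Same values as the stryker.config.json example.
const thresholds = { high: 80, low: 60, break: 50 };
for (const score of [85, 70, 55, 38]) {
  console.log(score, classifyScore(score, thresholds));
}
```

With this config, a 38% score like the one in the introduction is reported as danger and also fails the build, while 55% is danger but still passes.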
Stryker generates an HTML report showing each mutant, whether it was killed or survived, and the exact line. Start with the survived mutants — those are your assertion gaps.
5 When to Use It
Use mutation testing on:
- Financial calculations — interest, GST, tax, exchange rates, loan repayments
- Access control and authorisation logic
- Validation rules — IRD number format, NZBN check digits, date range logic
- Before a major release of a payment or compliance module
- As a periodic audit of a critical library (monthly or per release cycle)
Do not use it as a per-commit CI gate. Mutation testing is CPU-intensive: every mutant needs its own test run. In the naive case that is the entire suite; Stryker narrows each run to the tests that cover the mutated code, but total cost still grows with the number of mutants. A codebase that yields 2,000 mutants needs 2,000 separate test runs. Run it on a schedule or targeted at high-risk modules. It is a quality audit tool.
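A back-of-the-envelope calculation shows why the cost bites and why running only the covering tests matters (the idea behind Stryker's coverageAnalysis option). Both functions are hypothetical. The 2-minute suite and the assumption that each mutant is covered by 3% of the suite are illustrative numbers, not measurements.

```javascript
// Hypothetical cost models for a mutation run; all inputs are assumptions.

// Naive model: every mutant triggers a full run of the suite.
function naiveMinutes(mutantCount, suiteMinutes) {
  return mutantCount * suiteMinutes;
}

// Coverage-aware model: each mutant only runs the tests that execute its code.
// avgCoveringFraction is the assumed average share of the suite covering one mutant.
function coverageAwareMinutes(mutantCount, suiteMinutes, avgCoveringFraction) {
  return mutantCount * suiteMinutes * avgCoveringFraction;
}

console.log(naiveMinutes(2000, 2)); // prints 4000 (about 67 hours)
console.log(coverageAwareMinutes(2000, 2, 0.03));
```

Under these assumptions the naive model needs about 67 hours and the coverage-aware model about 2 hours, which matches the "2-minute suite into a 2-hour run" scale mentioned under Common Mistakes.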
Not suitable for: UI rendering code, generated code, or as a replacement for exploratory and integration testing. Mutation testing only tells you about assertion quality in unit tests. It does not find integration defects or usability issues.
6 Common Mistakes
🚫 High coverage means high quality
I used to think: if the coverage dashboard is green, the tests are doing their job.
Actually: coverage is necessary but not sufficient. A test suite can execute every line while asserting nothing meaningful. Coverage tells you what ran. Mutation score tells you whether what ran would catch a bug. You need both numbers — not just one.
🚫 Run mutation testing on every push
I used to think: if it’s useful, add it to CI on every commit like any other check.
Actually: mutation testing can turn a 2-minute test suite into a 2-hour run. Run it on a schedule (weekly, or before major releases) or scope it to critical modules only. A common approach is a separate pipeline stage that does not block pull requests.
🚫 Every surviving mutant means a bad test
I used to think: if a mutant survives, I must write a test to kill it.
Actually: some surviving mutants are equivalent, meaning the changed code produces identical observable behaviour. Changing i++ to i += 1 in an isolated context is equivalent. Mark these as ignored with a disable comment in the source, such as // Stryker disable next-line EqualityOperator: reason, so they are reported as ignored rather than survived. Review the survived mutant list before acting: not every survival is a problem.
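A suspected equivalent mutant can be sanity-checked by running the original and the mutant side by side. The sketch below does that for the <= to < change in the section 4 interest function, and shows where a Stryker disable comment would go. The helper sameBehaviour and the sampled inputs are hypothetical. A sample can only support an equivalence claim; the real argument is algebraic, because at days = 0 the formula returns principal × rate × 0, which is 0.

```javascript
// Original function with a Stryker disable comment on the guard.
// To Node this is an ordinary comment; Stryker reads it when generating mutants.
function calculateDailyInterest(principal, annualRate, days) {
  // Stryker disable next-line EqualityOperator: days < 0 is equivalent here because the formula also returns 0 at days = 0
  if (days <= 0) return 0;
  return principal * (annualRate / 365) * days;
}

// The suspected equivalent mutant, written out by hand.
function mutantLessThan(principal, annualRate, days) {
  if (days < 0) return 0;
  return principal * (annualRate / 365) * days;
}

// Hypothetical helper: true if both versions agree on every sampled input.
function sameBehaviour(principals, rates, daysValues) {
  for (const principal of principals) {
    for (const rate of rates) {
      for (const days of daysValues) {
        if (calculateDailyInterest(principal, rate, days) !== mutantLessThan(principal, rate, days)) {
          return false;
        }
      }
    }
  }
  return true;
}

console.log(sameBehaviour([0, 100, 10000], [0, 0.05, 0.12], [-30, -1, 0, 1, 30, 365])); // prints true
```

Note that the disable comment switches off every EqualityOperator mutant on that line, including <= to >, which is not equivalent and would otherwise be killed. Keep such comments narrow and always give a reason.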
7 Now You Try
A NZ GST calculator function takes a pre-tax amount and returns it plus 15%. The current test only checks that calling calculateGST(100) does not throw. A mutation run shows 4 surviving mutants: one changes 0.15 to 0.16, one changes + to -, one removes the return statement entirely, and one changes the function to always return 0.
Write the additional tests that would kill all 4 mutants. Be specific about what each assertion checks and why it kills the corresponding mutant.
8 Self-Check
Q1: A function has 100% branch coverage. Why might it still have a low mutation score?
Branch coverage tells you that all branches executed. It says nothing about whether the test asserted anything useful about the output of those branches. A test that takes every path but only checks result !== null will have full branch coverage and a low mutation score, because operators, constants, and logic changes within those branches go undetected.
Q2: What does a surviving mutant tell you about your test suite?
A surviving mutant tells you that a specific code change — operator replacement, constant change, condition inversion — was not detected by your assertions. The tests ran, they passed, and they should have failed. It is evidence of an assertion gap: the code in that area could be wrong and your tests would not catch it.
Q3: What mutation score threshold would you set for a payments calculation module?
Typically 80% or higher for financial calculation code, where a bug costs real money. Stryker's config has three thresholds: scores below high are reported as a warning, scores below low as danger, and scores below break fail the build. For payments you might set high at 80 and break at 70. Generic application code often uses Stryker's default thresholds of high 80 and low 60. The right number depends on the risk of the module: higher stakes, higher threshold.
9 ISTQB Mapping
CTAL-TAE Section 3.4 — Structural test techniques for automation. Mutation testing is treated as an advanced coverage criterion that goes beyond statement, branch, and condition coverage to measure fault-detection effectiveness.
CTFL v4.0 Section 4.3: White-box test techniques. The syllabus covers statement and branch coverage, which measure execution (was the code run?), and notes that executing a statement does not guarantee that a defect in it will be detected. Mutation testing measures that second property: would the tests catch a change?