Senior SDET · Learning

Mutation Testing

Code coverage tells you which lines your tests execute. Mutation testing tells you whether those tests would catch a real bug. It's the sharpest tool in the senior SDET's quality assessment kit — and almost nobody in NZ uses it.

Senior SDET · ISTQB CTAL-TAE · ~20 min read + exercise

1 The Hook — Why This Matters

A Wellington fintech has 92% line coverage on their payment gateway module. The CTO shows this metric in board reports as evidence of quality. A senior SDET joins the team and runs Stryker.js over the codebase. The mutation score is 41%.

Nearly 60% of mutants survive. That means 60% of the time, if the code is intentionally broken in a small way — a boundary shifted, a rate zeroed out, an operator flipped — the tests still pass. The test suite ran the code. It just never checked whether the code was doing anything correct.

The coverage number was measuring "lines that ran." The mutation score was measuring "lines where bugs would be caught." The board had been given the wrong metric for two years. The day the SDET presented the mutation report, the team rewrote their test suite. It took six weeks. They found three pre-existing bugs in the tax calculation logic during the process.

2 The Rule — The One-Sentence Version

Coverage measures execution. Mutation score measures detection. A test suite that runs every line but asserts on none of them will score 100% coverage and 0% mutation score. Only the mutation score tells you whether your tests work.
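To make the extreme case concrete, here is a minimal hand-rolled sketch (not Stryker itself). It uses a trivial `add` function, three hand-written mutants, and two hypothetical "tests": one that only executes the code and one that asserts on the result. All names are illustrative.

```javascript
// Hand-rolled illustration of coverage vs mutation score (not Stryker itself).
const add = (a, b) => a + b;

// Three hand-written mutants of `add`, each a small deliberate bug.
const mutants = [
  (a, b) => a - b, // operator flipped
  (a, b) => a * b, // operator flipped
  (a, b) => a,     // second operand dropped
];

// A "test" returns true when it passes for the given implementation.
const executesOnly = (fn) => { fn(2, 3); return true; }; // runs the line, checks nothing
const assertsResult = (fn) => fn(2, 3) === 5;            // checks the output

// Mutation score = percentage of mutants that make the test fail.
function mutationScore(test, candidates) {
  const killed = candidates.filter((mutant) => !test(mutant)).length;
  return (killed / candidates.length) * 100;
}

console.log(mutationScore(executesOnly, mutants));  // 0
console.log(mutationScore(assertsResult, mutants)); // 100
```

Both tests execute the single line of `add`, so a coverage tool would report them identically. Only the second one notices when the line is wrong.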

3 The Analogy — Think Of It Like...

Analogy

Smoke alarm versus fire drill.

Code coverage is a smoke alarm that beeps every time you cook — it confirms the alarm is mechanically functional. Mutation testing is a fire drill where you actually set a small controlled fire and verify the alarm both beeps AND that people evacuate the building correctly. Most NZ teams only do the smoke alarm check. They know the tests run. They don't know whether anyone would actually respond to a real fire.

4 Watch Me Do It — Stryker.js on a NZ Tax Utility

Here's a PAYE calculation function that covers every NZ income tax bracket, using the thresholds that applied before the 31 July 2024 changes. It has 100% line coverage from a single test. Stryker will tell a different story.

The function under test

// src/nzTax.js
// Each base amount is the total tax owed on the brackets below it.
function calculatePAYE(annualIncome) {
  if (annualIncome <= 14000) return annualIncome * 0.105;
  if (annualIncome <= 48000) return 1470 + (annualIncome - 14000) * 0.175;
  if (annualIncome <= 70000) return 7420 + (annualIncome - 48000) * 0.30;
  if (annualIncome <= 180000) return 14020 + (annualIncome - 70000) * 0.33;
  // 14020 + 110000 * 0.33 = 50320
  return 50320 + (annualIncome - 180000) * 0.39;
}

module.exports = { calculatePAYE };
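The base amounts hard-coded in the function (such as 1470 and 7420) are each the total tax owed on the brackets below. One way to cross-check constants like these is to derive them from the thresholds and rates. The sketch below does that. `BRACKETS`, `bracketBases`, and `payeFromTable` are hypothetical names that are not part of the original module.

```javascript
// Thresholds and marginal rates, copied from the function above.
const BRACKETS = [
  [14000, 0.105],
  [48000, 0.175],
  [70000, 0.30],
  [180000, 0.33],
  [Infinity, 0.39],
];

// Tax accumulated below each bracket's lower threshold.
function bracketBases(brackets = BRACKETS) {
  const bases = [];
  let lower = 0;
  let accumulated = 0;
  for (const [upper, rate] of brackets) {
    bases.push(accumulated);
    accumulated += (upper - lower) * rate;
    lower = upper;
  }
  return bases;
}

// Table-driven PAYE calculation, usable as an independent oracle in tests.
function payeFromTable(income, brackets = BRACKETS) {
  let lower = 0;
  let accumulated = 0;
  for (const [upper, rate] of brackets) {
    if (income <= upper) return accumulated + (income - lower) * rate;
    accumulated += (upper - lower) * rate;
    lower = upper;
  }
  return accumulated; // unreachable while the last threshold is Infinity
}

// Format to cents to hide floating-point noise.
console.log(bracketBases().map((base) => base.toFixed(2)));
```

Comparing the printed bases against the literals in `calculatePAYE` is a cheap sanity check before trusting either the code or the expected values in its tests.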

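The weak test (100% line coverage, low mutation score)

const { calculatePAYE } = require('./nzTax'); // test file assumed to sit beside nzTax.js

test('calculates PAYE', () => {
  expect(calculatePAYE(200000)).toBeCloseTo(58120, 0); // 50320 + 20000 * 0.39
});

A $200,000 income falls through every if, so every line executes, yet only the final return is ever checked. Mutants like these survive:

  • Boundary shift: change <= to < on the $48,000 bracket. The test never goes near that boundary, so the mutant survives. (Both bracket formulas give $7,420 at exactly $48,000, so this particular mutant is equivalent: no test can kill it.)
  • Constant replacement: change 0.30 to 0.0. The test input sits in the top bracket, so the broken rate is never exercised.
  • Operator swap: change + to - in a lower bracket's return expression. Again, that bracket's result is never checked.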
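The three mutants above can be reproduced by hand to see why a single-input check misses them. This is a rough sketch, not Stryker output. `makePAYE` is a hypothetical factory that switches on one mutation at a time, the $200,000 probe is an illustrative top-bracket input, and the top-bracket base is derived inline rather than hard-coded.

```javascript
// Builds a PAYE function; each option switches on one hand-made mutation.
function makePAYE({ strict48k = false, zeroRate = false, flipSign = false } = {}) {
  const midRate = zeroRate ? 0.0 : 0.30; // constant replacement
  const sign = flipSign ? -1 : 1;        // operator swap: + becomes -
  return function paye(income) {
    if (income <= 14000) return income * 0.105;
    const inSecondBracket = strict48k ? income < 48000 : income <= 48000; // boundary shift
    if (inSecondBracket) return 1470 + sign * (income - 14000) * 0.175;
    if (income <= 70000) return 7420 + (income - 48000) * midRate;
    if (income <= 180000) return 14020 + (income - 70000) * 0.33;
    return 14020 + 110000 * 0.33 + (income - 180000) * 0.39; // base derived, not hard-coded
  };
}

const original = makePAYE();
const mutants = {
  boundaryShift: makePAYE({ strict48k: true }),
  constantReplacement: makePAYE({ zeroRate: true }),
  operatorSwap: makePAYE({ flipSign: true }),
};

// A check passes when every probed income matches the original to the cent.
const passes = (fn, incomes) =>
  incomes.every((income) => Math.abs(fn(income) - original(income)) < 0.005);

const singleCheck = [200000];                // one top-bracket input
const bracketChecks = [14001, 48001, 70001]; // one dollar past each lower threshold

for (const [name, mutant] of Object.entries(mutants)) {
  console.log(name, {
    survivesSingleCheck: passes(mutant, singleCheck),
    survivesBracketChecks: passes(mutant, bracketChecks),
  });
}
```

All three mutants survive the single check. Probing each bracket kills the constant replacement and the operator swap. The boundary shift still survives, because the two formulas produce the same tax at exactly $48,000, so it is an equivalent mutant rather than a test gap.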

Stryker config

{
  "mutate": ["src/**/*.js", "!src/**/*.test.js"],
  "testRunner": "jest",
  "reporters": ["html", "clear-text"],
  "thresholds": { "high": 80, "low": 60, "break": 50 }
}
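The thresholds line is the part of this config most likely to be misread. As I understand the Stryker documentation, a score at or above high is reported as healthy, a score between low and high as a warning, and a score below low as danger. Separately, a score below break makes the run exit with a non-zero code. The sketch below models that reading. `judgeScore` is a hypothetical helper, not a Stryker API.

```javascript
// Mirrors the "thresholds" entry from the config above.
const thresholds = { high: 80, low: 60, break: 50 };

// Hypothetical helper: how a mutation score is judged against the thresholds.
function judgeScore(score, { high, low, break: breakAt }) {
  const rating = score >= high ? 'ok' : score >= low ? 'warning' : 'danger';
  const failsBuild = breakAt !== null && score < breakAt;
  return { rating, failsBuild };
}

console.log(judgeScore(41, thresholds)); // { rating: 'danger', failsBuild: true }
console.log(judgeScore(72, thresholds)); // { rating: 'warning', failsBuild: false }
console.log(judgeScore(85, thresholds)); // { rating: 'ok', failsBuild: false }
```

With these values, the 41% score from the opening story would both be flagged as danger and fail the run.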

The stronger test suite (kills the non-equivalent survivors)

// Replaces the weak test: checks every bracket boundary and the first dollar above it
const { calculatePAYE } = require('./nzTax');

const cases = [
  [0,       0],
  [14000,   1470],
  [14001,   1470.175],
  [48000,   7420],
  [48001,   7420.30],
  [70000,   14020],
  [70001,   14020.33],
  [180000,  50320],
  [180001,  50320.39]
];

test.each(cases)('PAYE for $%i is $%f', (income, expected) => {
  expect(calculatePAYE(income)).toBeCloseTo(expected, 1);
});

Pro tip: Stryker's HTML report shows each surviving mutant with the exact code change and which tests ran against it. Always read the report before drawing conclusions: some survivors are equivalent mutants (the mutation produces functionally identical code). Mark those as ignored rather than writing tests for them.

5 When to Use It / When NOT to Use It

✅ Run mutation testing when...

  • Critical business logic: tax, financial calculations, access control
  • Before a major release of a payment or compliance module
  • Coverage is high but confidence is low
  • Periodic quality audits (monthly or quarterly)

❌ Skip it when...

  • UI rendering code (low ROI — visual output is hard to assert precisely)
  • Generated or scaffolded code you didn't write
  • Every CI push — it's CPU-intensive and will make your pipeline unusable
  • You haven't established a baseline test suite yet

6 Common Mistakes — Don't Do This

🚫 Treating coverage as a quality signal

I used to think: 80% coverage is enough evidence of quality.
Actually: Coverage percentage alone tells you nothing about whether bugs would be caught. A mutation score below 60% on critical code is a serious quality gap regardless of line coverage. Report both metrics, or the board is flying blind.

🚫 Running mutation tests on every CI push

I used to think: More frequent mutation testing means faster feedback.
Actually: Mutation testing is CPU-intensive. A full mutation run on every PR can stretch CI feedback from minutes to hours. Run it on critical modules only, on a schedule (nightly or weekly) rather than on every commit. Use conventional coverage for fast PR feedback.
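A back-of-the-envelope estimate shows where the cost comes from. The sketch below assumes the worst case, where every mutant triggers a full run of the test suite. The numbers are illustrative, not measurements, and `estimateRunHours` is a hypothetical helper.

```javascript
// Rough upper bound: every mutant triggers one full test-suite run,
// and the runs are spread evenly across parallel workers.
function estimateRunHours({ mutants, suiteSeconds, workers }) {
  const totalSeconds = (mutants * suiteSeconds) / workers;
  return totalSeconds / 3600;
}

// Illustrative numbers only.
const hours = estimateRunHours({ mutants: 2000, suiteSeconds: 30, workers: 4 });
console.log(hours.toFixed(1)); // "4.2"
```

Treat the result as a ceiling. Tools can reduce it by running only the tests that cover each mutant, but the total still grows with both the mutant count and the suite duration.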

🚫 Treating every surviving mutant as a test failure

I used to think: Surviving mutants always mean the tests are bad.
Actually: Some surviving mutants are equivalent mutants: the mutation produces functionally identical code and no test could ever kill it. StrykerJS lets you exclude these with a "// Stryker disable" comment in the source, including a reason, and the report then lists them as ignored. Always review before acting on the score, otherwise you'll waste time writing tests that add zero value.
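As a sketch of what ignoring an equivalent mutant can look like, the snippet below uses a trimmed, hypothetical version of the PAYE function (`payeUpTo70k`, first three brackets only) with a StrykerJS disable comment on the $48,000 boundary. The directive syntax follows the StrykerJS documentation as I understand it, so check it against your Stryker version. `boundaryMutant` is a hand-written copy of the mutant being ignored.

```javascript
// Trimmed sketch: only the first three brackets, valid for incomes up to $70,000.
function payeUpTo70k(annualIncome) {
  if (annualIncome <= 14000) return annualIncome * 0.105;
  // Stryker disable next-line EqualityOperator: both formulas agree at exactly 48000
  if (annualIncome <= 48000) return 1470 + (annualIncome - 14000) * 0.175;
  return 7420 + (annualIncome - 48000) * 0.30;
}

// Hand-written copy of the mutant being ignored: <= shifted to < at $48,000.
function boundaryMutant(annualIncome) {
  if (annualIncome <= 14000) return annualIncome * 0.105;
  if (annualIncome < 48000) return 1470 + (annualIncome - 14000) * 0.175;
  return 7420 + (annualIncome - 48000) * 0.30;
}

// Evidence for equivalence: identical tax, to the cent, on both sides of the boundary.
const sameToTheCent = [47999, 48000, 48001].every(
  (income) => Math.abs(payeUpTo70k(income) - boundaryMutant(income)) < 0.005
);
console.log(sameToTheCent); // true
```

One caveat: the comment silences every mutant from that mutator on the next line, not only the equivalent one. Record the reason, and keep the bracket tests that would catch a genuinely wrong comparison.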

7 Now You Try — Prompt Lab

🧪 AI Exercise

A Node.js function validates NZ IRD numbers (8 or 9 digits, valid checksum). There is one test: it checks that a valid IRD number is accepted and that a string input returns false. Mutation testing shows 6 surviving mutants.

8 Self-Check — Can You Actually Do This?

Three for three means you're ready to practice.

Q1. What is the difference between a killed mutant and a surviving mutant?

A killed mutant is a code change that caused at least one test to fail — the test suite detected the artificial bug. A surviving mutant is a code change that left all tests passing, meaning the test suite could not detect that specific type of fault. Surviving mutants reveal gaps in test assertions.
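The score itself is simple arithmetic over these outcomes. The sketch below follows the definition in the Stryker documentation as I understand it: timed-out mutants count as detected, and mutants with no test coverage count as undetected. `mutationScore` is a hypothetical helper name.

```javascript
// Mutation score as a percentage of valid mutants that were detected.
function mutationScore({ killed, timeout = 0, survived, noCoverage = 0 }) {
  const detected = killed + timeout;
  const undetected = survived + noCoverage;
  const valid = detected + undetected;
  return valid === 0 ? NaN : (detected * 100) / valid;
}

console.log(mutationScore({ killed: 41, survived: 59 })); // 41
console.log(mutationScore({ killed: 30, timeout: 5, survived: 10, noCoverage: 5 })); // 70
```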

Q2. Why would you NOT run mutation testing on every CI push?

Mutation testing generates hundreds or thousands of modified versions of the code and runs the entire test suite against each one. For a medium-sized codebase this can take hours. Running it on every commit would make CI pipelines unusable. Run it on a schedule (nightly or weekly) against critical modules only.

Q3. A module has 95% line coverage but a 35% mutation score. What does this tell you?

The tests execute most of the code but assert almost nothing meaningful. The test suite would miss roughly 65% of deliberate small bugs injected into the logic. This is a high-risk state for a critical module — the coverage metric has been giving false confidence. The team needs to add boundary and assertion-rich tests, not more coverage.

9 ISTQB Mapping

ISTQB CT-TAE (Certified Tester — Test Automation Engineer)

Section 3.4 — Coverage criteria for structural testing. Mutation testing is classified as a white-box technique that extends structural coverage to fault detection. CTAL-TAE v2.0 explicitly recognises mutation score as a supplementary quality metric alongside statement and branch coverage.

ISTQB CTFL v4.0

Section 4.3 — White-box test techniques (statement and branch coverage). Mutation testing is the technique that validates whether a test suite is capable of detecting the faults that coverage criteria alone cannot surface. In exam questions, expect to distinguish between coverage (execution) and mutation score (detection efficacy).