Healthcare QA · Lesson 4

Regulated & Medical-Device Software Testing

Some health software is itself a regulated medical device. When that is true, the testing standard rises sharply — and the evidence you keep matters as much as the testing you do. This lesson teaches you that shift.

Healthcare & Health Data · Lesson 4 of 5 · ~30 min read · ~70 min with exercises

1 The Hook

A fictional NZ medtech startup, Hauora Logic, built an app that helped clinicians calculate a medication dose from a patient’s weight and a few other inputs. The team treated it like any other app — they tested the calculations, fixed bugs, shipped updates quickly, and moved on. The software worked well in their own testing.

Then a hospital that wanted to adopt it asked a question the team had not prepared for: “This software calculates a dose, so it influences a clinical decision — can you show us your intended-use statement, your risk analysis, and the traceability from each requirement to the test that proves it?” Hauora Logic had none of that. They had working software and a pile of fixed bugs, but no documented evidence trail tying what the software was meant to do to proof that it did it safely.

The deeper issue was that the team had not recognised what they had built. Software that calculates a dose to inform treatment is not just an app — it has the character of a medical device, and software in that category is regulated. The bar is not “does it work in our hands” but “can you demonstrate, with documented evidence, that it does what it is intended to do and that its risks have been identified and controlled.”

Here is the lesson hidden in that story. The team tested the software and kept almost no evidence. In regulated medical software, untraceable testing barely counts — if you cannot show which requirement a test proves and that the test passed, an assessor treats the requirement as unverified. The testing and the documented evidence of testing are two parts of one obligation.

2 The Rule

When software is itself a medical device, working is not the standard — demonstrable is. You must be able to trace every requirement to the risk it addresses and the test that proves it, and keep the documented evidence to show it. A test with no recorded result and no link to a requirement does not exist as far as a regulator is concerned. In regulated software, the evidence trail is part of the product.

3 The Analogy

A building consent for a house in NZ.

You cannot just build a house well and hope the council agrees. Every structural decision must trace to a plan, the plan must address known risks like earthquakes and weathertightness, and an inspector signs off each stage with a record. A beautifully built house with no consent and no inspection records cannot get a code compliance certificate — not because the work is necessarily bad, but because there is no evidence trail proving it meets the standard. The records are not bureaucracy; they are how safety is demonstrated to someone who was not there.

Regulated medical software is the same. Hauora Logic built a working app but kept no consent-equivalent trail — no intended use, no risk analysis, no traceability from requirement to test. So even though it worked in their hands, it could not be signed off, because nothing demonstrated it. A tester in this domain is the inspector and the record-keeper combined: proving each requirement is met, and leaving the documented evidence that proves it to anyone who asks later.

4 Software as a Medical Device

The key concept is Software as a Medical Device — often abbreviated SaMD. It describes software intended to be used for a medical purpose on its own, without being part of a hardware device. A dose calculator, an app that analyses an image to flag a possible condition, or a tool that triages symptoms into a clinical recommendation can all fall into this category, because the software itself performs a medical function.

What pushes software into this category is its intended use — what the maker says it is for. The same code that displays information is treated very differently from code that interprets that information into a clinical decision. A viewer that shows a result is lower-stakes; a calculator that recommends a dose, or a model that says “this looks like a fracture,” is doing clinical work and carries higher risk.

For a tester, two ideas follow from this:

  • The intended-use statement scopes everything. It defines what the software claims to do, for whom, and in what setting. Your tests must prove the software does exactly that — no more, no less — and your testing of boundaries is testing the edges of that claim.
  • Risk drives rigour. The higher the potential harm from a wrong output, the more thorough and more documented the testing must be. Software that informs a critical decision is held to a higher standard than software that displays a reference table.

Stay conceptual about exact classification rules — they vary and change. The durable testing insight is that intended use and risk together decide how hard you test and how much evidence you keep, and recognising when an “app” is actually a medical device is itself a tester’s contribution.

5 The Medsafe Context in NZ

In NZ, medical devices — including software that qualifies as one — sit within a regulatory framework administered by Medsafe, the medicines and medical devices safety authority. Conceptually, Medsafe is the body concerned with the safety of medicines and medical devices on the NZ market, and software acting as a medical device falls within that scope.

You do not need to be a regulatory specialist, and the specifics of the regime evolve, so stay conceptual. What matters for a tester is the posture this context creates:

  • Safety is demonstrated, not asserted. A maker is expected to be able to show that the device is safe and performs as intended — which means the evidence your testing produces is part of how that obligation is met.
  • The intended use and the claims must match reality. If the software is marketed as doing something, it must actually do that, and the testing evidence supports the claim. Testing against the claimed intended use is directly relevant.
  • Post-market matters too. Safety does not stop at release. Issues found in the field, and changes made afterward, are expected to be handled in a controlled way, which connects to regression testing and change control for regulated software.

Pro tip: You are not the regulatory expert, but you are often the first person to ask “is this software actually a medical device?” If a feature interprets data into a clinical recommendation, raise the question early. Discovering at adoption time, as Hauora Logic did, that you needed an evidence trail you never built is far more expensive than building it from the start.

6 Traceability

Traceability is the backbone of testing regulated software, and it is the thing Hauora Logic lacked. It means an unbroken chain of links: each requirement connects to the risk it relates to, to the test that verifies it, and to the recorded result of that test. Read in either direction, the chain answers two questions an assessor will ask.

Forward — “for this requirement, where is the test that proves it, and did it pass?” Backward — “for this test, which requirement does it verify and which risk does it control?” If any requirement has no linked test, it is unverified. If any test links to no requirement, it may be testing something no one asked for while a real requirement goes uncovered. The matrix that records these links is the requirements traceability matrix, and in regulated software it is not optional paperwork — it is the map of whether the product has actually been verified.

For a tester, traceability changes daily habits. Every test case references the requirement it covers. Every requirement, especially every safety-related one, has at least one test that proves it. A change to a requirement triggers a review of the linked tests. Coverage is measured against requirements and risks, not just lines of code. The discipline is not glamorous, but in this domain it is the difference between “we tested it” and “we can show exactly what we tested and prove it works.”
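
The forward and backward checks described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not a prescribed tool: the `TraceLink` record, the function names, and the IDs (`REQ-014`, `RISK-03`, `MD-DOSE-009` and so on) are hypothetical, and a real matrix would usually live in a requirements or test-management system.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class TraceLink:
    """One row of a hypothetical requirements traceability matrix."""
    requirement: Optional[str]  # e.g. "REQ-014"; None means the test links to nothing
    risk: Optional[str]         # e.g. "RISK-03"
    test_id: str
    result: Optional[str]       # "pass", "fail", or None if no result was recorded


def unverified_requirements(requirements, links):
    """Forward check: requirements with no linked test that has a recorded pass."""
    verified = {
        link.requirement
        for link in links
        if link.requirement and link.result == "pass"
    }
    return sorted(set(requirements) - verified)


def orphan_tests(links):
    """Backward check: tests that link to no requirement."""
    return sorted(link.test_id for link in links if link.requirement is None)


# Illustrative data only.
requirements = ["REQ-014", "REQ-015", "REQ-016"]
links = [
    TraceLink("REQ-014", "RISK-03", "MD-DOSE-009", "pass"),
    TraceLink("REQ-015", "RISK-04", "MD-DOSE-010", None),  # linked, but no recorded result
    TraceLink(None, None, "MD-MISC-001", "pass"),          # passes, but proves no requirement
]

print(unverified_requirements(requirements, links))  # ['REQ-015', 'REQ-016']
print(orphan_tests(links))                           # ['MD-MISC-001']
```

In this sketch REQ-015 counts as unverified even though a test is linked to it, because no result was recorded, and REQ-016 is unverified because nothing links to it at all.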

7 Documentation and Evidence

In regulated software, the documented evidence of testing is part of the deliverable, not an afterthought. The principle is that an independent assessor, who was never in the room, can read your records and be satisfied the software was verified against its requirements and its risks controlled. That sets a higher bar for what you record than most projects keep.

The kinds of evidence that matter:

  • The result, not just the assertion: the actual recorded outcome of each test — pass or fail, with the data and conditions — not a summary sentence saying it was fine.
  • Versioned and dated: which version of the software a test ran against, when, and by whom, so a result can be tied to a specific build.
  • Traceable: linked back to the requirement and risk it addresses, per the matrix above.
  • Reproducible: enough detail — steps, inputs, environment — that someone else could re-run the test and get the same result.
  • Defects and their resolution: a record of what was found, how it was assessed for risk, what was done, and the re-test that confirmed the fix.

This is the same audit-ready mindset that runs through the rest of the healthcare track and the other regulated tracks: a test with no recorded, traceable, reproducible result is invisible to an assessor. “We tested the dose calculation” is not evidence; the recorded results of named test cases against a specific version, linked to the dose-calculation requirement and its risk, are.
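
One way to picture the evidence fields listed above is as a single record per executed test case. The sketch below is an assumption-laden illustration: the `EvidenceRecord` class, its field names, and every value in the example (IDs, version string, date, tester, dose figures) are invented for the example and are not clinical guidance or a required format.

```python
from dataclasses import dataclass, field
from datetime import date


@dataclass
class EvidenceRecord:
    """Hypothetical record of one executed test case; field names are illustrative."""
    test_id: str
    requirement_ref: str
    risk_ref: str
    build_version: str
    run_date: date
    tester: str
    inputs: dict
    expected: str
    actual: str
    steps: list = field(default_factory=list)
    environment: str = ""

    @property
    def passed(self) -> bool:
        # The outcome is derived from the recorded data, not asserted in a summary.
        return self.actual == self.expected

    def missing_fields(self) -> list:
        """Names of fields left empty, which would weaken the record as evidence."""
        return [name for name, value in vars(self).items() if value in ("", None, [], {})]


# Invented example values.
record = EvidenceRecord(
    test_id="MD-DOSE-009",
    requirement_ref="REQ-014",
    risk_ref="RISK-03",
    build_version="1.4.2+build.87",
    run_date=date(2025, 3, 4),
    tester="A. Tester",
    inputs={"weight_kg": 12.5},
    expected="62.5 mg",
    actual="62.5 mg",
)

print(record.passed)            # True
print(record.missing_fields())  # ['steps', 'environment']
```

Here the record passes but is not yet reproducible, because the steps and environment were left empty. A check like `missing_fields` makes that gap visible before an assessor finds it.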

8 What to Test in Regulated Software

Beyond the documentation discipline, the testing itself has emphases that matter more in this domain:

  • Correctness of the clinical function: the dose calculation, the image flag, the triage recommendation must be right across the full input range — not just typical values. A dose calculator that is correct for adults but wrong for very low weights is a patient-safety defect.
  • Boundary and out-of-range inputs: what does the software do with a weight of zero, a negative value, an impossibly high value, or a missing input? Safe behaviour at the edges — refuse and warn rather than produce a dangerous number — is critical.
  • Units and presentation: the point from Lesson 1 returns here. An ambiguous unit on a calculated dose is a serious defect, because the clinician can read the wrong quantity.
  • Risk-control verification: for each identified risk, the control meant to mitigate it actually works. If the risk is “clinician enters weight in pounds not kilograms,” test the control that catches it.
  • Change control and regression: a change anywhere is re-verified against the linked requirements, because in regulated software an untested change to a clinical function is unacceptable.
  • Use within intended use only: the software behaves safely — or refuses — outside its stated scope, so it is not silently relied on for something it was never validated for.
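
The boundary and intended-use checks above can be sketched as follows. Everything specific here is an assumption made for illustration: the `calculate_dose` function, the `OutOfIntendedUse` exception, the 3.0 to 40.0 kg range, and the 5 mg/kg rule are invented and are not clinical guidance.

```python
import math

# All numbers below are invented for illustration and are NOT clinical guidance.
MIN_WEIGHT_KG = 3.0
MAX_WEIGHT_KG = 40.0
MG_PER_KG = 5.0


class OutOfIntendedUse(ValueError):
    """Raised instead of returning a dose when the input is outside the validated range."""


def calculate_dose(weight_kg):
    """Return (dose, unit) for an in-range weight; refuse anything else."""
    if isinstance(weight_kg, bool) or not isinstance(weight_kg, (int, float)):
        raise OutOfIntendedUse("weight is missing or not a number")
    if math.isnan(weight_kg) or not (MIN_WEIGHT_KG <= weight_kg <= MAX_WEIGHT_KG):
        raise OutOfIntendedUse(
            f"weight {weight_kg} kg is outside {MIN_WEIGHT_KG}-{MAX_WEIGHT_KG} kg"
        )
    # The unit is returned explicitly so it can be asserted and displayed.
    return round(weight_kg * MG_PER_KG, 1), "mg"


def refuses(value):
    """True if the calculator refuses the input rather than producing a dose."""
    try:
        calculate_dose(value)
    except OutOfIntendedUse:
        return True
    return False


# In-range checks include the exact minimum and maximum of the stated range.
assert calculate_dose(3.0) == (15.0, "mg")
assert calculate_dose(40.0) == (200.0, "mg")

# Edge and out-of-range inputs must be refused, never turned into a number.
for bad in [0, -4.2, 2.99, 40.01, 500, None, float("nan")]:
    assert refuses(bad), f"calculator produced a dose for {bad!r}"
```

A range check alone cannot catch a weight entered in pounds that happens to fall inside the valid kilogram range, so that risk needs its own control and its own test.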

9 Common Mistakes

🚫 Not recognising that the software is a medical device

Why it happens: It looks like an ordinary app, so it gets tested like one.
The fix: If the software interprets data into a clinical decision — a dose, a flag, a recommendation — it has the character of a medical device and the bar rises. Raise the question early, as Hauora Logic failed to, before you have shipped without the evidence trail you needed.

🚫 Testing thoroughly but keeping no traceable, recorded evidence

Why it happens: The testing feels like the real work and documentation feels like overhead.
The fix: In regulated software a test with no recorded result and no link to a requirement does not count — the requirement is treated as unverified. Record each result against a version, linked to its requirement and risk. The evidence is part of the product.

🚫 Measuring coverage by code instead of by requirements and risks

Why it happens: Code-coverage numbers are easy to generate and look reassuring.
The fix: A high line-coverage number can still leave a safety requirement unverified. Measure coverage against the requirements and identified risks — every safety-related requirement needs a linked, passing test, and the traceability matrix is how you show it.

🚫 Shipping a change to a clinical function without re-verifying it

Why it happens: The change looks small and unrelated to the calculation.
The fix: In regulated software an untested change to a clinical function is unacceptable. A change triggers re-verification of the linked requirements and a regression pass, with the re-test recorded. Change control is part of keeping the device safe after release.

10 Now You Try

Three graded exercises across classification, traceability, and evidence. Write your answer, then compare it to the model answer.

🔍 Exercise 1 of 3 — Spot the Regulated-Software Gaps

Read the description of a fictional NZ clinical decision-support app below. Identify 3 gaps that a regulated-software assessor would raise, and say what each one means for the product’s ability to be signed off.

App: anticoagulant dosing assistant
The app recommends a starting anticoagulant dose from a patient’s weight, age, and kidney function. The team has thorough unit tests for the calculation, but no written intended-use statement and no risk analysis. There is no traceability between requirements and tests — tests are organised by code module. Test results are not recorded against a software version; the team re-runs the suite and trusts the latest green run. A recent change to the kidney-function input was shipped without a documented re-test of the dose calculation.

List 3 gaps and what each means for sign-off:

Model answer
The description contains five gaps; any three, well explained, earn full marks.

1. No intended-use statement — there is no defined claim of what the app is for, for whom, and in what setting. Why it blocks sign-off: without it there is nothing to test the software against and no boundary for its validated use; the whole verification has no scope.

2. No risk analysis — the clinical risks (wrong dose, wrong-unit weight, out-of-range kidney function) are not identified. Why it blocks sign-off: an assessor cannot see that risks were recognised and controlled, and the testing cannot be shown to address them.

3. No traceability — tests are organised by code module, not linked to requirements. Why it blocks sign-off: you cannot show that each requirement, especially each safety requirement, has a passing test; requirements are treated as unverified.

Bonus: results not recorded against a version, and a change shipped without a documented re-test. Why it blocks sign-off: a green run with no recorded, versioned result is not evidence, and an untested change to a clinical function breaks change control.

The trap: thorough unit tests feel like enough, but in regulated software untraceable, unrecorded testing barely counts.

🔧 Exercise 2 of 3 — Fix the Test Case

The test case below tests a clinical calculation but keeps no traceable, recorded evidence. Rewrite it for regulated software, with these fields: Test ID, Requirement ref, Risk ref, Intended-use scope, Pre-conditions, Action, Expected result, Boundary cases, Evidence recorded, Version under test. Use a fictional paediatric dose calculator as the context.

Original (too shallow):
“Enter a weight and check the dose is right. Pass if the number looks correct.”

Rewrite as a traceable, evidenced regulated-software test case:

Model answer
Test ID: MD-DOSE-009

Requirement ref: REQ-014 (calculate paediatric dose from weight within stated range)

Risk ref: RISK-03 (incorrect dose from out-of-range or wrong-unit weight)

Intended-use scope: dose guidance for paediatric patients within the weight range stated in the intended-use statement; not for neonates or adults.

Pre-conditions: app version recorded; reference dose values agreed with a clinical source for the test weights.

Action: enter a set of in-range weights and verify the calculated dose against the agreed reference values, including the exact min and max of the stated range.

Expected result: each calculated dose matches the agreed reference value within the defined tolerance, with the correct unit shown explicitly.

Boundary cases: weight = 0, negative, just below min, just above max, and a value in pounds — each must be refused/warned, never producing a dose; an out-of-scope weight (e.g. adult) is refused per intended use.

Evidence recorded: the input weights, the calculated doses, the reference values, pass/fail per case, tester, date, and the build/version; linked to REQ-014 and RISK-03 in the traceability matrix.

Version under test: the specific build identifier the run executed against.

What makes it strong: it links to a requirement and a risk, scopes to the intended use, tests boundaries and out-of-scope inputs (not just a happy value), asserts units, and records versioned, traceable evidence. The original had none of these.

🏗️ Exercise 3 of 3 — Build a Traceability Plan

Design a traceability and evidence plan of 5 entries for a fictional NZ Software-as-a-Medical-Device symptom-triage tool. Each entry should link: a requirement, the risk it addresses, the test that verifies it, and the evidence recorded. Cover the core clinical function, an out-of-range input, a risk control, the intended-use boundary, and a change-control/regression case.

Model answer
TRACE-01 | Requirement: triage produces the correct recommendation for a defined symptom set | Risk: under-triage of a serious condition causes delayed care | Verifying test: run the defined symptom sets and compare the recommendation to clinically-agreed expected outputs | Evidence recorded: inputs, outputs, expected values, pass/fail, version, date

TRACE-02 | Requirement: out-of-range or contradictory inputs are handled safely | Risk: a nonsensical input produces a misleading recommendation | Verifying test: submit out-of-range, contradictory, and missing inputs and confirm the tool refuses or warns rather than recommending | Evidence recorded: the inputs, the refusal/warning behaviour, pass/fail, version

TRACE-03 | Requirement: the control that escalates red-flag symptoms always fires | Risk: a red-flag symptom is missed | Verifying test: submit each red-flag symptom and confirm escalation is triggered every time | Evidence recorded: each red-flag case, the escalation result, pass/fail, version

TRACE-04 | Requirement: the tool only operates within its stated intended use | Risk: use outside scope (e.g. an age group it was not validated for) gives an unsafe result | Verifying test: submit out-of-scope cases and confirm the tool declines or warns per the intended-use statement | Evidence recorded: out-of-scope inputs, the decline/warn behaviour, pass/fail, version

TRACE-05 | Requirement: changes are re-verified before release | Risk: an untested change breaks a clinical function | Verifying test: after a change, re-run the linked requirement tests and a regression pass; record the results against the new version | Evidence recorded: the change, the re-run results, the version before/after, pass/fail

Strong plans: each entry links requirement → risk → test → evidence, and together they cover the core function, an out-of-range input, a risk control, the intended-use boundary, and change control. Weak plans list tests with no requirement or risk link — that is exactly the traceability gap being marked.

11 Self-Check

Q1: What makes software “Software as a Medical Device,” and why does it matter to a tester?

Its intended use — software meant to perform a medical function on its own, such as calculating a dose or interpreting an image into a clinical recommendation. It matters because the testing bar rises from “does it work in our hands” to “can we demonstrate, with documented evidence, that it does what it is intended to do and its risks are controlled.”

Q2: Why is untraceable testing nearly worthless in regulated software?

Because if you cannot show which requirement a test proves and that it passed, an assessor treats the requirement as unverified. The testing and the documented, traceable evidence of testing are two parts of one obligation — a test with no recorded result and no link to a requirement effectively does not exist.

Q3: What is a requirements traceability matrix and what two questions does it answer?

It is the map of links between each requirement, the risk it addresses, the test that verifies it, and that test’s result. Forward it answers “for this requirement, where is the test and did it pass?” Backward it answers “for this test, which requirement and risk does it cover?” A requirement with no linked test is unverified.

Q4: Why measure coverage by requirements and risks rather than by code?

Because a high line-coverage number can still leave a safety-related requirement with no test proving it. In regulated software, every safety requirement needs a linked, passing test, so coverage is measured against requirements and identified risks — and the traceability matrix is how you demonstrate that coverage.

Q5: Why can you not ship a small change to a clinical function without re-verifying it?

Because in regulated software an untested change to a clinical function is unacceptable — even a small, seemingly unrelated change can affect a calculation or recommendation. A change triggers re-verification of the linked requirements and a regression pass, with the re-test recorded against the new version. Change control keeps the device safe after release.

12 Interview Prep

Real questions asked in NZ QA interviews for medtech and regulated-software roles. Read the model answers, then practise your own version.

“How does testing regulated medical software differ from testing an ordinary app?”

The standard shifts from “does it work” to “can we demonstrate it works.” I start from the intended-use statement, because it scopes what the software claims to do and therefore what I must prove. I test the clinical function across its full input range, the boundaries and out-of-range inputs where it must refuse rather than produce a dangerous value, and the controls for each identified risk. Crucially, I keep traceable, recorded evidence — each test linked to a requirement and a risk, with a versioned, reproducible result — because an untraceable, unrecorded test counts as an unverified requirement to an assessor. The evidence trail is part of the product.

“What is traceability and why does it matter here?”

Traceability is the unbroken chain from each requirement to the risk it addresses, to the test that verifies it, to that test’s recorded result. It matters because it answers the two questions an assessor asks: for any requirement, where is the proof it was tested and did it pass; and for any test, which requirement and risk does it cover. A requirement with no linked test is unverified; a test linked to nothing may be covering something no one needed. I reference the requirement in every test case and make sure every safety requirement has a passing test, recorded in the traceability matrix.

“A team built a dose-calculation app and wants a hospital to adopt it. What would you flag?”

My first flag is whether they recognise it is likely Software as a Medical Device — it interprets data into a clinical decision, which raises the bar. Then I’d ask for the things an assessor will ask for and that teams often lack: a written intended-use statement, a risk analysis, and traceability from each requirement to a passing, recorded test against a specific version. I’d check boundary and out-of-range behaviour on the calculation, unit clarity on the output, and that changes are re-verified under change control. The expensive failure is discovering at adoption time that the testing was thorough but left no demonstrable evidence trail — so I’d want that built from the start.