Health Information Security Framework (HISF)
Health information is among the most sensitive data a system can hold, and the moment it is copied into a test environment, every duty to protect it travels with it. This lesson teaches you to protect health information in non-production (non-prod) environments to the standard Te Whatu Ora expects.
1 The Hook
A vendor was building a new appointment-booking system for a fictional regional health provider. To make testing realistic, the delivery team did the obvious thing: they took a recent copy of the live patient database and loaded it into their test environment. Real names, real dates of birth, real NHI numbers, real diagnoses, real appointment notes. The tests ran beautifully — the data behaved exactly like production, because it was production.
The test environment, though, was not built like production. It sat in a shared development cloud account. Access was not locked down — developers, testers, and a couple of contractors all had logins, because it was “just test.” There was no monitoring, because no one monitors a test box. And a snapshot of that environment was copied to a laptop so someone could keep working over a long weekend.
Nothing dramatic happened — this time. But walk through what was true the whole time the project ran: the full clinical history of thousands of real patients, including mental-health and other sensitive notes, was sitting in an unmonitored environment that several people who had no clinical reason to see it could open, and a copy was on a personal laptop. If any of those access points had been compromised, this would have been one of the most serious health-information breaches the provider could suffer — and it would have started not in the live system, but in test.
Here is the lesson hidden in that story: the protection a tester owes health information does not stop at the production boundary. The Health Information Privacy Code and the Health Information Security Framework apply to that data wherever it lives — and a test environment full of real patient records is one of the most overlooked risks in health software. This lesson is about closing that gap.
2 The Rule
Health information stays health information when it is copied into test. The Health Information Privacy Code 2020 and the Health Information Security Framework do not stop at the production boundary: a test environment holding real patient records carries the same duty to protect them, and usually has weaker controls. The safest position is that real clinical data does not belong in test at all. Where it must be there, it is de-identified, minimised, access-controlled, kept onshore, and securely disposed of when testing ends, and every one of those is something a tester can verify.
3 The Analogy
Patient files left in an unlocked back room.
A clinic keeps its patient files in a locked records room with a sign-in sheet and a camera. No one would question that. But then, to get a renovation done, someone photocopies the lot and leaves the copies in an unlocked back room that the builders, the cleaners, and a couple of temps all walk through — because “they’re only copies.” The information in those copies is exactly as sensitive as the originals. The protection did not follow the data when it moved.
A test environment full of real patient records is that unlocked back room. The data is just as sensitive as it is in production, but the controls around it are usually thinner — broader access, no monitoring, copies on laptops. The fix is the same as in the clinic: do not make the copies if you do not have to, and if you do, lock the room exactly as well as the original.
4 What the HISF Is
The Health Information Security Framework (HISF) is NZ’s framework for how health organisations protect the information they hold. It sets the expectations for managing health information securely across people, processes, and technology — covering things like access control, data handling, network and system security, monitoring, and incident response, written for the health context. For a tester, the value of the HISF is that it gives you a recognised set of expectations to test against rather than inventing your own.
The HISF does not exist in isolation. It sits alongside the broader NZ security expectations a tester already meets in government work — the New Zealand Information Security Manual (NZISM) sets the wider public-sector security baseline, and the next lesson covers the NZISM controls a tester verifies in test environments. The HISF is the health-specific layer on top of that, reflecting how serious the consequences are when health information is exposed.
- Access control: only people with a legitimate need can reach health information. For a tester: check that access to the test environment is limited to named people, not opened to the whole team because it is “just test.”
- Data handling: health information is stored, moved, and disposed of securely. For a tester: check residency, encryption in transit and at rest, and that copies are not left where they should not be.
- Monitoring and audit: access to health information is logged and reviewable. For a tester: confirm that who-saw-what can actually be reconstructed, including in non-prod, which is usually where this is missing.
- Incident response: there is a defined way to detect and respond to a breach. For a tester: a test environment exposure is an incident; check it would be caught and handled, not silently ignored.
You do not need to memorise the framework. You need to know it exists, that it applies to health information wherever that information lives, and that it gives you concrete expectations — access, handling, monitoring, response — you can turn into checks.
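As one illustration of turning the monitoring expectation into a check, the sketch below rebuilds who-saw-what from an access log and flags records that were opened but never logged. The log format (`user`, `record_id`, `timestamp`), the sample entries, and the function names are all assumptions made for illustration; a real environment's audit log will look different.

```python
from collections import defaultdict

# Hypothetical audit-log entries; the field names are assumptions, not a real log schema.
ACCESS_LOG = [
    {"user": "tester.a", "record_id": "REC-001", "timestamp": "2024-03-01T09:15:00"},
    {"user": "dev.b", "record_id": "REC-001", "timestamp": "2024-03-01T10:02:00"},
    {"user": "tester.a", "record_id": "REC-002", "timestamp": "2024-03-02T14:30:00"},
]


def who_saw_what(log):
    """Group log entries by record so each record lists who opened it and when."""
    by_record = defaultdict(list)
    for entry in log:
        by_record[entry["record_id"]].append((entry["timestamp"], entry["user"]))
    # ISO 8601 timestamps sort chronologically when sorted as plain text.
    return {record: sorted(events) for record, events in by_record.items()}


def unlogged_records(log, opened_record_ids):
    """Return records known to have been opened that have no log entry at all."""
    logged = {entry["record_id"] for entry in log}
    return sorted(set(opened_record_ids) - logged)
```

A tester might deliberately open a handful of records, then call `unlogged_records` with those IDs: anything returned is a record that was viewed without leaving a trace, which is the finding.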
5 The Health Information Privacy Code 2020
The Health Information Privacy Code 2020 is a code issued under the Privacy Act 2020 that applies specifically to health information held by health agencies. It takes the general privacy principles and tailors them to health — with its own set of health information privacy rules covering collection, use, disclosure, storage, security, and access. For a tester in health software, the Code is the privacy rulebook that sits closest to the data you are working with.
Several of its rules bear directly on testing:
- Security of health information: the agency must protect health information against loss, misuse, and unauthorised access or disclosure with reasonable safeguards. A test environment with real patient data and weak controls is a direct failure of this rule.
- Limits on use: health information collected for care should not simply be repurposed for unrelated uses. Using real clinical records to test an unrelated system can stray beyond the purpose it was collected for.
- Retention and disposal: health information should not be kept longer than needed and must be disposed of securely. Old test data snapshots full of patient records sitting around indefinitely breach this.
- Access and correction: individuals have rights over their health information, which assumes the agency actually knows where all copies of it are — hard to honour if copies are scattered across test environments and laptops.
The Code and the HISF reinforce each other: the Code sets the legal obligation to protect health information, and the HISF describes how to meet it. A tester who understands both can tell not only that a control is missing but which obligation that gap breaches.
6 Health Information in Test Environments — What Te Whatu Ora Expects
Te Whatu Ora — Health New Zealand — is the national health organisation, and it sets the expectations vendors and delivery teams are held to when building health systems. The consistent expectation, across the HISF and the Code, is that real patient information is protected to the same standard wherever it sits, and that test environments are not a loophole. In practice this comes down to a handful of things a tester can check.
The expectations, as testable checks
- Prefer no real data in test: the default expectation is synthetic or de-identified data. The first check is whether real clinical data needs to be there at all — often it does not.
- If real data must be used, it is minimised and de-identified: only the fields a test genuinely needs, with identifiers removed or masked — covered in the next section.
- Equal controls in non-prod: access control, encryption, residency, and monitoring in the test environment match the production standard, not a relaxed “it’s only test” version.
- Onshore residency: health information stays within approved NZ jurisdictions in test as well as production, unless a specific approval says otherwise.
- No uncontrolled copies: no snapshots on laptops, no shares to personal accounts, no copies that no one is tracking.
- Defined disposal: test data has an owner and an end date, and is securely destroyed when the testing is done.
The thread running through all of these: a test environment holding health information is held to the production standard. “It’s only test” is the phrase that precedes most health-data exposures, and it is not a defence the HISF or the Code recognises.
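Several of these expectations (residency, ownership, disposal) can be checked against a simple register of test datasets. The sketch below is a minimal illustration under stated assumptions: the register layout, the region names (`nz-north-1`, `au-east-1`), and the dataset names are hypothetical placeholders, not real cloud regions or a prescribed format.

```python
from datetime import date

# Assumed value for illustration; the real approved list comes from the project's approvals.
APPROVED_REGIONS = {"nz-north-1"}

# Hypothetical dataset register: one row per copy of test data.
DATASET_REGISTER = [
    {"name": "booking-test-a", "owner": "J. Tester", "region": "nz-north-1",
     "dispose_by": date(2024, 6, 30)},
    {"name": "booking-test-b", "owner": None, "region": "au-east-1",
     "dispose_by": date(2024, 1, 31)},
]


def register_findings(register, approved_regions, today):
    """Return (dataset, problem) pairs for ownership, residency, and disposal gaps."""
    findings = []
    for ds in register:
        if not ds["owner"]:
            findings.append((ds["name"], "no named owner"))
        if ds["region"] not in approved_regions:
            findings.append((ds["name"], f"hosted outside approved regions: {ds['region']}"))
        if ds["dispose_by"] is None:
            findings.append((ds["name"], "no disposal date"))
        elif ds["dispose_by"] < today:
            findings.append((ds["name"], "past disposal date and still present"))
    return findings
```

The register only catches copies someone has recorded. An uncontrolled copy on a laptop will not appear in it, so this check complements, rather than replaces, looking for untracked copies.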
7 De-identifying Clinical Test Data
When a test genuinely needs realistic data, de-identification is how you keep the data useful while removing the people from it. Done properly it is the difference between a safe test dataset and a breach waiting to happen. Done carelessly it gives false comfort — data that looks de-identified but can be re-linked to real patients.
What de-identification has to remove
- Direct identifiers: name, NHI number, address, phone, email, date of birth — the fields that name a person outright. These are removed, masked, or replaced with realistic fakes.
- Indirect identifiers, also called quasi-identifiers (the hard part): combinations that re-identify someone even with names gone. A rare diagnosis plus an age plus a small town can point to one person. De-identification has to consider these combinations, not just the obvious fields.
- Free-text leakage: clinical notes are full of names, places, and detail in prose. Masking the structured fields while leaving the notes untouched is one of the most common de-identification failures.
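The indirect-identifier risk in the list above can be made concrete by counting how many records share each combination of values, an idea usually called k-anonymity. The sketch below is a minimal illustration: the column names, the sample rows, and the threshold `k` are assumptions, and the right threshold is a decision for the project's privacy assessment, not something this code settles.

```python
from collections import Counter

# Hypothetical de-identified rows; every value is invented for illustration.
ROWS = [
    {"diagnosis": "asthma", "age_band": "30-39", "town": "Town A"},
    {"diagnosis": "asthma", "age_band": "30-39", "town": "Town A"},
    {"diagnosis": "asthma", "age_band": "30-39", "town": "Town A"},
    {"diagnosis": "rare condition X", "age_band": "40-49", "town": "Town B"},
]

# Assumed set of columns that could identify someone in combination.
QUASI_IDENTIFIERS = ("diagnosis", "age_band", "town")


def small_groups(rows, quasi_identifiers, k):
    """Return quasi-identifier combinations shared by fewer than k rows."""
    counts = Counter(tuple(row[q] for q in quasi_identifiers) for row in rows)
    return {combo: n for combo, n in counts.items() if n < k}
```

With `k=3`, the three asthma rows pass and the single "rare condition X" row is flagged, because that combination points to exactly one record. An empty result does not prove the data is safe; it only shows that no combination of the chosen columns is rarer than the threshold.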
How a tester verifies de-identification
De-identification is itself something to test, not trust. Sample the supposedly de-identified dataset and check that direct identifiers really are gone from every field, including free-text notes. Look for re-identification risk in combinations of quasi-identifiers. Confirm that the mapping back to real identities, if one exists, is held separately and securely — or, better, that there is no reversible mapping at all. And check that the de-identified data is still realistic enough that the test is meaningful, because a dataset that is safe but unrealistic gets quietly swapped back for the real thing.
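Part of the notes check described above can be automated with pattern matching. The sketch below is a rough illustration only. The patterns are simplified assumptions: the NHI-like pattern matches just a "three letters plus four digits" shape with no check-digit validation, and the phone and date patterns are loose. Patient names and place names cannot be found this way, so a regex scan supports manual sampling of notes rather than replacing it.

```python
import re

# Simplified, assumed patterns for illustration; none is an official format definition.
PATTERNS = {
    "nhi_like": re.compile(r"\b[A-Z]{3}\d{4}\b"),
    "email": re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.-]+\b"),
    "phone_like": re.compile(r"\b0\d{1,2}[ -]?\d{3}[ -]?\d{3,4}\b"),
    "date_like": re.compile(r"\b\d{1,2}/\d{1,2}/\d{2,4}\b"),
}


def scan_note(text):
    """Return (pattern_label, matched_text) pairs found in one free-text note."""
    hits = []
    for label, pattern in PATTERNS.items():
        for match in pattern.finditer(text):
            hits.append((label, match.group()))
    return hits


def scan_notes(notes):
    """Scan many notes; return only those with hits, keyed by position in the input."""
    results = {}
    for index, text in enumerate(notes):
        hits = scan_note(text)
        if hits:
            results[index] = hits
    return results
```

Recording the patterns alongside the results matters: the test-case evidence asks for the notes-scan method, and a reviewer can only judge a "zero hits" result if they can see what was searched for.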
8 Building HISF Test Cases
A health-information-security test case targets a control or a data-handling expectation rather than a function. The system under test is often the test environment itself or the de-identification process, the acceptance criterion is a control state or a measured absence of identifiers, and the evidence is a record an auditor — or Te Whatu Ora — could inspect.
Here is a de-identification test case for the appointment-booking system from the opening story:
Obligation: HISF data handling; Health Information Privacy Code (security)
Type: De-identification verification
Description: Verify the test dataset for the booking system contains no direct patient identifiers in any field, including free-text clinical notes.
Acceptance criteria: Across a sample of 200 records, 0 direct identifiers (name, NHI, DOB, address, phone) appear in any structured field or in notes; no quasi-identifier combination resolves to a single real patient.
Evidence required: Sampled-record scan results; the notes-scan method/pattern used; re-identification-risk assessment of quasi-identifiers; reviewer sign-off.
Traceability: Risk R-03 (real patient data exposed in test environment) in the project risk register.
Result: [Pass / Fail], with any identifier or re-identifiable record listed.
Notice the shape: the acceptance criterion is a measured absence (zero identifiers across a defined sample, including notes), not “the data was de-identified”; the evidence is reproducible and names the method, including the easily-skipped notes scan; and the case traces to a named risk. Those properties make it a real control test rather than an assurance that someone did the right thing.
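To show what "reproducible evidence" could look like in practice, the sketch below draws a seeded sample and builds a small evidence record from it. It is an illustration under assumptions: `count_identifiers` is a hypothetical callable standing in for whatever scan the team actually uses, and the evidence fields are one possible layout, not a required format.

```python
import random


def draw_sample(record_ids, sample_size, seed):
    """Draw a reproducible sample: the same seed and IDs always give the same records."""
    rng = random.Random(seed)
    ordered = sorted(record_ids)  # sort first so input order cannot change the sample
    return rng.sample(ordered, min(sample_size, len(ordered)))


def build_evidence(record_ids, sample_size, seed, count_identifiers):
    """Scan a seeded sample and summarise the outcome as an evidence record.

    count_identifiers is a hypothetical function: record ID in, identifier count out.
    """
    sample = draw_sample(record_ids, sample_size, seed)
    hits = {rid: count_identifiers(rid) for rid in sample}
    failing = sorted(rid for rid, n in hits.items() if n > 0)
    return {
        "sample_size": len(sample),
        "seed": seed,
        "identifiers_found": sum(hits.values()),
        "failing_records": failing,
        "result": "Pass" if not failing else "Fail",
    }
```

Storing the seed with the result lets a reviewer redraw exactly the same 200 records and repeat the scan, which is what separates a measured absence from a statement that the data was checked.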
9 Common Mistakes
🚫 Copying production patient data into test because “it’s only test”
Why it happens: Real data makes tests realistic, and test environments feel low-stakes.
The fix: Health information is just as sensitive in test, and test environments usually have weaker controls — making them a bigger risk, not a smaller one. Default to synthetic or de-identified data, and treat real clinical data in non-prod as a breach to be closed.
🚫 Masking the structured fields and leaving the clinical notes untouched
Why it happens: Structured fields are easy to mask; free-text notes are awkward, so they get skipped.
The fix: Notes are full of names, places, NHI numbers, and identifying detail in prose. A dataset is not de-identified until the notes are too. Always sample the notes specifically when you verify de-identification.
🚫 Treating direct identifiers as the whole job and ignoring re-identification
Why it happens: Removing names and NHI numbers feels like “done.”
The fix: A rare diagnosis plus an age plus a small location can identify one person with no name at all. De-identification has to assess combinations of quasi-identifiers, not just strip the obvious fields.
🚫 Letting test environments run with weaker controls than production
Why it happens: Non-prod is seen as throwaway, so access is broad, monitoring is off, and snapshots travel onto laptops.
The fix: If a test environment holds health information, hold it to the production standard — restricted auditable access, encryption, onshore residency, monitoring, no uncontrolled copies, and defined disposal. Te Whatu Ora’s expectation does not relax for non-prod.
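The access part of that fix is straightforward to verify mechanically. The sketch below compares the accounts that actually have access with an approved named list; the account names and the function name `access_review` are hypothetical, and how the actual list is exported depends on the platform in use.

```python
def access_review(actual_accounts, approved_accounts):
    """Compare who has access against who is approved to have it."""
    actual, approved = set(actual_accounts), set(approved_accounts)
    return {
        # Accounts with access but no recorded approval: each one is a finding.
        "unapproved": sorted(actual - approved),
        # Approvals with no matching account: worth tidying, but not an exposure.
        "unused_approvals": sorted(approved - actual),
    }


# Hypothetical example data.
ACTUAL = ["tester.a", "dev.b", "contractor.c"]
APPROVED = ["tester.a", "dev.b", "analyst.d"]
```

Here `access_review(ACTUAL, APPROVED)` reports `contractor.c` as unapproved, which is the kind of account that tends to accumulate when a test environment is treated as open to everyone.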
10 Now You Try
Three graded exercises: spot the risk, fix the practice, then design the checks. Write your answer, then compare it to the model answer.
Read the description of a test setup for a fictional community mental-health records system below. Identify 3 health-information-security risks and name the HISF expectation or Health Information Privacy Code rule each one breaches.
To test a new clinician portal, the team restored a full copy of the live database into a test environment. It includes patient names, NHI numbers, dates of birth, diagnoses, and free-text consultation notes. The structured fields were “anonymised” by blanking the name column, but the consultation notes were left exactly as written. The test environment is in a shared cloud account, and everyone on the delivery team — developers, testers, and three offshore contractors — has access. There is no logging of who opens records. A copy of the dataset was taken six months ago for an earlier test cycle and is still on a shared drive. Part of the cloud account is hosted in an overseas region.
List 3 risks and the HISF expectation or Code rule each breaches:
Model answer
There are at least five real risks here; any three well-explained earns full marks.
1. Failed de-identification: Blanking the name column while leaving consultation notes untouched is not de-identification. The notes contain names, places, NHI numbers, and identifying detail in prose, and the structured fields still hold NHI, DOB, and diagnoses. Breaches: HISF data handling; Health Information Privacy Code security rule.
2. Excessive and unmonitored access: Developers, testers, and offshore contractors all have access, with no logging of who opens records. There is no legitimate-need restriction and no audit trail. Breaches: HISF access control and monitoring; Code security rule.
3. Uncontrolled copy retained: A six-month-old dataset copy still sits on a shared drive with no owner or disposal. Breaches: HISF data handling; Code retention/disposal rule.
Bonus risks:
- Offshore residency: part of the account is in an overseas region, breaching onshore-residency expectations for health information.
- Real data in test at all: the default expectation is synthetic or de-identified data; restoring the full live database fails that before any other control.
The trap: the team believed they had "anonymised" the data by blanking one column. The most sensitive content, mental-health consultation notes tied to real people, was never touched, and this exposure started entirely in a test environment.
A team building a fictional hospital discharge-summary system uses the test-data practice below. Rewrite it into a compliant approach that meets HISF and Te Whatu Ora expectations, covering whether real data is needed, de-identification (including notes), access, residency, and disposal. Explain what you changed and why.
“We clone production every sprint into a test environment so the data is always fresh. It has all patient fields and full discharge notes. We remove the surname column and call it anonymised. The environment is open to the whole team and hosted wherever capacity is cheapest. We keep each clone until the next one replaces it, and sometimes longer if a tester is still using it.”
Rewrite as a compliant test-data approach:
Model answer
A compliant approach addresses every dimension the original ignored.
- Do we need real data at all: Start by defaulting to synthetic or de-identified data. Most discharge-system tests do not need real patients; cloning production every sprint is the root problem.
- De-identification (incl. notes): If realistic data is genuinely needed, de-identify properly. Remove or mask all direct identifiers in every structured field AND scan and clean the free-text discharge notes, which carry names, places, and NHI numbers in prose. Blanking the surname is not de-identification. Assess quasi-identifier combinations for re-identification risk.
- Minimisation: Include only the fields and the volume a test actually needs, not the full production set every sprint.
- Access control: Restrict to named people with a legitimate need, with auditable logging of who opens records. Not "the whole team," and not offshore contractors by default.
- Residency: Host within approved NZ jurisdictions. "Wherever capacity is cheapest" is not acceptable for health information; onshore residency applies in test too.
- Retention / disposal: Give each dataset an owner and an end date, and securely destroy it when the test cycle ends. No "sometimes longer," no orphaned clones.
What changed and why: the original treated a test environment as exempt from health-information duties, with fresh production clones, a token "anonymisation," open access, cheapest-region hosting, and indefinite retention. The compliant version applies the HISF and the Health Information Privacy Code to non-prod at the production standard, which is exactly what Te Whatu Ora expects.
Design a set of 5 HISF test cases for the test environment of a fictional maternity-care records system. Each case should have at least an ID, the HISF expectation or Code rule it addresses, what it verifies, a measurable acceptance criterion, and the evidence required. Cover de-identification (incl. notes), access control, monitoring, residency, and disposal.
Model answer
HISF-01 | Addresses: data handling / Code security (de-identification) | Verifies: no direct identifiers in any field, including free-text notes | Acceptance criteria: across a 200-record sample, 0 direct identifiers in structured fields or notes; no quasi-identifier combination resolves to one real patient | Evidence required: sampled-record scan results; notes-scan method; re-identification-risk assessment; reviewer sign-off
HISF-02 | Addresses: access control | Verifies: only named people with a legitimate need can access the test environment | Acceptance criteria: access list matches the approved named list; 0 accounts without a recorded legitimate need; no default offshore-contractor access | Evidence required: access-list review against the approved list; reviewer sign-off
HISF-03 | Addresses: monitoring / audit | Verifies: access to records in the test environment is logged and reconstructable | Acceptance criteria: who-opened-what can be reconstructed for a sample of 10 access events; logging is enabled and retained | Evidence required: log extract for the sampled events; logging-config confirmation
HISF-04 | Addresses: data handling (residency) | Verifies: all test health data is stored and processed in approved NZ jurisdictions | Acceptance criteria: 100% of test data resides onshore in approved regions; 0 components hosted offshore without approval | Evidence required: residency attestation; hosting-region listing for every component
HISF-05 | Addresses: retention / disposal | Verifies: test datasets have an owner and end date and are securely destroyed | Acceptance criteria: every test dataset has a named owner and disposal date; 0 orphaned or expired copies remain | Evidence required: dataset register with owners and dates; disposal evidence for the most recent expired set
Strong plans: each case is specific, has a measurable criterion, names concrete evidence, and together they cover de-identification incl. notes (HISF-01), access (HISF-02), monitoring (HISF-03), residency (HISF-04), and disposal (HISF-05). Weak plans say "make sure the data is secure" five times without a measurable criterion; that is the difference being marked.
11 Self-Check
Q1: Why is a test environment full of real patient data often a bigger risk than production?
Because the data is exactly as sensitive as it is in production, but the controls around it are usually weaker — broader access, no monitoring, snapshots on laptops, sometimes offshore hosting. The HISF and the Health Information Privacy Code do not stop at the production boundary, so a test environment with weak controls is a direct breach. “It’s only test” is not a defence either recognises.
Q2: What is the relationship between the HISF and the Health Information Privacy Code 2020?
The Health Information Privacy Code sets the legal obligation to protect health information — its health information privacy rules cover collection, use, disclosure, security, retention, and access. The HISF describes how a health organisation meets that obligation in practice, across access control, data handling, monitoring, and incident response. The Code says what must be protected; the HISF says how.
Q3: Why is blanking the name column not de-identification?
Because identifiers survive elsewhere. Structured fields still hold NHI numbers, dates of birth, and diagnoses, and free-text clinical notes are full of names, places, and identifying detail in prose. On top of that, combinations of quasi-identifiers — a rare diagnosis plus an age plus a small location — can identify a person with no name at all. Real de-identification covers every field, the notes, and re-identification risk.
Q4: What does Te Whatu Ora expect for health information in test environments?
That real patient data is protected to the production standard wherever it sits. In practice: prefer synthetic or de-identified data; if real data must be used, minimise and de-identify it; apply equal access control, encryption, residency, and monitoring in non-prod; keep it onshore; allow no uncontrolled copies; and dispose of it on a defined timeline. Test environments are not treated as a loophole.
Q5: What is the most commonly skipped de-identification check, and why does it matter?
Scanning the free-text clinical notes. Structured fields are easy to mask and usually get masked, but notes are awkward, so they are left untouched — and they are exactly where real patient names, NHI numbers, addresses, and identifying detail survive a “de-identification” that everyone signed off as complete. Always sample the notes specifically.
12 Interview Prep
Real questions asked in NZ QA interviews for health-sector roles. Read the model answers, then practise your own version.
“A team wants to copy production patient data into a test environment to make testing realistic. How do you respond?”
I’d push back and start from the default position that real clinical data does not belong in test. Health information is just as sensitive in non-prod, and test environments usually have weaker controls — broader access, no monitoring, copies that wander onto laptops — which makes them a bigger risk, not a smaller one. I’d ask what the test actually needs and whether synthetic or de-identified data covers it, which it usually does. If realistic data is genuinely required, I’d require proper de-identification including the free-text notes, minimisation, production-standard access and monitoring, onshore residency, and defined disposal — and I’d frame leaving real data exposed in test as a breach of both the HISF and the Health Information Privacy Code.
“How would you verify that a clinical test dataset is properly de-identified?”
I’d treat de-identification as something to test, not trust. I’d sample the dataset and check that direct identifiers — name, NHI, DOB, address, phone — are gone from every structured field, and crucially I’d scan the free-text clinical notes, because that is where identities usually survive. I’d assess re-identification risk in combinations of quasi-identifiers, like a rare diagnosis plus an age plus a small location. I’d confirm there is no reversible mapping back to real patients sitting alongside the data, and I’d check the data is still realistic enough that the test is meaningful. The evidence is the sample scan results, the notes-scan method, and a re-identification assessment — not a sentence saying it was anonymised.
“What does the Health Information Privacy Code mean for how you test a health system?”
It is the privacy rulebook closest to the data, and several of its rules land directly on testing. The security rule means any environment holding health information — including test — must have reasonable safeguards, so weak controls in non-prod are a direct failure. The limits on use mean real clinical records collected for care should not be casually repurposed to test an unrelated system. The retention and disposal rule means old test-data copies cannot sit around indefinitely. And the access rights assume the agency knows where every copy of the data is, which is hard if copies are scattered across test environments. So in practice the Code turns “protect health data” into specific, testable obligations that apply wherever the data lives.