PHI & Health-Data Privacy Testing
Health information is among the most sensitive data a person has. This lesson teaches you to test that a system collects it for the right purpose, shows it only to those allowed, records who touched it, and de-identifies it properly.
1 The Hook
A fictional NZ primary care network, Manaaki Health, ran a shared patient portal across its clinics. A receptionist at one clinic, curious about a well-known local person who had been in the news, searched the system and opened that person’s record — their diagnoses, their medications, their notes. The receptionist had no clinical reason to. They were not part of that person’s care.
The portal let them. Any staff account could open any patient’s full record across the whole network, because access had been built around “is this a valid staff login” rather than “does this person have a legitimate reason to see this patient.” Worse, the system kept no usable log of who had viewed which record, so when a complaint eventually surfaced, no one could say with certainty who had looked, or how often.
Nothing broke. The software did exactly what it was built to allow. But under NZ law, health information may only be accessed for a proper purpose, and a system that lets any logged-in user read any patient’s record, and cannot show who did, is failing at two of the most important things a health system must do: control access and record it.
Here is the lesson hidden in that story. The team had tested that authorised staff could log in and see records — the access feature “worked.” No test asked the harder questions: can a user see a record they have no relationship to, and if they do, can we prove who looked? Privacy testing is exactly those questions. It is testing what the system refuses to do, and what it records when someone tries.
2 The Rule
Health information may only be collected for a stated purpose, accessed for a legitimate reason, and only by those entitled to see it — and every access must be recorded. A system that lets any logged-in user read any patient, or that cannot show who viewed a record, has failed privacy even if nothing crashed. Test what the system refuses, and test that it remembers who looked. Privacy is not whether access works — it is whether the wrong access is prevented and the right access is logged.
3 The Analogy
The records room at a NZ marae or a lawyer’s office.
Sensitive files are not left on an open shelf for anyone who has walked in the door. They are held securely, handed out only to a person with a genuine reason for that specific file, and a register notes who took which file and when. Having a key to the building does not entitle you to read every file in it. And if a file is ever borrowed without a reason, the register is how anyone finds out.
A health system is that records room. A valid staff login is a key to the building, not permission to open every patient’s file. Manaaki Health handed everyone a master key and kept no register, so a curious receptionist read a file they had no business reading, and no one could prove it afterward. A privacy tester is the person who checks that the room only releases a file to someone with a reason for that file, and that the register reliably records every time a file is opened.
4 The Law That Applies in NZ
You do not need to be a lawyer, but you must know which rules your tests are proving. In NZ, health information is governed by two instruments working together.
The Privacy Act 2020 is the general law for personal information. It sets out information privacy principles covering how personal information is collected, used, disclosed, stored, corrected, and accessed, and it gives people the right to ask what is held about them and to have errors fixed.
The Health Information Privacy Code 2020 is the health-specific layer. It takes the Privacy Act’s principles and tailors them as a set of health information privacy rules for agencies that handle health information — covering purpose of collection, the limits on use and disclosure, security, retention, and an individual’s right of access and correction. For a tester, the Code is the more directly relevant of the two, because it speaks specifically to health records.
Conceptually, the rules a tester turns into tests are:
- Purpose: information is collected for a specific purpose, and not used or disclosed for an unrelated one without a lawful basis.
- Minimisation: only the information needed for the purpose is collected and shown.
- Access and disclosure limits: only those entitled may see it, and disclosure outside the agency is constrained.
- Security: it is protected against unauthorised access and loss.
- Right of access and correction: a person can see their own information and have errors corrected.
You do not need to memorise the exact wording of each rule. The point for testing is that these principles become concrete, checkable behaviours, which the rest of this lesson covers.
5 Consent and Purpose
Health information is collected and used on the basis of a purpose — usually the person’s care — and sometimes on the basis of consent for uses beyond that, such as sharing with another provider, or use in research. A privacy tester checks that the system honours the purpose and the consent recorded, rather than treating consent as a checkbox no code ever reads.
The tests that matter:
- Consent is recorded and enforced: if a patient has not consented to a secondary use, the system actually prevents that use — not just displays a flag everyone ignores.
- Withdrawal works: if a patient withdraws consent, the system stops the use going forward, and the withdrawal takes effect promptly.
- Purpose limits hold: data collected for care is not silently repurposed — for example, into a marketing or analytics flow — without the lawful basis for that purpose. This is the same principle that governs training an AI on health data.
- Disclosure is constrained: sharing a record with another organisation happens only where the purpose or consent allows, and the boundary is enforced in code.
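The enforcement and withdrawal checks above can be sketched as a small negative test. This is a minimal illustration, not a real consent service: `ConsentRegister`, `share_record`, `ConsentRefused`, and `is_refused` are hypothetical names, and the sketch assumes a default-deny model for secondary uses only (it does not model access for direct care).

```python
from dataclasses import dataclass, field


class ConsentRefused(Exception):
    """Raised when a secondary use is attempted without consent in force."""


@dataclass
class ConsentRegister:
    # Hypothetical store: (patient_id, purpose) -> True while consent is in force.
    _grants: dict = field(default_factory=dict)

    def grant(self, patient_id: str, purpose: str) -> None:
        self._grants[(patient_id, purpose)] = True

    def withdraw(self, patient_id: str, purpose: str) -> None:
        self._grants[(patient_id, purpose)] = False

    def allows(self, patient_id: str, purpose: str) -> bool:
        # Assumption: a missing entry means no consent (default deny).
        return self._grants.get((patient_id, purpose), False)


def share_record(register: ConsentRegister, patient_id: str, purpose: str) -> dict:
    """Hypothetical sharing module that reads the register before releasing."""
    if not register.allows(patient_id, purpose):
        raise ConsentRefused(f"{purpose} is not permitted for {patient_id}")
    return {"patient_id": patient_id, "purpose": purpose}


def is_refused(register: ConsentRegister, patient_id: str, purpose: str) -> bool:
    """Test helper: True when the share is actually blocked, not just flagged."""
    try:
        share_record(register, patient_id, purpose)
    except ConsentRefused:
        return True
    return False
```

A test built on this would assert that `is_refused` is true before consent is granted, false once it is granted, and true again immediately after `withdraw`. The last assertion is the one that proves withdrawal takes effect.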
6 Access Controls
Access control is the Manaaki Health failure, and it is the heart of health privacy testing. The principle is that a valid login is not blanket permission — a user should only reach the records they have a legitimate relationship with or role-based need for. Testing this means thinking about who should not see a record at least as hard as who should.
The cases a privacy tester drives:
- Role-based access: a receptionist, a nurse, a doctor, and an administrator see different things. Test that each role reaches exactly what it should and is refused what it should not — including a receptionist opening a full clinical record they have no need for.
- Relationship-based access: where the system limits access to patients a user is actually involved with, test that a user cannot open a record for a patient they have no care relationship with — the curious-receptionist case.
- The negative authorisation tests: for every “this role can do X,” there is a “this other role cannot,” and the second is the one that protects patients. A privacy test suite is heavy on refusals.
- Break-glass access: emergency override exists in many clinical systems for genuine emergencies. Test that it requires a reason, is tightly scoped, and — critically — is logged loudly so it can be reviewed afterward. Break-glass without an audit trail is just an unlocked door.
- No leakage through side channels: a record a user cannot open should not leak through search results, suggestions, error messages, URLs, or an exposed API. Access control that only lives in the UI is not access control.
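To make the refusal cases above concrete, here is a hedged sketch of an access decision with role, relationship, and break-glass checks, plus a search function that applies the same rule. Everything here is an assumption for illustration: the role names, the `decide_access` and `search_clinical_records` functions, and the idea that care relationships are held as `(user_id, patient_id)` pairs. A real system's model will differ.

```python
from dataclasses import dataclass

# Assumption: only these roles may ever open a full clinical record.
CLINICAL_ROLES = {"doctor", "nurse"}


@dataclass(frozen=True)
class User:
    user_id: str
    role: str


def decide_access(user, patient_id, care_relationships, break_glass_reason=None):
    """Return 'allowed', 'break_glass', or 'refused' for a full clinical record."""
    if user.role not in CLINICAL_ROLES:
        return "refused"  # a valid login alone is not enough
    if (user.user_id, patient_id) in care_relationships:
        return "allowed"
    if break_glass_reason and break_glass_reason.strip():
        return "break_glass"  # distinct outcome so it can be flagged for review
    return "refused"


def search_clinical_records(user, query, patients, care_relationships):
    """Search applies the same rule, so refused records do not leak by name."""
    return [
        patient_id
        for patient_id, name in patients.items()
        if query.lower() in name.lower()
        and decide_access(user, patient_id, care_relationships) == "allowed"
    ]
```

A suite built on this would spend most of its assertions on the `"refused"` outcomes: the receptionist, the unrelated clinician, the blank break-glass reason, and the search that must return nothing. Returning `"break_glass"` as its own outcome, rather than folding it into `"allowed"`, is a design choice that lets the audit layer flag it.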
7 Audit Logging
The second half of the Manaaki Health failure was the missing log. Access control prevents the wrong access; audit logging records all access so that misuse can be detected and investigated afterward. The two are a pair — a system needs both, and a privacy tester checks both.
What to test in an audit log:
- Completeness: every view, edit, and disclosure of a patient record is recorded — not just edits. The receptionist who only looked must appear in the log.
- Content: the log captures who, which patient, what action, and when, in enough detail to investigate — a who-saw-what trail.
- Integrity: the log cannot be quietly edited or deleted by the people it might incriminate, and gaps are detectable.
- Break-glass visibility: emergency overrides stand out in the log for review, rather than blending in with normal access.
- The log itself is protected: the audit trail is health-related personal information too, so it must not be casually readable by anyone — testing access control on the audit log is part of the job.
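The completeness and integrity checks above can be illustrated with a small in-memory log. This is a sketch under stated assumptions: `AuditLog` and its methods are hypothetical, and the hash chain is one common way to make edits and mid-log deletions detectable, not a description of how any particular product does it. The hashing uses Python's standard `hashlib` and `json` modules.

```python
import hashlib
import json
from datetime import datetime, timezone

GENESIS = "0" * 64  # placeholder "previous hash" for the first entry


class AuditLog:
    """Hypothetical append-only log where each entry hashes the one before it."""

    def __init__(self):
        self._entries = []

    @staticmethod
    def _digest(entry):
        # Hash every field except the stored hash itself.
        payload = {k: v for k, v in entry.items() if k != "hash"}
        return hashlib.sha256(json.dumps(payload, sort_keys=True).encode()).hexdigest()

    def record(self, user_id, patient_id, action, outcome):
        entry = {
            "user_id": user_id,        # who
            "patient_id": patient_id,  # which patient
            "action": action,          # what: view, edit, disclose, break_glass
            "outcome": outcome,        # allowed or refused
            "at": datetime.now(timezone.utc).isoformat(),  # when
            "prev_hash": self._entries[-1]["hash"] if self._entries else GENESIS,
        }
        entry["hash"] = self._digest(entry)
        self._entries.append(entry)

    def who_viewed(self, patient_id):
        return [e["user_id"] for e in self._entries
                if e["patient_id"] == patient_id and e["action"] == "view"]

    def is_intact(self):
        prev_hash = GENESIS
        for entry in self._entries:
            if entry["prev_hash"] != prev_hash or entry["hash"] != self._digest(entry):
                return False
            prev_hash = entry["hash"]
        return True
```

A test would record a view by the receptionist, assert that `who_viewed` returns them, then tamper with an entry and assert that `is_intact` turns false. Note one limit of this sketch: removing the most recent entries leaves a valid shorter chain, so detecting truncation would need the latest hash to be anchored somewhere the log's users cannot reach.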
8 De-identification
Health data is often used beyond direct care — for research, planning, quality improvement, or test environments. For those uses, data is meant to be de-identified so an individual cannot reasonably be re-identified from it. Testing de-identification means trying to defeat it.
The concepts a tester checks:
- Direct identifiers removed: name, NHI (National Health Index number), address, contact details, and other obvious identifiers are stripped or replaced, consistently, everywhere, including free-text notes, not just structured fields.
- Re-identification resistance: the harder risk is the combination of quasi-identifiers — a rare diagnosis, an exact date of birth, and a small town can together single out one person even with the name gone. Test that the combination does not re-identify.
- No leakage in test data: a classic failure is real patient data copied into a test or training environment and called “test data.” Test that lower environments do not hold real PHI, and that any data used there is genuinely synthetic or properly de-identified.
- Free text and attachments: identifiers hiding in a clinical note, a scanned document, or an image’s metadata are a common gap. Structured-field de-identification that ignores free text is incomplete.
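A simple way to start probing the free-text gap is to scan supposedly scrubbed notes for identifier-shaped tokens. The sketch below is illustrative only. It assumes an NHI-shaped token is three letters (excluding I and O) followed by either four digits or two digits and two letters. Confirm the shape against the current NHI specification before relying on it. The function name and the `known_names` input are hypothetical, and pattern matching will miss misspellings and nicknames.

```python
import re

# Assumption: NHI-shaped tokens look like AAANNNN or AAANNAA, with no I or O.
# This checks shape only; it does not validate any check character.
NHI_SHAPED = re.compile(r"\b[A-HJ-NP-Z]{3}(?:\d{4}|\d{2}[A-HJ-NP-Z]{2})\b")


def find_residual_identifiers(note: str, known_names: list[str]) -> list[str]:
    """Return identifier-like strings still present in a scrubbed note."""
    # Upper-case first so lower-case NHIs typed into notes are also caught.
    findings = [match.group(0) for match in NHI_SHAPED.finditer(note.upper())]
    lowered = note.lower()
    # Names from the source records that should no longer appear anywhere.
    findings += [name for name in known_names if name.lower() in lowered]
    return findings
```

For example, a note reading `"Pt ZZZ0016 seen with Aroha Example re: asthma."` checked against the name `"Aroha Example"` should produce two findings, while the same note with both replaced by placeholders should produce none. A scan like this gives a lower bound on leakage and is no proof that a note is clean.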
The mindset is adversarial: do not confirm that the obvious fields were blanked — try to re-identify a person from what is left. If you can, so can someone else, and the de-identification has failed.
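One measurable version of that adversarial check is to count how many records share each combination of quasi-identifiers, an idea usually called k-anonymity. The source does not name this technique; it is brought in here as one reasonable way to test the "combination singles out one person" risk. The field names, sample rows, and the `generalise` rule below are all invented for illustration, and the acceptable minimum group size is a policy decision for the project.

```python
from collections import Counter


def smallest_group(rows, quasi_identifiers):
    """Size of the rarest quasi-identifier combination (1 means someone is unique)."""
    sizes = Counter(tuple(row[q] for q in quasi_identifiers) for row in rows)
    return min(sizes.values()) if sizes else 0


def generalise(row):
    """Illustrative rule: keep birth year only, and region instead of locality."""
    return {
        "birth_year": row["dob"][:4],
        "region": row["region"],
        "diagnosis": row["diagnosis"],
    }


# Synthetic sample rows; no real data.
raw_rows = [
    {"dob": "1984-03-12", "locality": "Small Town A", "region": "Region X",
     "diagnosis": "rare condition"},
    {"dob": "1984-07-30", "locality": "Small Town B", "region": "Region X",
     "diagnosis": "rare condition"},
]
generalised_rows = [generalise(row) for row in raw_rows]

raw_k = smallest_group(raw_rows, ["dob", "locality", "diagnosis"])
generalised_k = smallest_group(generalised_rows, ["birth_year", "region", "diagnosis"])
```

Here `raw_k` is 1, meaning each person is unique on exact date of birth, locality, and diagnosis even with names gone, while `generalised_k` is 2 after dates and places are coarsened. A test case can assert that the smallest group in the export never falls below the agreed threshold.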
9 Common Mistakes
🚫 Testing that authorised access works, but not that unauthorised access is refused
Why it happens: The happy path — a doctor opens a patient they treat — is the obvious thing to demo.
The fix: Privacy lives in the refusals. Test that a receptionist cannot open a full clinical record they have no need for, and that a user cannot reach a patient they have no relationship with — the Manaaki Health case. A privacy suite is heavy on negative authorisation tests.
🚫 Treating consent as a flag the code never enforces
Why it happens: A consent checkbox is easy to build and looks like the feature is done.
The fix: A flag no code reads is not consent. Test the negative: that withdrawing consent actually stops the use, and that data collected for care cannot reach a purpose it was never consented for. Prove the refusal, not just the permission.
🚫 Forgetting to test the audit log, or only logging edits
Why it happens: Logging feels like plumbing, and edits seem more important than views.
The fix: A view can be the breach — the receptionist only looked. Test that every view, edit, and disclosure is logged with who, which patient, what, and when, that the log cannot be quietly altered, and that “who looked at me” returns a complete answer.
🚫 Using real patient data in test environments and calling it de-identified
Why it happens: Real data is convenient and feels realistic for testing.
The fix: Blanking obvious fields is not de-identification — quasi-identifiers and free text can still re-identify a person. Test adversarially: try to re-identify someone from what remains, and confirm lower environments hold only genuinely synthetic or properly de-identified data.
10 Now You Try
Three graded exercises across access, audit, and de-identification. Write your answer, then compare it to the model answer.
Read the description of a fictional NZ shared-care record system below. Identify 3 privacy risks under the Health Information Privacy Code 2020, and name the safer behaviour each needs.
Any staff member with a valid login can search for and open any patient’s full record across all participating clinics. The system records when a record is edited, but not when it is merely viewed. A patient can record a consent preference to not share their record with a particular provider, but the sharing module does not read that preference. A nightly export of records is sent to an analytics team with names removed, but date of birth, full address, and free-text notes are left intact.
List 3 privacy risks and the safer behaviour for each:
Model answer
There are at least four real risks here; any three well-explained earns full marks.
1. Unrestricted access: any logged-in user can open any patient's full record across all clinics, with no role or relationship check.
   Safer behaviour: role- and relationship-based access; refuse records a user has no legitimate reason to see.
   Principle: access/disclosure limits (the Manaaki Health failure).
2. Views not logged: only edits are recorded, so someone who merely looks at a record leaves no trace.
   Safer behaviour: log every view, edit, and disclosure with who/which patient/what/when, and answer "who looked at me".
   Principle: security / accountability.
3. Consent preference not enforced: the patient's not-share preference exists but the sharing module ignores it.
   Safer behaviour: the sharing code must read and enforce the preference; withdrawing/limiting consent actually blocks the share.
   Principle: purpose / use and disclosure limits.
Bonus: weak de-identification in the analytics export. Names are removed but DOB, full address, and free-text notes remain, which can re-identify a person.
   Safer behaviour: remove/blur quasi-identifiers and scrub free text; test adversarially for re-identification.
   Principle: minimisation / security.
The trap: each of these passes a "staff can log in and see records" test, because that part works.
The privacy test case below only checks that an authorised user can see a record. Rewrite it to prove access is restricted and logged, with these fields: Test ID, Privacy rule, Risk, Roles involved, Pre-conditions, Action, Expected result, Negative cases, Audit assertion, Evidence required. Use a fictional Te Whatu Ora clinical portal as the context.
“Log in as a doctor and open a patient record. Pass if the record displays.”
Rewrite as an access-control and audit test case:
Model answer
Test ID: PRIV-ACC-022
Privacy rule: Access/disclosure limited to those with a legitimate reason; all access logged (Health Information Privacy Code 2020).
Risk: Unauthorised access to a patient record; inability to show who viewed it.
Roles involved: Treating doctor (has relationship), receptionist (no clinical need), and a clinician with no relationship to the patient.
Pre-conditions: A test patient; three accounts with the roles above; one with a care relationship, two without.
Action: Each account attempts to open the patient's full clinical record; one performs a break-glass override.
Expected result:
1) The treating doctor opens the record.
2) The receptionist is refused the full clinical record they have no need for.
3) The unrelated clinician is refused.
4) Break-glass succeeds only with a recorded reason and is flagged.
Negative cases: the record must not leak to the refused users via search results, URLs, or the API, not just the UI screen.
Audit assertion: every attempt (successful, refused, and break-glass) is logged with who, which patient, action, and timestamp; "who viewed this patient" returns all of them; the log cannot be edited by these users.
Evidence required: the access outcomes per role; the audit-log entries for each attempt; the break-glass reason record; confirmation of no side-channel leakage.
What makes it strong: it tests refusals and a relationship/role model, not just one happy login; it includes break-glass with a reason; it asserts complete, tamper-resistant logging including views; and it checks side channels. The original touched none of these.
Design a de-identification test plan of 5 test cases for a fictional NZ health research export (a pipeline that produces a de-identified dataset for an approved research project). Each case needs at least: an ID, what it verifies, an acceptance criterion, and the evidence required. Cover direct identifiers, the NHI, quasi-identifier combinations, free-text notes, and the test/lower-environment leakage case.
Model answer
DEID-01 | Verifies: direct identifiers are removed everywhere | Acceptance criteria: name, address, contact details, dates of contact are removed/replaced in all structured fields; 0 direct identifiers remain | Evidence required: field scan of the export; before/after sample records
DEID-02 | Verifies: the NHI is removed or irreversibly pseudonymised | Acceptance criteria: no raw NHI appears; any linkage key cannot be reversed to the NHI without separately controlled access | Evidence required: search of the export for NHI patterns; description of the pseudonymisation; confirmation the mapping is held separately
DEID-03 | Verifies: quasi-identifier combinations do not re-identify | Acceptance criteria: combinations like exact DOB + rare diagnosis + small locality cannot single out an individual (e.g. dates generalised, small cells suppressed/grouped) | Evidence required: re-identification attempt results; the generalisation/suppression rules applied
DEID-04 | Verifies: free-text notes are de-identified | Acceptance criteria: identifiers embedded in clinical notes (names, NHIs, addresses) are detected and removed; structured-only scrubbing is not accepted | Evidence required: sample notes before/after; the free-text scrubbing method and its results
DEID-05 | Verifies: no real PHI lands in test/lower environments | Acceptance criteria: lower environments contain only synthetic or properly de-identified data; 0 real patient records present | Evidence required: scan of the lower environment; data-provenance record showing the source is synthetic/de-identified
Strong plans: each case is specific, has a measurable criterion, names concrete evidence, and together they cover direct identifiers, the NHI, quasi-identifier combinations, free text, and environment leakage. Weak plans say "check the data is de-identified" five times, and that is the difference being marked.
11 Self-Check
Q1: Why is “authorised access works” not enough to prove privacy?
Because privacy lives in the refusals. A valid login is a key to the building, not permission to open every file. You must test that a user cannot reach a record they have no legitimate reason or relationship to see — the Manaaki Health failure — and that the wrong access is prevented, not just that the right access is allowed.
Q2: Which two NZ instruments govern health information, and which is more directly relevant?
The Privacy Act 2020 is the general law for personal information; the Health Information Privacy Code 2020 is the health-specific layer that tailors those principles into rules for agencies handling health information. For a tester the Code is more directly relevant, because it speaks specifically to health records, access, and correction.
Q3: Why must an audit log capture views and not only edits?
Because a view can be the breach — the curious receptionist who only looked at a record committed a privacy breach without changing anything. The log must record every view, edit, and disclosure with who, which patient, what, and when, so that “who looked at me” returns a complete, trustworthy answer.
Q4: What is the strongest way to test consent enforcement?
The negative test. Anyone can show an allowed use proceeds; the privacy value is proving a disallowed use is actually blocked — that withdrawing consent stops the flow going forward, and that data collected for care cannot reach a purpose it was never consented for. Test the refusal, not just the permission.
Q5: Why is blanking obvious fields not enough for de-identification?
Because quasi-identifiers in combination — an exact date of birth, a rare diagnosis, and a small town — can single out a person even with the name removed, and identifiers often hide in free-text notes and attachments. Test adversarially: try to re-identify someone from what remains. If you can, the de-identification has failed.
12 Interview Prep
Real questions asked in NZ QA interviews for health privacy roles. Read the model answers, then practise your own version.
“How would you test access control on a clinical record system?”
I’d test the refusals as hard as the permissions. A valid login is not blanket access, so for every role — receptionist, nurse, doctor, admin — I check it reaches exactly what it should and is refused what it should not, including a receptionist opening a full clinical record they have no need for. Where access is relationship-based, I confirm a user cannot open a patient they have no care relationship with. I test break-glass requires a recorded reason and is flagged, and I check the record does not leak through search, URLs, or the API — access control that only lives in the UI is not access control. And I pair every access test with an audit assertion, because preventing the wrong access and recording all access are two halves of the same job.
“A patient asks who has viewed their record. How does that shape your testing?”
That question is one of my best audit tests, because the system has to produce a complete, trustworthy answer. I drive testing backward from it: every view, edit, and disclosure must be logged with who, which patient, what action, and when — views especially, because a view can be the breach. I check the log cannot be quietly edited or deleted by the people it might incriminate, that break-glass access stands out for review, and that the audit trail itself is access-controlled, since it is sensitive health-related information too. If the system cannot answer “who looked at me” completely, the audit logging has failed.
“How do you test that data is properly de-identified for research?”
Adversarially — my goal is to re-identify a person, not to confirm the obvious fields were blanked. I check direct identifiers including the NHI are removed everywhere, including free-text notes and attachments, not just structured fields. Then I attack the quasi-identifiers: whether an exact date of birth, a rare diagnosis, and a small locality together single someone out, and whether dates are generalised and small cells suppressed. I also confirm lower and test environments hold only synthetic or properly de-identified data, because copying real patient data into a test environment and calling it de-identified is a classic failure. If I can re-identify someone from what remains, so can someone else, and it has not been de-identified.