Lesson 3 of 3 · Environment & Release Management

Test Data in Environments

Using real customer data in test environments is not just risky — in NZ, it may breach the Privacy Act 2020. This lesson covers PII masking, synthetic data generation, and the governance framework that keeps test data safe and useful.

Environment & Release · CTAL-TM v3.0, Section 3.2.4 · ~30 min read · ~70 min with exercises

1 The Hook

An Inland Revenue contractor discovers that the SIT environment contains a full copy of production taxpayer data — names, IRD numbers, addresses, and income figures — from a database refresh 8 months ago. No one masked it. The data has been accessed by 14 developers and 6 contractors without individual consent.

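Under the NZ Privacy Act 2020, unauthorised access to personal information on this scale is a privacy breach, and where it is likely to cause serious harm it is notifiable. The agency must notify the Office of the Privacy Commissioner and the affected individuals.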

Cost: $340,000 in legal fees, remediation, and staff time. Root cause: no test data management policy.

No one was malicious. No one thought about it. That is exactly the problem a test data policy solves.

2 The Rule

Production data must never appear in non-production environments without masking. If you cannot mask it, synthesise it. There is no legitimate testing reason that requires real customer PII in a SIT or UAT environment.

3 The Analogy

Photocopying a passport for hotel check-in.

The hotel needs to verify your identity — but they do not need your actual passport number stored in their filing system. They redact it before filing. The copy is useful (it shows you have a passport) but not sensitive (the number is gone).

Masked test data works the same way. The record is structurally realistic enough to test with, but it cannot be used to identify a real person. The utility is preserved. The risk is eliminated.

4 Watch Me Do It

Three tools for safe test data: masking queries, synthetic generation, and a governance policy.

1. PII masking query (run against a prod clone before loading into SIT)

    -- Masking query for customer table
    -- Run against a production clone before any SIT or UAT use
    -- Note: in MySQL-style SQL, LPAD truncates ids longer than 3 digits,
    -- so masked ird_number values repeat across rows
    UPDATE customers
    SET
      first_name    = CONCAT('Test_', id),
      last_name     = 'User',
      email         = CONCAT('test.user.', id, '@resync-test.nz'),
      ird_number    = CONCAT('000-', LPAD(id, 3, '0'), '-000'),
      phone         = CONCAT('021 000 ', LPAD(id % 1000, 4, '0')),
      address       = CONCAT(id, ' Test Street, Wellington'),
      date_of_birth = '1990-01-01'
    WHERE 1=1;  -- Apply to every row in the non-production clone
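
A masking query is only useful if it has actually run, so it can help to check the clone before anyone loads it into SIT. The sketch below is one possible way to do that, assuming the rows have already been fetched as plain objects with string values. The name `findUnmaskedRows` and the regular expressions are illustrative assumptions that mirror the formats in the masking query above; they are not part of any library.

```javascript
// Patterns mirror the formats written by the masking query.
// If the query changes, these patterns must change with it.
const MASKED_PATTERNS = {
  first_name: /^Test_\d+$/,
  last_name: /^User$/,
  email: /^test\.user\.\d+@resync-test\.nz$/,
  ird_number: /^000-\d{3}-000$/,
  phone: /^021 000 \d{4}$/,
  address: /^\d+ Test Street, Wellington$/,
  date_of_birth: /^1990-01-01$/,
};

// Returns one finding per row that has at least one field
// not matching its masked pattern. Only field names are reported,
// never values, so the report itself cannot leak PII.
function findUnmaskedRows(rows) {
  const findings = [];
  for (const row of rows) {
    const fields = Object.keys(MASKED_PATTERNS).filter(
      (field) => !MASKED_PATTERNS[field].test(String(row[field] ?? ''))
    );
    if (fields.length > 0) {
      findings.push({ id: row.id, fields });
    }
  }
  return findings;
}
```

An empty result means every checked field matches its masked format. Any finding should be handled as unmasked PII in a test environment rather than as an ordinary test failure.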

2. Synthetic data generation with Faker.js (for NZ locale)

    import { faker } from '@faker-js/faker/locale/en_NZ';

    const testCustomer = {
      firstName: faker.person.firstName(),
      lastName: faker.person.lastName(),
      email: faker.internet.email({ provider: 'resync-test.nz' }),
      // Format-string argument: check that your installed Faker version still accepts it
      phone: faker.phone.number('021 ### ####'),
      address: faker.location.streetAddress() + ', Wellington',
      // Custom function with valid checksum (not part of Faker; you must supply it)
      irdNumber: generateValidIRD(),
    };
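
The snippet above calls `generateValidIRD()` without defining it. The following is a minimal sketch of what such a helper could look like, assuming the publicly documented modulus-11 check digit scheme for IRD numbers (primary weights 3, 2, 7, 6, 5, 4, 3, 2, with a secondary weight set used when the first pass yields 10). The helper names and the base-number range are assumptions for illustration; verify the algorithm against Inland Revenue's current specification before relying on it.

```javascript
const PRIMARY_WEIGHTS = [3, 2, 7, 6, 5, 4, 3, 2];
const SECONDARY_WEIGHTS = [7, 4, 3, 2, 5, 2, 7, 6];

// Weighted sum modulo 11, converted to a candidate check digit (0 to 10).
function weightedCheck(digits, weights) {
  const sum = digits.reduce((acc, digit, i) => acc + digit * weights[i], 0);
  const remainder = sum % 11;
  return remainder === 0 ? 0 : 11 - remainder;
}

// Returns the check digit for a 7- or 8-digit base number,
// or null when neither weight set yields a single digit.
function irdCheckDigit(base) {
  const digits = String(base).padStart(8, '0').split('').map(Number);
  let check = weightedCheck(digits, PRIMARY_WEIGHTS);
  if (check === 10) {
    check = weightedCheck(digits, SECONDARY_WEIGHTS);
  }
  return check === 10 ? null : check;
}

// Generates a 9-digit string that passes the checksum.
// The base range (10000000 to 14999999) is an illustrative assumption.
function generateValidIRD(random = Math.random) {
  for (;;) {
    const base = 10000000 + Math.floor(random() * 5000000);
    const check = irdCheckDigit(base);
    if (check !== null) {
      return `${base}${check}`;
    }
  }
}
```

A number that passes the checksum is structurally valid, but it may also coincide with a number issued to a real person. Whether that matters for your environments is a question for your privacy officer.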

3. Test data governance policy elements (what your policy must cover)

  • Data classification: which fields are PII under the Privacy Act 2020 (names, IRD numbers, NHI numbers, contact details, financial data)
  • Refresh schedule: when production clones are refreshed and masked — monthly for SIT, fortnightly for UAT
  • Access controls: who can access non-production databases and under what conditions
  • Data retention: when non-production data is purged — recommend 90 days maximum
  • Audit trail: who accessed what data and when (mandatory for notifiable breach investigation)
  • Breach procedure: what happens when unmasked PII is found in a test environment
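
The refresh and retention elements in the list above lend themselves to an automated check. The sketch below shows one way to encode them, assuming each environment is described by a small record with `name`, `lastRefreshed`, and `masked` fields. Those field names, the `policy` object, and the 30-day approximation of "monthly" are all assumptions made for illustration.

```javascript
// Policy values taken from the list above; "monthly" is approximated as 30 days.
const policy = {
  refreshIntervalDays: { SIT: 30, UAT: 14 },
  maxRetentionDays: 90,
};

const MS_PER_DAY = 24 * 60 * 60 * 1000;

// Whole days elapsed between two Date objects.
function daysBetween(from, to) {
  return Math.floor((to.getTime() - from.getTime()) / MS_PER_DAY);
}

// Returns a list of policy issues for one environment record.
function checkEnvironment(env, now = new Date()) {
  const ageDays = daysBetween(new Date(env.lastRefreshed), now);
  const issues = [];
  if (!env.masked) {
    issues.push('unmasked');
  }
  const interval = policy.refreshIntervalDays[env.name];
  if (interval !== undefined && ageDays > interval) {
    issues.push('refresh-overdue');
  }
  if (ageDays > policy.maxRetentionDays) {
    issues.push('purge-overdue');
  }
  return issues;
}
```

Running a check like this on a schedule would surface a stale, unmasked clone within days rather than months.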

Pro tip: If a developer says they need real data to reproduce a production bug, the correct answer is “no.” The correct process is: get the bug ID from the production incident, reproduce it using a synthetic record with the same structural characteristics, then verify the fix in SIT with masked data. Real PII is never the only way to reproduce a defect.
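
To make "same structural characteristics" concrete, the sketch below builds a synthetic record from a description of the incident rather than from the incident's data. The shape fields (`accountCreatedBefore`, `lastNameHasApostrophe`, `emailIsNull`) and the function `buildSyntheticRecord` are hypothetical; a real incident would dictate its own list of characteristics.

```javascript
const MS_IN_A_DAY = 24 * 60 * 60 * 1000;

// Builds a record that shares the incident's structure but none of its values.
function buildSyntheticRecord(shape, id) {
  // If the bug depends on account age, pick the day before the cutoff.
  const created = shape.accountCreatedBefore
    ? new Date(Date.parse(shape.accountCreatedBefore) - MS_IN_A_DAY)
    : new Date('2024-01-01T00:00:00Z');
  return {
    id,
    firstName: `Test_${id}`,
    lastName: shape.lastNameHasApostrophe ? "O'Test" : 'User',
    email: shape.emailIsNull ? null : `test.user.${id}@resync-test.nz`,
    createdAt: created.toISOString().slice(0, 10),
  };
}

// Structural description captured from the incident ticket, with no real values.
const incidentShape = {
  accountCreatedBefore: '2023-01-01',
  lastNameHasApostrophe: true,
  emailIsNull: true,
};

const reproRecord = buildSyntheticRecord(incidentShape, 42);
```

The person triaging the incident records only the shape, so the developer never needs access to the production row.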

5 When to Use It

Always. For every non-production environment. The Privacy Act 2020 applies regardless of whether a breach was intentional. “We forgot to mask it” and “we didn’t think it was a breach” are not defences under the Act, and they are unlikely to help when the Privacy Commissioner investigates.

The mask-or-synthesise principle applies to: SIT, UAT, staging, performance test environments, developer workstations, and any CI/CD environment where a database is seeded from production.

6 Common Mistakes

🚫 “I used to think: test data is not real data so privacy rules don’t apply.”

Actually: If real customer data is used in testing — masked or not — privacy obligations apply from the moment you copy it. The Privacy Act 2020 applies to any personal information held by an agency, regardless of its intended use. Copying production data to SIT without a lawful purpose and without consent triggers the Act.

🚫 “I used to think: masking email addresses is enough.”

Actually: Re-identification risk means that even with email masked, a combination of name + IRD number + address can uniquely identify a real person. The Privacy Act 2020 Principle 5 (storage and security) and Principle 11 (disclosure) apply to any data that could be used to identify someone — mask all PII fields, not just the obvious ones.
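
Re-identification risk can be estimated by counting how many records are the only one with a given combination of field values. The sketch below is a simplified illustration of that idea, not a full anonymity assessment; the function name `countUniquelyIdentified` and the sample fields are assumptions.

```javascript
// Counts records whose combination of the given fields appears exactly once.
// A record that is unique on those fields can be singled out by anyone
// who knows those values about a person.
function countUniquelyIdentified(rows, fields) {
  const counts = new Map();
  for (const row of rows) {
    const key = JSON.stringify(fields.map((field) => row[field]));
    counts.set(key, (counts.get(key) ?? 0) + 1);
  }
  let unique = 0;
  for (const occurrences of counts.values()) {
    if (occurrences === 1) {
      unique += 1;
    }
  }
  return unique;
}

// Synthetic sample rows for illustration only.
const sampleRows = [
  { suburb: 'Karori', birthYear: 1980 },
  { suburb: 'Karori', birthYear: 1980 },
  { suburb: 'Karori', birthYear: 1991 },
  { suburb: 'Newtown', birthYear: 1980 },
];
```

With these rows, one record is unique on `suburb` alone and two are unique on `suburb` plus `birthYear`. Each unmasked field that is added makes more people distinguishable.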

🚫 “I used to think: the development team is responsible for test data.”

Actually: QA owns the test data strategy. Developers create the software; QA defines what data is needed to test it and ensures that data is compliant. If a developer refreshes a database without masking, QA is the team with the policy, the process, and the obligation to catch it.

7 Now You Try

🏥 Prompt Lab — Write a Test Data Management Policy

Write a test data management policy for a NZ healthcare organisation’s test environments. Cover: which fields must be masked (patient names, NHI numbers, diagnoses, GP details), the masking approach, the data refresh schedule, and how you’d handle a case where developers say they need real data to reproduce a bug.

8 Self-Check

Q1: At what point does the NZ Privacy Act 2020 apply to test data?

The moment production data is copied to a non-production environment, regardless of whether it has been masked or whether the intent is testing. Copying personal information creates a privacy obligation. The Act does not have an exemption for test environments.

Q2: Why is masking only the email address insufficient?

Re-identification risk. A combination of fields — name, address, IRD number, date of birth — can uniquely identify a real person even with the email removed. All PII fields must be masked to eliminate re-identification risk, not just the most obvious one.

Q3: What is the correct response when a developer requests real production data to reproduce a bug?

No. Reproduce the bug using a synthetic record with the same structural characteristics as the production case. Real PII is never the only way to reproduce a defect. Document the structural characteristics (e.g., account created before 2023, IRD number with specific checksum pattern) and create a synthetic record that matches.

Q4: Who is responsible for the test data strategy?

QA. Not the development team, not the database administrator. QA defines what data is needed for testing, specifies the masking requirements, and enforces the policy. If a developer refreshes without masking, QA owns the process that should have caught it.

Q5: What six elements must a test data governance policy cover?

Data classification (which fields are PII), refresh schedule (when clones are refreshed and masked), access controls (who can access non-production databases), data retention (when non-production data is purged), audit trail (who accessed what data and when), and breach procedure (what happens when unmasked PII is found in a test environment).

9 ISTQB Mapping

CTAL-TM v3.0 — Section 3.2.4: Test Data Management

The ISTQB Test Manager syllabus addresses test data management as part of environment planning, covering data creation, data storage, data security, and the privacy obligations that arise when personal information is used in testing. This lesson applies those concepts specifically to the NZ Privacy Act 2020 and the notifiable breach framework that the Privacy Commissioner administers.