Lesson 3 of 3 · Environment & Release Management

Test Data in Environments

Using real customer data in test environments is not just risky — in NZ, it may breach the Privacy Act 2020. This lesson covers PII masking, synthetic data generation, and the governance framework that keeps test data safe and useful.

Environment & Release · CTAL-TM v3.0, Section 3.2.4 · ~30 min read · ~70 min with exercises

1 The Hook

A Revenue NZ contractor discovers that the SIT environment contains a full copy of production taxpayer data (names, Revenue NZ numbers, addresses, and income figures) from a database refresh 8 months ago. No one masked it. The data has been accessed by 14 developers and 6 contractors without individual consent.

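Under the NZ Privacy Act 2020, this is almost certainly a notifiable privacy breach: unauthorised access to identity and financial information that is likely to cause serious harm. The agency must notify the Office of the Privacy Commissioner and the affected individuals as soon as practicable.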

Cost: $340,000 in legal fees, remediation, and staff time. Root cause: no test data management policy.

No one was malicious. No one thought about it. That is exactly the problem a test data policy solves.

2 The Rule

Production data must never appear in non-production environments without masking. If you cannot mask it, synthesise it. There is no legitimate testing reason that requires real customer PII in a SIT or UAT environment.

3 The Analogy

Analogy

Photocopying a passport for hotel check-in.

The hotel needs to verify your identity — but they do not need your actual passport number stored in their filing system. They redact it before filing. The copy is useful (it shows you have a passport) but not sensitive (the number is gone).

Masked test data works the same way. The record is structurally realistic enough to test with, but it cannot be used to identify a real person. The utility is preserved. The risk is eliminated.

Senior engineer insight

The worst test data breach I ever investigated wasn't a hack; it was a scheduled job that nobody audited. Every fortnight it refreshed the SIT database from production, and for eleven months nobody noticed that the masking step had silently stopped covering every field after a schema change added a new PII column. The automation looked fine; the data wasn't. After that, we built a masking verification check into every refresh pipeline: query the masked environment for any field matching real Revenue NZ number patterns, and fail the pipeline if it finds one.

The most common mistake: teams run a masking script once, declare it done, then let the schema drift for months until new PII columns appear unmasked. Masking is not a one-time task — it is a recurring verification discipline.
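The verification check described in this insight can be sketched in JavaScript. This is a minimal sketch, not the author's pipeline: it assumes rows have already been fetched from the masked environment as plain objects, that real Revenue NZ numbers are 8 or 9 digits between 10,000,000 and 150,000,000, and the function names are hypothetical.

```javascript
// Hypothetical post-refresh check for a masked SIT/UAT database.
// Assumptions: real Revenue NZ numbers are 8 or 9 digits, optionally split by
// dashes or spaces, and fall between 10,000,000 and 150,000,000. The masking
// query's "000-xxx-xxx" values are below that range, so they pass.
const NUMBER_SHAPE = /^\d{2,3}[- ]?\d{3}[- ]?\d{3}$/;

function looksLikeRealNumber(value) {
  if (typeof value !== "string") return false;
  const text = value.trim();
  if (!NUMBER_SHAPE.test(text)) return false;
  const n = Number(text.replace(/\D/g, ""));
  return n >= 10000000 && n <= 150000000;
}

// Scans every string cell; returns one finding per suspicious cell.
// Findings hold locations only, so the check never logs the value itself.
function findSuspiciousValues(tableName, rows) {
  const findings = [];
  rows.forEach((row, index) => {
    for (const [column, value] of Object.entries(row)) {
      if (looksLikeRealNumber(value)) {
        findings.push({ table: tableName, row: index, column });
      }
    }
  });
  return findings;
}

// Pipeline gate: throw (an uncaught throw exits non-zero) instead of warning.
function assertMasked(tableName, rows) {
  const findings = findSuspiciousValues(tableName, rows);
  if (findings.length > 0) {
    const columns = [...new Set(findings.map((f) => f.column))].join(", ");
    throw new Error(
      `Masking verification failed for ${tableName}: ${findings.length} suspicious value(s) in ${columns}`
    );
  }
}
```

Because it matches on shape and range only, synthetic records that deliberately carry checksum-valid numbers would also be flagged; tagging those records and excluding them from the scan is one way around that.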

4 Watch Me Do It

Three tools for safe test data: masking queries, synthetic generation, and a governance policy.

1. PII masking query (run against a prod clone before loading into SIT)

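-- Masking query for the customers table (MySQL syntax)
-- Run against a production clone before any SIT or UAT use
UPDATE customers SET
  first_name    = CONCAT('Test_', id),
  last_name     = 'User',
  email         = CONCAT('test.user.', id, '@resync-test.nz'),
  -- '000-xxx-xxx' stays below the 8-digit minimum for real numbers.
  -- LPAD truncates longer input, so id is split into two 3-digit groups
  -- (values stay unique for ids under 1,000,000).
  ird_number    = CONCAT('000-', LPAD(id % 1000, 3, '0'), '-', LPAD(FLOOR(id / 1000) % 1000, 3, '0')),
  phone         = CONCAT('021 000 ', LPAD(id % 10000, 4, '0')),
  address       = CONCAT(id, ' Test Street, Wellington'),
  date_of_birth = '1990-01-01'
WHERE 1=1;  -- Apply to every row in the non-production clone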

2. Synthetic data generation with Faker.js (for NZ locale)

import { faker } from '@faker-js/faker/locale/en_NZ';  // Faker v8+ pre-localised instance

const testCustomer = {
  firstName:  faker.person.firstName(),
  lastName:   faker.person.lastName(),
  email:      faker.internet.email({ provider: 'resync-test.nz' }),
  phone:      faker.helpers.replaceSymbols('021 ### ####'),  // each # becomes a random digit
  address:    faker.location.streetAddress() + ', Wellington',
  irdNumber:  generateValidIRD(),  // your own helper (not part of Faker) that returns a valid checksum
};
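The snippet leaves `generateValidIRD()` undefined. One possible sketch follows, assuming the Revenue NZ number uses the same mod-11 check-digit scheme as IRD numbers (primary weights 3,2,7,6,5,4,3,2, falling back to 7,4,3,2,5,2,7,6 when the first pass gives 10) and a valid range of 10,000,000 to 150,000,000. Treat both the weights and the range as assumptions to confirm against the official specification.

```javascript
// Hypothetical generateValidIRD() plus a matching validator.
// Assumed scheme: pad to 9 digits, weight the first 8 digits, take
// 11 - (sum mod 11); if that is 10, retry with the secondary weights.
const PRIMARY_WEIGHTS = [3, 2, 7, 6, 5, 4, 3, 2];
const SECONDARY_WEIGHTS = [7, 4, 3, 2, 5, 2, 7, 6];
const MIN_NUMBER = 10000000;
const MAX_NUMBER = 150000000;

function checkDigitFor(baseDigits, weights) {
  const sum = baseDigits.reduce((acc, digit, i) => acc + digit * weights[i], 0);
  const remainder = sum % 11;
  return remainder === 0 ? 0 : 11 - remainder; // 10 means "no valid digit"
}

// Returns the check digit for an 8-digit base, or null if none exists.
function computeCheckDigit(baseDigits) {
  let check = checkDigitFor(baseDigits, PRIMARY_WEIGHTS);
  if (check === 10) check = checkDigitFor(baseDigits, SECONDARY_WEIGHTS);
  return check === 10 ? null : check;
}

// Accepts "49-091-850", "049091850" or 49091850.
function isValidIRD(input) {
  const number = Number(String(input).replace(/\D/g, ""));
  if (number < MIN_NUMBER || number > MAX_NUMBER) return false;
  const digits = String(number).padStart(9, "0").split("").map(Number);
  return computeCheckDigit(digits.slice(0, 8)) === digits[8];
}

// Picks a random 8-digit base whose 9-digit result stays inside the range,
// retrying the rare bases that have no valid check digit.
function generateValidIRD(random = Math.random) {
  for (;;) {
    const base = 1000000 + Math.floor(random() * 14000000); // 1,000,000..14,999,999
    const check = computeCheckDigit(String(base).padStart(8, "0").split("").map(Number));
    if (check !== null) return String(base * 10 + check).padStart(9, "0");
  }
}
```

A checksum-valid number can coincide with one issued to a real person, so keep it attached only to obviously synthetic names and addresses. The optional `random` parameter lets tests pass a seeded generator for repeatable output.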

3. Test data governance policy elements (what your policy must cover)

Data classification: which fields are PII under the Privacy Act 2020 (names, Revenue NZ numbers, NHI numbers, contact details, financial data)
Refresh schedule: when production clones are refreshed and masked — monthly for SIT, fortnightly for UAT
Access controls: who can access non-production databases and under what conditions
Data retention: when non-production data is purged — recommend 90 days maximum
Audit trail: who accessed what data and when (mandatory for notifiable breach investigation)
Breach procedure: what happens when unmasked PII is found in a test environment
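The refresh schedule and retention elements are concrete enough to check automatically. A minimal policy-as-code sketch, assuming each environment's metadata is available as an object with hypothetical fields `lastRefreshed`, `oldestDataLoaded`, and `maskingVerified`; the intervals mirror the list, with "monthly" approximated as 31 days.

```javascript
// Hypothetical policy-as-code check for the refresh and retention rules.
// Assumed values: SIT "monthly" treated as 31 days, UAT fortnightly as 14 days,
// and a 90-day retention maximum. Replace them with your policy's real numbers.
const POLICY = {
  refreshIntervalDays: { SIT: 31, UAT: 14 },
  maxRetentionDays: 90,
};

const MS_PER_DAY = 24 * 60 * 60 * 1000;

function daysBetween(earlier, later) {
  return Math.floor((later.getTime() - earlier.getTime()) / MS_PER_DAY);
}

// env: { name, type, lastRefreshed: Date, oldestDataLoaded: Date, maskingVerified: boolean }
// Returns human-readable violations; an empty array means compliant.
function policyViolations(env, now = new Date()) {
  const problems = [];
  const interval = POLICY.refreshIntervalDays[env.type];
  if (interval !== undefined && daysBetween(env.lastRefreshed, now) > interval) {
    problems.push(`${env.name}: refresh overdue (policy: every ${interval} days)`);
  }
  if (daysBetween(env.oldestDataLoaded, now) > POLICY.maxRetentionDays) {
    problems.push(`${env.name}: holds data older than ${POLICY.maxRetentionDays} days`);
  }
  if (!env.maskingVerified) {
    problems.push(`${env.name}: last refresh has no passing masking verification`);
  }
  return problems;
}
```

A scheduled job could run this for every environment and send any non-empty result to the policy owner, which gives the written policy an enforcement mechanism.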

Pro tip: If a developer says they need real data to reproduce a production bug, the correct answer is “no.” The correct process is: get the bug ID from the production incident, reproduce it using a synthetic record with the same structural characteristics, then verify the fix in SIT with masked data. Real PII is never the only way to reproduce a defect.
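The synthetic-reproduction process in this tip can be sketched as a factory with overrides: the incident supplies only structural traits, and identity fields always come from the factory. The function names, the `900000` ID offset, and the list of guarded PII fields are hypothetical.

```javascript
// Hypothetical sketch: rebuild the *shape* of a production defect without copying PII.
function syntheticCustomer(seq) {
  return {
    id: 900000 + seq,                                          // assumed reserved test ID range
    firstName: `Test_${seq}`,
    lastName: "User",
    email: `test.user.${seq}@resync-test.nz`,
    irdNumber: `000-${String(seq % 1000).padStart(3, "0")}-000`, // below the real-number range
    accountCreated: "2024-01-15",
    accountType: "individual",
  };
}

// Fields that must never be copied from the production record.
const PII_FIELDS = ["firstName", "lastName", "email", "irdNumber"];

// traits: structural characteristics from the incident, e.g. { accountCreated: "2022-11-03" }
function reproduceFromTraits(seq, traits) {
  const overlap = PII_FIELDS.filter((field) => field in traits);
  if (overlap.length > 0) {
    throw new Error(`Traits must not carry PII fields: ${overlap.join(", ")}`);
  }
  return { ...syntheticCustomer(seq), ...traits };
}
```

Rejecting PII keys outright, rather than silently dropping them, makes an accidental copy from the production record visible in code review.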

From the field

A Wellington fintech team assumed their UAT environment was safe because the developer who built the refresh script had "definitely masked everything." Six months later, a QA engineer running exploratory testing noticed that the beneficiary_name and bank_account_number fields on payment records were unmasked — the script predated the payments module and had never been updated. Under the NZ Privacy Act 2020, bank account numbers associated with a named individual are personal information, and the team had inadvertently given UAT access to contractors outside their data processing agreement. The incident triggered a voluntary disclosure to the Office of the Privacy Commissioner and a full audit of every non-production environment. The lesson that generalises: your masking script is only as complete as the last time someone reviewed it against the current schema — schedule a masking audit every time a new module ships.
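A masking audit against the current schema reduces to a set difference: every column must be either masked by the refresh script or explicitly classified as non-PII. This sketch assumes you can list columns per table (in MySQL and PostgreSQL, the `information_schema.columns` view provides `table_name` and `column_name`) and that the script's coverage is kept as a list of `table.column` names; the lists in the test are illustrative.

```javascript
// Hypothetical schema-drift audit: any column that is neither masked nor
// explicitly classified as non-PII is reported, so new modules cannot slip through.
function unclassifiedColumns(schema, maskedColumns, nonPiiColumns) {
  const classified = new Set([...maskedColumns, ...nonPiiColumns]);
  const missing = [];
  for (const [table, columns] of Object.entries(schema)) {
    for (const column of columns) {
      const key = `${table}.${column}`;
      if (!classified.has(key)) missing.push(key);
    }
  }
  return missing;
}
```

Run in CI whenever a migration ships, a non-empty result would have caught the unmasked `beneficiary_name` and `bank_account_number` columns the day the payments module landed.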

5 When to Use It

Always. For every non-production environment. The Privacy Act 2020 applies regardless of whether a breach was intentional. “We forgot to mask it” and “we didn’t think it was a breach” are not defences under the Act. If anything, they are evidence that the agency lacked the reasonable security safeguards that Information Privacy Principle 5 requires.

The mask-or-synthesise principle applies to: SIT, UAT, staging, performance test environments, developer workstations, and any CI/CD environment where a database is seeded from production.

6 Common Mistakes

🚫 “I used to think: test data is not real data, so privacy rules don’t apply.”

Actually: If real customer data is used in testing, privacy obligations apply from the moment you copy it, before any masking has run. The Privacy Act 2020 applies to any personal information held by an agency, regardless of its intended use. Copying production data into SIT for a purpose it was not collected for can breach Information Privacy Principle 10 (limits on use), and the weaker access controls typical of test environments can breach Principle 5 (storage and security).

🚫 “I used to think: masking email addresses is enough.”

Actually: Re-identification risk means that even with email masked, a combination of name + Revenue NZ number + address can uniquely identify a real person. The Privacy Act 2020 Principle 5 (storage and security) and Principle 11 (disclosure) apply to any data that could be used to identify someone — mask all PII fields, not just the obvious ones.
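One way to make re-identification risk measurable is a k-anonymity style count, a standard de-identification check rather than anything the Act prescribes: group records by the fields someone might already know (quasi-identifiers) and flag any group smaller than k. The field names and k = 2 below are illustrative assumptions.

```javascript
// Sketch of a k-anonymity style check: group records by the chosen
// quasi-identifier fields and report groups smaller than k.
// A group of size 1 means that combination points to exactly one person.
function smallGroups(records, quasiIdentifiers, k = 2) {
  const counts = new Map();
  for (const record of records) {
    const key = JSON.stringify(quasiIdentifiers.map((field) => record[field]));
    counts.set(key, (counts.get(key) || 0) + 1);
  }
  const result = [];
  for (const [key, count] of counts) {
    if (count < k) result.push({ values: JSON.parse(key), count });
  }
  return result;
}
```

In the test data, every email is masked, yet the record with a unique date of birth and suburb is still singled out.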

🚫 “I used to think: the development team is responsible for test data.”

Actually: QA owns the test data strategy. Developers create the software; QA defines what data is needed to test it and ensures that data is compliant. If a developer refreshes a database without masking, QA is the team with the policy, the process, and the obligation to catch it.

Why teams fail here

Masking scripts are written once and never reconciled against schema changes — new PII columns added to tables go unmasked for months until someone notices
Partial masking that addresses email and name but overlooks Revenue NZ numbers, NHI numbers, bank account numbers, or CoverNZ claim IDs — re-identification risk remains even when the "obvious" fields are masked
No ownership boundary: the database administrator refreshes without masking because they assume QA will do it, while QA assumes the DBA handled it — the NZ Privacy Act 2020 does not recognise "we assumed someone else did it" as a mitigating factor
Governance policy exists on paper but has no enforcement mechanism — there is no automated check, no masking verification query, and no one who receives an alert when a refresh completes with PII still present

7 Now You Try

🏥 Prompt Lab — Write a Test Data Management Policy

Write a test data management policy for a NZ healthcare organisation’s test environments. Cover: which fields must be masked (patient names, NHI numbers, diagnoses, GP details), the masking approach, the data refresh schedule, and how you’d handle a case where developers say they need real data to reproduce a bug.

Interview Questions

What NZ hiring managers ask about test data management in environments.

Q1. What are the risks of using production data in test environments and how do you mitigate them?

Strong answer: Production data contains PII protected under the NZ Privacy Act 2020. Test environments typically have weaker access controls than production — developers, contractors, and testers may access data without a business need. Mitigation: data masking (replace real names, Revenue NZ numbers, dates of birth with synthetic values preserving format), data subsetting (extract only records needed for the scenario), or synthetic data generation. For healthcare environments, the Health Information Security Framework prohibits real patient data in test environments without explicit approval.
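Data subsetting has to preserve referential integrity: keep the parent rows a scenario needs, then only the child rows that point at them. A minimal sketch with hypothetical `customers` and `payments` arrays; note that a subset still contains whatever PII the source had, so it needs masking afterwards.

```javascript
// Hypothetical subsetting sketch: keep only the customers a scenario needs,
// then keep only the payments that reference them, so foreign keys still resolve.
function subset(customers, payments, isNeeded) {
  const keptCustomers = customers.filter(isNeeded);
  const keptIds = new Set(keptCustomers.map((c) => c.id));
  const keptPayments = payments.filter((p) => keptIds.has(p.customerId));
  return { customers: keptCustomers, payments: keptPayments };
}
```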

Q2. How do you reset test data between runs in a shared environment?

Strong answer: Three approaches: database snapshot restore (fast complete reset, requires downtime), transactional rollback (wrap each test in a transaction and roll it back afterwards; works for unit/integration tests, not cross-service tests), or deterministic setup/teardown (each test creates its own data tagged with a run ID and cleans it up in teardown). In shared environments where multiple teams test simultaneously, deterministic setup/teardown is the only one of the three that does not break other teams' tests. Shared mutable test data is one of the most common causes of intermittent test failures.
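Deterministic setup/teardown with run IDs can be sketched with an in-memory `Map` standing in for the shared database. The store and function names are hypothetical; in a real suite the run ID might come from the CI build number, since any unique value works.

```javascript
// Sketch of run-ID tagged setup/teardown in a shared environment.
// Every record a run creates carries that run's ID; teardown deletes only those.
const sharedStore = new Map();
let nextId = 1;

function createTestRecord(runId, data) {
  const id = nextId++;
  sharedStore.set(id, { ...data, runId });
  return id;
}

function teardown(runId) {
  // Deleting from a Map while iterating it with for...of is safe in JavaScript.
  for (const [id, record] of sharedStore) {
    if (record.runId === runId) sharedStore.delete(id);
  }
}
```

Because teardown filters by tag instead of truncating tables, another team's concurrent run keeps its data.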

Q3. How do you test a feature requiring specific edge-case data that does not exist in any environment?

Strong answer: Create it programmatically: use API calls or direct database inserts to create the specific state the test requires. Document the data creation as part of test preconditions. For hard-to-reproduce scenarios (an account with 10 years of transaction history, a KiwiSaver account at the contribution cap), write synthetic data generation scripts that can create any state consistently. Never rely on manually created test data that might not exist tomorrow — test data that cannot be recreated is a test that cannot be run reliably.
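A script that "can create any state consistently" needs a seeded random source, so the same seed always rebuilds the same history. A minimal sketch using mulberry32, a small, widely used seeded PRNG; the start year, monthly cadence, and amount range are illustrative assumptions.

```javascript
// mulberry32: a small seeded PRNG returning floats in [0, 1).
function mulberry32(seed) {
  let a = seed >>> 0;
  return function next() {
    a = (a + 0x6d2b79f5) >>> 0;
    let t = a;
    t = Math.imul(t ^ (t >>> 15), t | 1);
    t ^= t + Math.imul(t ^ (t >>> 7), t | 61);
    return ((t ^ (t >>> 14)) >>> 0) / 4294967296;
  };
}

// Builds one transaction per month; the same seed always yields the same history.
function transactionHistory(seed, startYear = 2014, years = 10) {
  const random = mulberry32(seed);
  const history = [];
  for (let m = 0; m < years * 12; m++) {
    const year = startYear + Math.floor(m / 12);
    const month = String((m % 12) + 1).padStart(2, "0");
    const amountCents = 1000 + Math.floor(random() * 499000); // $10.00 to $4,999.99
    history.push({ date: `${year}-${month}-01`, amountCents });
  }
  return history;
}
```

Storing only the seed in the test precondition is enough to recreate the full ten-year account tomorrow.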

8 Self-Check


Q1: At what point does the NZ Privacy Act 2020 apply to test data?

The moment production data is copied to a non-production environment, regardless of whether it has been masked or whether the intent is testing. Copying personal information creates a privacy obligation. The Act does not have an exemption for test environments.

Q2: Why is masking only the email address insufficient?

Re-identification risk. A combination of fields — name, address, Revenue NZ number, date of birth — can uniquely identify a real person even with the email removed. All PII fields must be masked to eliminate re-identification risk, not just the most obvious one.

Q3: What is the correct response when a developer requests real production data to reproduce a bug?

No. Reproduce the bug using a synthetic record with the same structural characteristics as the production case. Real PII is never the only way to reproduce a defect. Document the structural characteristics (e.g., account created before 2023, Revenue NZ number with specific checksum pattern) and create a synthetic record that matches.

Q4: Who is responsible for the test data strategy?

QA. Not the development team, not the database administrator. QA defines what data is needed for testing, specifies the masking requirements, and enforces the policy. If a developer refreshes without masking, QA owns the process that should have caught it.

Q5: What six elements must a test data governance policy cover?

Data classification (which fields are PII), refresh schedule (when clones are refreshed and masked), access controls (who can access non-production databases), data retention (when non-production data is purged), audit trail (who accessed what data and when), and breach procedure (what happens when unmasked PII is found in a test environment).

Key takeaway

Your masking strategy is only as strong as its weakest refresh — treat every environment update as a potential privacy breach waiting to happen, and build the verification that proves it didn't.

9 ISTQB Mapping

CTAL-TM v3.0 — Section 3.2.4: Test Data Management

The ISTQB Test Manager syllabus addresses test data management as part of environment planning, covering data creation, data storage, data security, and the privacy obligations that arise when personal information is used in testing. This lesson applies those concepts specifically to the NZ Privacy Act 2020 and the notifiable breach framework that the Privacy Commissioner administers.

10 Next Steps

You have completed the Environment & Release Management track. Review Release Gates for the full picture of how environment quality, test data governance, and Go/No-Go decisions work together — then consider the ERP Data Migration track for the specific challenge of migrating data between legacy and modern systems.
