Environment & Data Strategy
Protecting PII and provisioning the battlefield. Learn how to govern Test Environment Management (TEM) and Test Data Management (TDM).
1 The Hook
A Test Lead logged a Sev 1 bug: "The checkout is crashing for all Premium users!" The development team spent 8 hours investigating, only to realize the code was fine. The issue? The "Premium_User_12" test account had expired in the SIT database.
A Test Lead tries to find valid test data. A Test Manager builds an automated factory to generate it. Without strict governance over Environments (TEM) and Data (TDM), your team will spend 50% of their time fighting infrastructure instead of testing software.
2 The Rule
Test Data is a Liability; Test Environments are a Bottleneck. You must mandate Synthetic Data over Production Data, and enforce strict "Booking" rules for shared environments.
3 Watch Me Do It: Structuring the Sandbox
Observe how an Enterprise Test Manager architects the flow of code and data.
1. Environment Segregation (TEM)
You cannot have 5 squads testing in one "QA" server. The Manager mandates:
• SIT (System Integration): Daily deployments. Chaotic. Used for API and automated regression.
• UAT (User Acceptance): Weekly locked deployments. Stable. Used by Business Stakeholders.
• Pre-Prod (Staging): Identical hardware to Production. Used for Performance testing only.
2. Data Masking vs Synthetic Data (TDM)
Under the NZ Privacy Act 2020, you cannot use real customer names or credit cards in SIT. The Manager enforces a policy: "Production databases cloned to SIT must run through an obfuscation script to scramble PII, or we must use 100% synthetic (fake) injected data."
3. Test User Accounts & State
Automation suites fail because MFA (Multi-Factor Auth) blocks the login, or the test user's password expires. The Manager negotiates with InfoSec for "Static MFA Tokens" for automation accounts and creates an API script to reset test users to a "Clean State" every night at 3 AM.
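The nightly reset the Manager negotiates can be sketched as a small job. This is a minimal sketch assuming users are plain in-memory dicts; in a real system the same logic would call your application's admin API, and all field names here are illustrative assumptions.

```python
# Minimal sketch of a nightly "clean state" reset for automation accounts.
# Assumption: users are plain dicts; in practice this logic would call an
# admin API instead. All field names are illustrative.
from datetime import datetime

def clean_state():
    # A known-good baseline every automation account returns to each night.
    return {"cart": [], "sessions": [], "password_age_days": 0, "locked": False}

def needs_reset(user, max_password_age=30):
    # Reset if the account is locked or the password is near expiry.
    return user.get("locked", False) or user.get("password_age_days", 0) >= max_password_age

def reset_test_user(user):
    restored = dict(user)            # copy so the original record is untouched
    restored.update(clean_state())
    restored["last_reset"] = datetime(2024, 1, 1, 3, 0)  # the 3 AM job run (example)
    return restored

# Example: an automation account that drifted during the day's runs
user = {"id": "auto_001", "cart": ["item"], "password_age_days": 45, "locked": True}
if needs_reset(user):
    user = reset_test_user(user)
```

The point of the sketch is the shape: a single idempotent function that any scheduler (cron, CI, a lambda) can call at 3 AM without human involvement.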
4 Data Lab: The Security Breach
In this lab, you must govern a data decision against a Lead Developer trying to cut corners.
Your Task: The Raw Dump Request
A Sev 1 bug in production only happens to users with very specific, complex billing histories. The Dev Lead says: "I cannot reproduce this locally. I am going to dump the production database into the UAT environment so we can debug it."
CONSIDER THE RISKS:
- Compliance: UAT does not have the same firewall protections as Production. Moving raw PII (Personally Identifiable Information) violates data laws.
- Contamination: If UAT sends out automated emails, real customers might receive "Test" emails.
- The Alternative: Can the Dev debug using anonymized logs, or can you synthesize a user with the exact same billing state?
How do you reply? Draft your response in your notes. (Hint: "Request Denied. UAT is not cleared for raw PII. Provide the specific state parameters, and my team will generate a synthetic user with that exact billing history in UAT within 2 hours.")
5 Test Data Masking & Synthetic Data Generation
Under the NZ Privacy Act 2020, moving real customer data into test environments is risky. Use masking or synthetic data instead.
Data Masking Techniques
When you must copy production data to SIT, apply one of these techniques:
| Technique | How It Works | Risk Level | Tools |
|---|---|---|---|
| Redaction | Remove sensitive columns entirely. Delete SSN, credit cards, passwords. | Low | SQL scripts, AWS Glue |
| Scrambling | Shuffle values. "John Doe" → "Jane Smith". Preserves DB relationships. | Low | Oracle Data Masking, Informatica |
| Substitution | Replace with fake realistic values. "john@real.com" → "tester001@test.com" | Low | Custom scripts, Faker library |
| Tokenization | Replace PII with non-sensitive token. "4532-1111-2222-3333" → "TOKEN_ABC123" | Very Low | AWS Payment Cryptography, Vault |
| Encryption | Encrypt PII in place (if you have decryption key in test). | Medium | Database TDE, AWS KMS |
Synthetic Data Generation Approaches
Better than masking: generate fake data that looks real but contains no actual customer info.
| Approach | Use Case | Complexity |
|---|---|---|
| Faker Library (Python, Ruby, JS) | Quick data generation. Addresses, names, emails. Great for dev/early SIT. | Easy |
| Data Factories in Code (Test fixtures) | Automation. Generate specific states: "User with 3 active loans" or "Order with split payments." | Medium |
| Production-Like Cloning + Masking (Hybrid) | Performance testing. You need real volume, real distribution, but without PII. | Hard |
| AI-Generated Synthetic Data (e.g. Gretel.ai, Mostly AI) | Complex tables with realistic relationships. E.g. customers + transactions + payments that match real distributions. | Very Hard |
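The "Faker Library" and "Data Factories" rows above can be sketched with nothing but the standard library. This is a minimal illustration, not the Faker API itself; the name pools and field names are assumptions.

```python
# Minimal synthetic-customer generator using only the standard library.
# In practice the Faker library does this job at scale; the name pools
# and field names below are illustrative assumptions.
import random

FIRST_NAMES = ["Aroha", "James", "Mei", "Sione", "Emma"]
LAST_NAMES = ["Smith", "Ngata", "Chen", "Tupou", "Brown"]

def synthetic_customer(customer_id, seed=None):
    # Seeding on customer_id makes generation deterministic and repeatable,
    # so a failing test can be re-run against identical data.
    rng = random.Random(customer_id if seed is None else seed)
    return {
        "customer_id": customer_id,
        "name": f"{rng.choice(FIRST_NAMES)} {rng.choice(LAST_NAMES)}",
        "email": f"test{customer_id}@test.local",  # never a routable real domain
        "active_loans": rng.randint(0, 3),         # state like "user with 3 loans"
    }

batch = [synthetic_customer(i) for i in range(1, 101)]
```

Determinism is the design choice worth copying: seeded generation means "User 12's data" is the same on every machine and every run, which is exactly what data factories in automation need.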
Example: Masking Script (SQL)
-- Mask customer data on the cloned copy.
-- Run this against the SIT database only, never against production.
UPDATE customers
SET
    first_name = CONCAT('User', customer_id),
    last_name  = CONCAT('Test', customer_id),
    email      = CONCAT('test', customer_id, '@test.local'),
    phone      = '555-0100',
    ssn        = NULL;  -- Delete entirely

-- Mask payment data
UPDATE payments
SET
    card_number = 'XXXX-XXXX-XXXX-0000',
    cvv         = '000';
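Where a team prefers to mask in application code rather than SQL, the same substitution technique can be sketched in Python. One detail the SQL version glosses over: masking should be deterministic (the same real value always maps to the same fake value) so foreign-key joins between tables still work afterwards. Column names here are assumptions.

```python
# Python sketch of substitution masking with deterministic output.
# Deterministic masking preserves referential integrity: if two tables
# join on email, the masked tables still join. Column names are assumptions.
import hashlib

def mask_email(real_email):
    # Same real email -> same fake email, every run, on every machine.
    token = hashlib.sha256(real_email.encode()).hexdigest()[:10]
    return f"test_{token}@test.local"

def mask_row(row):
    masked = dict(row)                       # copy; never mutate the source row
    masked["first_name"] = f"User{row['customer_id']}"
    masked["last_name"] = f"Test{row['customer_id']}"
    masked["email"] = mask_email(row["email"])
    masked["ssn"] = None                     # redact entirely
    masked["card_number"] = "XXXX-XXXX-XXXX-0000"
    return masked

row = {"customer_id": 12, "first_name": "John", "last_name": "Doe",
       "email": "john@real.com", "ssn": "123-45-6789",
       "card_number": "4532-1111-2222-3333"}
masked = mask_row(row)
```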
6 Data Retention Policy & NZ Privacy Act 2020 Compliance
Test data lives longer than you think. Old test databases become audit nightmares and privacy violations. Mandate data retention and cleanup policies.
Data Retention Policy Template
DATA RETENTION SCHEDULE
| Data Type | Environment | Retention Period | Cleanup Method |
|---|---|---|---|
| Test Execution Records (logs, screenshots) | SIT, UAT | 90 days (then archive to S3) | Automated deletion script |
| Test Databases (with masked data) | SIT | Rebuild nightly (24hr TTL) | Automated DB refresh |
| Production Data Snapshots (for comparison testing) | SIT | 7 days (then delete) | Manual approval + deletion |
| Synthetic Test Data (no PII risk) | Dev, Test | Keep indefinitely | N/A |
| Audit Trail / Evidence (for compliance) | All | 7 years (legal hold) | Archive to compliance vault |
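The 90-day rule from the schedule above reduces to a small cleanup filter. This sketch assumes records are in-memory dicts with a `created` timestamp; a real job would query S3 or the database and then delete what the filter returns.

```python
# Sketch of the 90-day retention rule applied to test execution records.
# Assumption: records are dicts with a "created" datetime; a real cleanup
# job would run this query against S3 or a database instead.
from datetime import datetime, timedelta

def expired(records, now, retention_days=90):
    # Anything created before the cutoff has outlived its retention period.
    cutoff = now - timedelta(days=retention_days)
    return [r for r in records if r["created"] < cutoff]

now = datetime(2024, 6, 1)
records = [
    {"id": 1, "created": datetime(2024, 1, 1)},   # ~5 months old: expired
    {"id": 2, "created": datetime(2024, 5, 20)},  # recent: kept
]
to_delete = expired(records, now)
```

Separating "select what expired" from "delete it" is deliberate: the selection can be logged for the audit trail before anything is destroyed.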
NZ Privacy Act 2020 Compliance for Test Data
NZ law requires you to protect PII in test environments as strictly as in production. Violate these principles and you face fines of up to NZ$10,000 per offence, plus reputational damage.
- Collection: Only collect test data you need. If you're testing payments, you don't need customer names.
- Storage: Encrypt PII in test databases. If the hard drive is stolen, data is useless.
- Access Control: Limit who can see unmasked data. Only the data engineer, not every tester.
- Retention: Delete test data after testing is complete. Don't keep it "just in case."
- Disclosure: Never move production data to test without masking. If you do, you must notify affected customers (data breach notification).
- Audit Trail: Log who accessed what test data and when. Prove compliance.
NZ Case Study: A healthcare startup tested a patient portal using real patient medical records. A tester left the database publicly accessible on AWS. The incident resulted in 3,000+ patient records exposed. The company was fined and shut down. Use synthetic data. Always.
7 Data Classification Framework
Not all data is equally sensitive. Classify your data and apply appropriate controls.
Classification Levels
| Level | Examples | Encryption Required? | Masking in Test? | Data Retention |
|---|---|---|---|---|
| Public | Blog posts, marketing copy, app version number | No | No | Indefinite |
| Internal | Employee names, org structure, internal docs | No | Possible | 1 year |
| Confidential | Customer names, emails, phone numbers, billing addresses | Yes | MASK (required) | 90 days |
| Restricted | Credit cards, SSN, medical records, passwords | Yes | REMOVE (never test with real data) | 7 days |
| Highly Restricted | Auth tokens, API keys, encryption keys | Yes | NEVER in test (tokenize only) | Delete immediately |
Data Classification Decision Tree
Use this to classify any piece of data:
- Is it public knowledge? → PUBLIC
- Would exposing it embarrass the company or user? → INTERNAL or CONFIDENTIAL
- Could it be used to impersonate someone or steal money? → RESTRICTED
- Is it a credential or key that unlocks other data? → HIGHLY RESTRICTED
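The decision tree above can be encoded as a function so classification is consistent and reviewable. The boolean questions mirror the bullets, evaluated most-restrictive-first (a credential that also happens to be public knowledge is still Highly Restricted); the flag names are assumptions.

```python
# The data classification decision tree as a function. Checks run
# most-restrictive-first so the strictest applicable level always wins.
# Flag names are illustrative assumptions.
def classify(is_public=False, embarrassing=False, enables_fraud=False,
             is_credential=False):
    if is_credential:        # unlocks other data: keys, tokens, passwords
        return "HIGHLY RESTRICTED"
    if enables_fraud:        # impersonation or theft: cards, SSN, medical
        return "RESTRICTED"
    if embarrassing:         # would embarrass the company or a user
        return "CONFIDENTIAL"
    if is_public:            # already public knowledge
        return "PUBLIC"
    return "INTERNAL"        # default for everything else
```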
Test Data Governance Policy
All test data must be:
- Synthetic or masked (no real PII)
- Documented in the test plan (what data is used and why)
- Encrypted if stored on disk
- Deleted after testing (or archived under retention policy)
- Audited: log who accessed it, when, and what they did
8 Common Mistakes
⚠ Manual Test Data Creation
Why it fails: If a tester has to spend 20 minutes manually clicking through the UI to create a "New Customer with a Home Loan" just to test a single API, your velocity dies. A Manager must build automated Data Seeding APIs to generate test states instantly.
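A data-seeding factory of the kind described might look like this sketch: one function call replaces twenty minutes of UI clicking. The entity shapes and field names are assumptions for illustration.

```python
# Sketch of a data-seeding factory. One call builds a test state that
# would otherwise take minutes of manual UI work. Entity shapes and
# field names are illustrative assumptions.
import itertools

_ids = itertools.count(1)   # simple unique-ID source for seeded entities

def seed_customer(**overrides):
    customer = {"id": next(_ids), "name": "Test Customer", "loans": []}
    customer.update(overrides)   # callers override only what the test cares about
    return customer

def seed_home_loan(customer, amount=450_000):
    loan = {"type": "home", "amount": amount, "status": "active"}
    customer["loans"].append(loan)
    return loan

# "New Customer with a Home Loan" in two lines:
cust = seed_customer(name="Seeded User")
seed_home_loan(cust)
```

In a real suite these factories would sit behind an HTTP seeding endpoint so automation, manual testers, and demos all create states the same way.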
⚠ Missing Environment Schedules
Why it fails: Squad A runs a load test while Squad B is doing exploratory testing. Squad B reports 50 timeout bugs. The Manager must own the "Environment Booking Calendar" to prevent test contamination.
⚠ Using Real Credit Cards or SSNs in Test
Why it fails (NZ-specific): it violates the Privacy Act 2020, exposing you to fines and legal liability. Use fake cards (e.g., issuer-published test card numbers) and fully synthetic identifiers only.
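Fake card numbers still need to pass checkout's checksum validation, which typically uses the Luhn algorithm. Here is a sketch of generating a Luhn-valid test number; the `400000` prefix is an arbitrary Visa-range example, and in practice you should prefer the test numbers your payment provider publishes.

```python
# Generate a Luhn-valid 16-digit test card number. The prefix is an
# arbitrary example; prefer your payment provider's published test cards.
def luhn_check_digit(partial):
    # Compute the Luhn check digit for a 15-digit partial number:
    # walking right-to-left, double every second digit (subtracting 9
    # if the result exceeds 9), then pick the digit that makes the
    # total a multiple of 10.
    total = 0
    for i, ch in enumerate(reversed(partial)):
        d = int(ch)
        if i % 2 == 0:
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return str((10 - total % 10) % 10)

def test_card(prefix="400000", account="000000000"):
    partial = prefix + account   # 15 digits; the 16th is the check digit
    return partial + luhn_check_digit(partial)

card = test_card()   # Luhn-valid, but not a real issued card
```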
9 Self-Check
Q1. Why should UAT and SIT environments be kept strictly separate?
SIT changes daily with broken code; it is volatile. UAT must be stable so Business Stakeholders can sign off. If you mix them, the Business will lose faith in the software because of backend integration issues.
Q2. What is "Data Masking"?
The process of obfuscating real production data (e.g., changing 'John Doe' to 'Xkfj Rpl') when copying it to lower environments. It preserves the database relationships but eliminates the privacy risk.
Q3. What's better: masking or synthetic data?
Synthetic data is better. It has zero privacy risk. Use masking only when you need real data distributions (e.g., performance testing). Always prefer synthetic for functional testing.
Q4. What does NZ Privacy Act 2020 require for test data?
Protect PII in test environments as strictly as in production. Encrypt it, limit access, mask it, delete it after testing, and audit who accessed it. Failure to comply can result in fines and legal liability.