Environment & Data Strategy
Protecting PII and provisioning the battlefield. Learn how to govern Test Environment Management (TEM) and Test Data Management (TDM).
1 The Hook
A Test Lead logged a Sev 1 bug: "The checkout is crashing for all Premium users!" The development team spent 8 hours investigating, only to realize the code was fine. The issue? The "Premium_User_12" test account had expired in the SIT database.
A Test Lead tries to find valid test data. A Test Manager builds an automated factory to generate it. Without strict governance over Environments (TEM) and Data (TDM), your team will spend 50% of their time fighting infrastructure instead of testing software.
2 The Rule
Test Data is a Liability; Test Environments are a Bottleneck. You must mandate Synthetic Data over Production Data, and enforce strict "Booking" rules for shared environments.
3 Watch Me Do It: Structuring the Sandbox
Observe how an Enterprise Test Manager architects the flow of code and data.
1. Environment Segregation (TEM)
You cannot have 5 squads testing in one "QA" server. The Manager mandates:
• SIT (System Integration): Daily deployments. Chaotic. Used for API and automated regression.
• UAT (User Acceptance): Weekly locked deployments. Stable. Used by Business Stakeholders.
• Pre-Prod (Staging): Identical hardware to Production. Used for Performance testing only.
2. Data Masking vs Synthetic Data (TDM)
Under the NZ Privacy Act 2020, you cannot use real customer names or credit cards in SIT. The Manager enforces a policy: "Production databases cloned to SIT must run through an obfuscation script to scramble PII, or we must use 100% synthetic (fake) injected data."
3. Test User Accounts & State
Automation suites fail because MFA (Multi-Factor Auth) blocks the login, or the test user's password expires. The Manager negotiates with InfoSec for "Static MFA Tokens" for automation accounts and creates an API script to reset test users to a "Clean State" every night at 3 AM.
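The nightly reset the Manager negotiates can be sketched as a small job. This is a minimal sketch assuming users are plain in-memory dicts; in a real system the same logic would call your application's admin API, and all field names here are illustrative assumptions.

```python
# Minimal sketch of a nightly "clean state" reset for automation accounts.
# Assumption: users are plain dicts; in practice this logic would call an
# admin API instead. All field names are illustrative.
from datetime import datetime

def clean_state():
    # A known-good baseline every automation account returns to each night.
    return {"cart": [], "sessions": [], "password_age_days": 0, "locked": False}

def needs_reset(user, max_password_age=30):
    # Reset if the account is locked or the password is near expiry.
    return user.get("locked", False) or user.get("password_age_days", 0) >= max_password_age

def reset_test_user(user):
    restored = dict(user)            # copy so the original record is untouched
    restored.update(clean_state())
    restored["last_reset"] = datetime(2024, 1, 1, 3, 0)  # the 3 AM job run (example)
    return restored

# Example: an automation account that drifted during the day's runs
user = {"id": "auto_001", "cart": ["item"], "password_age_days": 45, "locked": True}
if needs_reset(user):
    user = reset_test_user(user)
```

The point of the sketch is the shape: a single idempotent function that any scheduler (cron, CI, a lambda) can call at 3 AM without human involvement.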
4 Data Lab: The Security Breach
In this lab, you must govern a data decision against a Lead Developer trying to cut corners.
Your Task: The Raw Dump Request
A Sev 1 bug in production only happens to users with very specific, complex billing histories. The Dev Lead says: "I cannot reproduce this locally. I am going to dump the production database into the UAT environment so we can debug it."
CONSIDER THE RISKS:
- Compliance: UAT does not have the same firewall protections as Production. Moving raw PII (Personally Identifiable Information) violates data laws.
- Contamination: If UAT sends out automated emails, real customers might receive "Test" emails.
- The Alternative: Can the Dev debug using anonymized logs, or can you synthesize a user with the exact same billing state?
How do you reply? Draft your response in your notes. (Hint: "Request Denied. UAT is not cleared for raw PII. Provide the specific state parameters, and my team will generate a synthetic user with that exact billing history in UAT within 2 hours.")
5 Test Data Masking & Synthetic Data Generation
Under the NZ Privacy Act 2020, moving real customer data into test environments is risky. Use masking or synthetic data instead.
Data Masking Techniques
When you must copy production data to SIT, apply one of these techniques:
| Technique | How It Works | Risk Level | Tools |
|---|---|---|---|
| Redaction | Remove sensitive columns entirely. Delete SSN, credit cards, passwords. | Low | SQL scripts, AWS Glue |
| Scrambling | Shuffle values. "John Doe" → "Jane Smith". Preserves DB relationships. | Low | Oracle Data Masking, Informatica |
| Substitution | Replace with fake realistic values. "john@real.com" → "tester001@test.com" | Low | Custom scripts, Faker library |
| Tokenization | Replace PII with non-sensitive token. "4532-1111-2222-3333" → "TOKEN_ABC123" | Very Low | AWS Payment Cryptography, Vault |
| Encryption | Encrypt PII in place (if you have decryption key in test). | Medium | Database TDE, AWS KMS |
Synthetic Data Generation Approaches
Better than masking: generate fake data that looks real but contains no actual customer info.
| Approach | Use Case | Complexity |
|---|---|---|
| Faker Library (Python, Ruby, JS) | Quick data generation. Addresses, names, emails. Great for dev/early SIT. | Easy |
| Data Factories in Code (Test fixtures) | Automation. Generate specific states: "User with 3 active loans" or "Order with split payments." | Medium |
| Production-Like Cloning + Masking (Hybrid) | Performance testing. You need real volume, real distribution, but without PII. | Hard |
| AI-Generated Synthetic Data (e.g. Gretel.ai, Mostly AI) | Complex tables with realistic relationships. E.g. customers + transactions + payments that match real distributions. | Very Hard |
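The "Faker Library" and "Data Factories" rows above can be sketched with nothing but the standard library. This is a minimal illustration, not the Faker API itself; the name pools and field names are assumptions.

```python
# Minimal synthetic-customer generator using only the standard library.
# In practice the Faker library does this job at scale; the name pools
# and field names below are illustrative assumptions.
import random

FIRST_NAMES = ["Aroha", "James", "Mei", "Sione", "Emma"]
LAST_NAMES = ["Smith", "Ngata", "Chen", "Tupou", "Brown"]

def synthetic_customer(customer_id, seed=None):
    # Seeding on customer_id makes generation deterministic and repeatable,
    # so a failing test can be re-run against identical data.
    rng = random.Random(customer_id if seed is None else seed)
    return {
        "customer_id": customer_id,
        "name": f"{rng.choice(FIRST_NAMES)} {rng.choice(LAST_NAMES)}",
        "email": f"test{customer_id}@test.local",  # never a routable real domain
        "active_loans": rng.randint(0, 3),         # state like "user with 3 loans"
    }

batch = [synthetic_customer(i) for i in range(1, 101)]
```

Determinism is the design choice worth copying: seeded generation means "User 12's data" is the same on every machine and every run, which is exactly what data factories in automation need.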
Example: Masking Script (SQL)
-- Mask customer data on the cloned copy.
-- Run this against the SIT database only, never against production.
UPDATE customers
SET
    first_name = CONCAT('User', customer_id),
    last_name  = CONCAT('Test', customer_id),
    email      = CONCAT('test', customer_id, '@test.local'),
    phone      = '555-0100',
    ssn        = NULL;  -- Delete entirely

-- Mask payment data
UPDATE payments
SET
    card_number = 'XXXX-XXXX-XXXX-0000',
    cvv         = '000';
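Where a team prefers to mask in application code rather than SQL, the same substitution technique can be sketched in Python. One detail the SQL version glosses over: masking should be deterministic (the same real value always maps to the same fake value) so foreign-key joins between tables still work afterwards. Column names here are assumptions.

```python
# Python sketch of substitution masking with deterministic output.
# Deterministic masking preserves referential integrity: if two tables
# join on email, the masked tables still join. Column names are assumptions.
import hashlib

def mask_email(real_email):
    # Same real email -> same fake email, every run, on every machine.
    token = hashlib.sha256(real_email.encode()).hexdigest()[:10]
    return f"test_{token}@test.local"

def mask_row(row):
    masked = dict(row)                       # copy; never mutate the source row
    masked["first_name"] = f"User{row['customer_id']}"
    masked["last_name"] = f"Test{row['customer_id']}"
    masked["email"] = mask_email(row["email"])
    masked["ssn"] = None                     # redact entirely
    masked["card_number"] = "XXXX-XXXX-XXXX-0000"
    return masked

row = {"customer_id": 12, "first_name": "John", "last_name": "Doe",
       "email": "john@real.com", "ssn": "123-45-6789",
       "card_number": "4532-1111-2222-3333"}
masked = mask_row(row)
```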
6 Data Retention Policy & NZ Privacy Act 2020 Compliance
Test data lives longer than you think. Old test databases become audit nightmares and privacy violations. Mandate data retention and cleanup policies.
Data Retention Policy Template
DATA RETENTION SCHEDULE
| Data Type | Environment | Retention Period | Cleanup Method |
|---|---|---|---|
| Test Execution Records (logs, screenshots) | SIT, UAT | 90 days (then archive to S3) | Automated deletion script |
| Test Databases (with masked data) | SIT | Rebuild nightly (24hr TTL) | Automated DB refresh |
| Production Data Snapshots (for comparison testing) | SIT | 7 days (then delete) | Manual approval + deletion |
| Synthetic Test Data (no PII risk) | Dev, Test | Keep indefinitely | N/A |
| Audit Trail / Evidence (for compliance) | All | 7 years (legal hold) | Archive to compliance vault |
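The 90-day rule from the schedule above reduces to a small cleanup filter. This sketch assumes records are in-memory dicts with a `created` timestamp; a real job would query S3 or the database and then delete what the filter returns.

```python
# Sketch of the 90-day retention rule applied to test execution records.
# Assumption: records are dicts with a "created" datetime; a real cleanup
# job would run this query against S3 or a database instead.
from datetime import datetime, timedelta

def expired(records, now, retention_days=90):
    # Anything created before the cutoff has outlived its retention period.
    cutoff = now - timedelta(days=retention_days)
    return [r for r in records if r["created"] < cutoff]

now = datetime(2024, 6, 1)
records = [
    {"id": 1, "created": datetime(2024, 1, 1)},   # ~5 months old: expired
    {"id": 2, "created": datetime(2024, 5, 20)},  # recent: kept
]
to_delete = expired(records, now)
```

Separating "select what expired" from "delete it" is deliberate: the selection can be logged for the audit trail before anything is destroyed.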
NZ Privacy Act 2020 Compliance for Test Data
NZ law requires you to protect PII in test environments as strictly as in production. Violate these principles and you face fines of up to NZ$10,000 per offence, plus reputational damage.
- Collection: Only collect test data you need. If you're testing payments, you don't need customer names.
- Storage: Encrypt PII in test databases. If the hard drive is stolen, data is useless.
- Access Control: Limit who can see unmasked data. Only the data engineer, not every tester.
- Retention: Delete test data after testing is complete. Don't keep it "just in case."
- Disclosure: Never move production data to test without masking. If you do, you must notify affected customers (data breach notification).
- Audit Trail: Log who accessed what test data and when. Prove compliance.
NZ Case Study: A healthcare startup tested a patient portal using real patient medical records. A tester left the database publicly accessible on AWS. The incident resulted in 3,000+ patient records exposed. The company was fined and shut down. Use synthetic data. Always.
7 Data Classification Framework
Not all data is equally sensitive. Classify your data and apply appropriate controls.
Classification Levels
| Level | Examples | Encryption Required? | Masking in Test? | Data Retention |
|---|---|---|---|---|
| Public | Blog posts, marketing copy, app version number | No | No | Indefinite |
| Internal | Employee names, org structure, internal docs | No | Possible | 1 year |
| Confidential | Customer names, emails, phone numbers, billing addresses | Yes | MASK (required) | 90 days |
| Restricted | Credit cards, SSN, medical records, passwords | Yes | REMOVE (never test with real data) | 7 days |
| Highly Restricted | Auth tokens, API keys, encryption keys | Yes | NEVER in test (tokenize only) | Delete immediately |
Data Classification Decision Tree
Use this to classify any piece of data:
- Is it public knowledge? → PUBLIC
- Would exposing it embarrass the company or user? → INTERNAL or CONFIDENTIAL
- Could it be used to impersonate someone or steal money? → RESTRICTED
- Is it a credential or key that unlocks other data? → HIGHLY RESTRICTED
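The decision tree above can be encoded as a function so classification is consistent and reviewable. The boolean questions mirror the bullets, evaluated most-restrictive-first (a credential that also happens to be public knowledge is still Highly Restricted); the flag names are assumptions.

```python
# The data classification decision tree as a function. Checks run
# most-restrictive-first so the strictest applicable level always wins.
# Flag names are illustrative assumptions.
def classify(is_public=False, embarrassing=False, enables_fraud=False,
             is_credential=False):
    if is_credential:        # unlocks other data: keys, tokens, passwords
        return "HIGHLY RESTRICTED"
    if enables_fraud:        # impersonation or theft: cards, SSN, medical
        return "RESTRICTED"
    if embarrassing:         # would embarrass the company or a user
        return "CONFIDENTIAL"
    if is_public:            # already public knowledge
        return "PUBLIC"
    return "INTERNAL"        # default for everything else
```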
Test Data Governance Policy
All test data must be:
- Synthetic or masked (no real PII)
- Documented in the test plan (what data is used and why)
- Encrypted if stored on disk
- Deleted after testing (or archived under retention policy)
- Audited: log who accessed it, when, and what they did
8 Common Mistakes
⚠ Manual Test Data Creation
Why it fails: If a tester has to spend 20 minutes manually clicking through the UI to create a "New Customer with a Home Loan" just to test a single API, your velocity dies. A Manager must build automated Data Seeding APIs to generate test states instantly.
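A data-seeding factory of the kind described might look like this sketch: one function call replaces twenty minutes of UI clicking. The entity shapes and field names are assumptions for illustration.

```python
# Sketch of a data-seeding factory. One call builds a test state that
# would otherwise take minutes of manual UI work. Entity shapes and
# field names are illustrative assumptions.
import itertools

_ids = itertools.count(1)   # simple unique-ID source for seeded entities

def seed_customer(**overrides):
    customer = {"id": next(_ids), "name": "Test Customer", "loans": []}
    customer.update(overrides)   # callers override only what the test cares about
    return customer

def seed_home_loan(customer, amount=450_000):
    loan = {"type": "home", "amount": amount, "status": "active"}
    customer["loans"].append(loan)
    return loan

# "New Customer with a Home Loan" in two lines:
cust = seed_customer(name="Seeded User")
seed_home_loan(cust)
```

In a real suite these factories would sit behind an HTTP seeding endpoint so automation, manual testers, and demos all create states the same way.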
⚠ Missing Environment Schedules
Why it fails: Squad A runs a load test while Squad B is doing exploratory testing. Squad B reports 50 timeout bugs. The Manager must own the "Environment Booking Calendar" to prevent test contamination.
⚠ Using Real Credit Cards or SSNs in Test
Why it fails (NZ-specific): it violates the Privacy Act 2020, exposing you to fines and legal liability. Use fake cards (e.g., issuer-published test card numbers) and fully synthetic identifiers only.
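Fake card numbers still need to pass checkout's checksum validation, which typically uses the Luhn algorithm. Here is a sketch of generating a Luhn-valid test number; the `400000` prefix is an arbitrary Visa-range example, and in practice you should prefer the test numbers your payment provider publishes.

```python
# Generate a Luhn-valid 16-digit test card number. The prefix is an
# arbitrary example; prefer your payment provider's published test cards.
def luhn_check_digit(partial):
    # Compute the Luhn check digit for a 15-digit partial number:
    # walking right-to-left, double every second digit (subtracting 9
    # if the result exceeds 9), then pick the digit that makes the
    # total a multiple of 10.
    total = 0
    for i, ch in enumerate(reversed(partial)):
        d = int(ch)
        if i % 2 == 0:
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return str((10 - total % 10) % 10)

def test_card(prefix="400000", account="000000000"):
    partial = prefix + account   # 15 digits; the 16th is the check digit
    return partial + luhn_check_digit(partial)

card = test_card()   # Luhn-valid, but not a real issued card
```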
9 Self-Check
Q1. Why should UAT and SIT environments be kept strictly separate?
SIT changes daily with broken code; it is volatile. UAT must be stable so Business Stakeholders can sign off. If you mix them, the Business will lose faith in the software because of backend integration issues.
Q2. What is "Data Masking"?
The process of obfuscating real production data (e.g., changing 'John Doe' to 'Xkfj Rpl') when copying it to lower environments. It preserves the database relationships but eliminates the privacy risk.
Q3. What's better: masking or synthetic data?
Synthetic data is better. It has zero privacy risk. Use masking only when you need real data distributions (e.g., performance testing). Always prefer synthetic for functional testing.
Q4. What does NZ Privacy Act 2020 require for test data?
Protect PII in test environments as strictly as in production. Encrypt it, limit access, mask it, delete it after testing, and audit who accessed it. Failure to comply can result in fines and legal liability.