Data & Referential Integrity
The worst data defects never throw an error. A row points at a parent that no longer exists, a duplicate slips past, and everything keeps running — wrong. This lesson teaches you to find the damage no message announces.
1 The Hook
A fictional NZ insurer, Tasman Mutual, ran a routine job to tidy duplicate customer records. Two records for the same person — created when she was entered once as a married name and once as a maiden name — were merged. The job deleted the older duplicate customer row and kept the newer one. The screen showed one clean customer. Everyone moved on.
But that deleted customer row had three policies linked to it. The policies were stored in a separate policy table, each carrying the customer_id of the record that was now gone. The merge job deleted the customer but never re-pointed the policies at the surviving record. Nothing failed. No error appeared. The three policies simply now referenced a customer that no longer existed.
For months everything looked normal. Premiums kept being collected by a separate billing process that worked off policy IDs. Then a claim came in against one of those policies. The claims system tried to load the policyholder’s details, found a customer_id pointing at nothing, and could not identify who the policy belonged to. A valid, paid-up customer could not make a claim, because in the data she no longer owned her own policies.
Here is the lesson hidden in that story. The merge looked successful at every visible level — the screen, the premium collection, the duplicate count. The damage was a broken link between two tables, a row pointing at a parent that no longer existed. It threw no error because nobody had told the database to enforce the relationship, and nobody had tested for orphans afterwards. This whole class of defect — silent corruption of the links between data — is what this lesson is about.
2 The Rule
Referential integrity defects are silent: a child row pointing at a missing parent, a duplicate where one should exist, a broken link — none of them throw an error, and all of them keep the system running while it is quietly wrong. You cannot wait for a failure to surface them. You test integrity directly: query for orphans, for duplicates, for violated constraints, and assert that the answer is zero rows.
3 The Analogy
A library card pointing at a book that has been thrown out.
A library catalogue card says “Book #4471, shelf 12”. Someone removes the book but never updates the card. The catalogue still looks complete — every card is neatly filed, the drawer is full, nothing looks wrong. The system only breaks when a borrower actually walks to shelf 12 and finds an empty gap. The card is an orphan: it references something that no longer exists, and the failure waits silently until someone follows the reference.
Tasman Mutual’s policies were those cards. They pointed at a customer who had been thrown out, and the catalogue looked fine until a claim — a borrower walking to the shelf — followed the reference and found nothing. An integrity tester does not wait for the borrower. They walk every shelf in advance and check that every card points at a book that is really there.
4 Constraints — the Database’s Own Rules
A constraint is a rule the database itself enforces on every write, refusing any change that would break it. Constraints are the system’s first line of defence against bad data — and when they are missing, the door to silent corruption is open. A tester checks both that the right constraints exist and that they actually bite.
The constraints that matter most:
- NOT NULL: the column must always have a value. An IRD number column marked
NOT NULLmeans no account can be saved without one. - UNIQUE: no two rows may share this value. A unique constraint on email, or on IRD number, is what stops the duplicate customer that started the Tasman story.
- PRIMARY KEY: the unique, never-null identifier of a row —
UNIQUEandNOT NULLcombined. - FOREIGN KEY: a column whose value must match an existing row in another table — the constraint that would have stopped the Tasman orphans by refusing to delete a customer who still had policies. This is the heart of referential integrity, covered next.
- CHECK: a value must satisfy a rule —
balance >= 0, orbenefit_type IN (...). It stops nonsensical values at the door.
The tester’s job is twofold. First, confirm the expected constraints are actually defined — a UNIQUE rule that everyone assumed existed but was never created will not stop a single duplicate. Second, confirm each constraint rejects a bad write: try to insert a duplicate email, a NULL where one is forbidden, a foreign key pointing nowhere — and assert the database refuses it. A constraint that is documented but not enforced is worse than none, because the team trusts it.
5 Foreign Keys and Referential Integrity
Referential integrity is the guarantee that every reference between tables points at something real. A policy row carries a customer_id; referential integrity is the promise that this customer_id always matches an actual customer. The foreign key constraint is what enforces that promise.
When a foreign key is properly defined, the database physically prevents the Tasman defect. It will not let you delete a customer who still has policy rows pointing at them — the delete is refused, forcing the merge job to deal with the policies first. The constraint turns a silent corruption into a loud, immediate error at the moment of the bad write, which is exactly where you want the failure.
The trouble is that referential integrity is frequently not enforced. Foreign keys cost a little performance and a little flexibility, so on many real NZ systems they were never added, or were disabled during a bulk load and never re-enabled. When the constraint is absent, nothing stops a child row from being orphaned — and the only protection left is a tester who goes looking. So the first integrity question on any new system is: are the foreign keys actually enforced, or just implied by naming? If they are not enforced, every relationship is a place orphans can breed, and every one needs a test.
customer_id does not create a foreign key — it just looks like one. Inspect the actual constraint definitions, not the column names. A “foreign key” that is only a naming convention enforces nothing.6 Orphaned Rows and Cascade Deletes
An orphaned row is a child row whose parent no longer exists — a policy whose customer was deleted, a payment whose order was removed, a claim line whose claim is gone. The defining query for an orphan is the LEFT JOIN from Lesson 1: keep every child, join to the parent, and find the children where the parent side is NULL.
FROM policy p
LEFT JOIN customer c ON c.customer_id = p.customer_id
WHERE c.customer_id IS NULL;
Read it: “Every policy whose customer cannot be found.” The expected answer is zero rows. Every row that comes back is a Tasman-style orphan — a policy nobody owns. This one query is the entire test the insurer was missing.
The opposite risk is a cascade delete that deletes too much. A foreign key can be set to ON DELETE CASCADE, meaning when you delete a parent, the database automatically deletes all its children. This prevents orphans — but it is a loaded weapon. Delete one customer and every policy, every payment, every claim linked to them vanishes in a single statement, with no warning. On a system that holds records you are legally required to retain, a cascade delete can destroy data you were obliged to keep.
So a tester checks both ends. Where there is no cascade and no enforced foreign key, test for orphans after every delete or merge. Where there is a cascade, test that it deletes exactly what it should and nothing more — delete a parent and assert that only the intended children went, that no unrelated record was swept up, and that nothing the organisation must retain was destroyed. Both an orphan and an over-eager cascade are silent until someone notices the damage.
7 The Silent-Corruption Defect Class
What ties this lesson together is a single, dangerous property: these defects do not announce themselves. A crash is easy — it stops the line, someone investigates, it gets fixed. Silent corruption is the opposite. The system keeps running, screens keep rendering, batches keep completing, and the data is quietly wrong the whole time. The cost compounds, because every day the corruption sits there it spreads into reports, backups, and downstream systems that copy from it.
The members of this defect class share a shape:
- Orphaned rows — a child pointing at a parent that no longer exists (the Tasman policies).
- Duplicates — two rows where the business rule says there must be one: two customer records for one person, two payments for one order.
- Broken or inconsistent references — a status that does not match its allowed set, a foreign key that points at the wrong parent.
- Lost updates — a write that was supposed to update three tables and updated two, leaving the data internally disagreeing with itself.
Because none of these throws an error, you cannot test for them by exercising the application and watching for failures. You test for them by querying the data and asserting an invariant — a rule that must always hold. “No policy has a missing customer.” “No person has two active customer records.” “Every payment’s order exists.” Each invariant becomes a query whose correct answer is zero rows. Running that set of invariant queries after any operation that moves or deletes data — a merge, a migration, a bulk load, a cleanup job — is how you catch silent corruption while it is still one row, not one million.
8 Building Integrity Test Cases
An integrity test case names the invariant it protects, gives the query that detects a violation, and states that the passing result is zero rows. It is run as a guard around the risky operation — before and after.
Here is an integrity test case written to catch the exact Tasman defect:
Invariant: Every policy references an existing customer (no orphans).
Risk category: Referential integrity — orphaned child rows.
Operation guarded: Customer de-duplication / merge job.
Detection query: SELECT p.policy_id, p.customer_id FROM policy p
LEFT JOIN customer c ON c.customer_id = p.customer_id
WHERE c.customer_id IS NULL;
Pass criterion: Query returns 0 rows, both before and after the merge job.
Evidence required: Row count before and after; the merge job run ID; list of any
orphaned policy_ids if non-zero.
Traceability: Risk R-11 (merge orphans linked records, policyholder unidentifiable).
Result: [Pass / Fail] — orphaned policy_ids listed if Fail.
Notice the shape: the invariant is stated in plain language, the detection query makes “is the invariant violated?” answerable, and the pass criterion is the unambiguous “zero rows”. Running it both before and after the merge matters — a clean “after” only proves the job did no harm if the “before” was also clean. And it traces to a numbered risk, so a reviewer can see why this invariant earns a guard.
9 Common Mistakes
🚫 Assuming a constraint exists because the column is named like one
Why it happens: A column called customer_id looks like an enforced foreign key.
The fix: A name enforces nothing. Inspect the actual constraint definitions. On many NZ systems foreign keys were never created or were disabled during a load — so the relationship the team trusts is not actually protected, and orphans can form freely.
🚫 Waiting for an error to reveal corrupted data
Why it happens: Functional testing finds bugs by triggering failures, so testers wait for one.
The fix: Integrity defects throw no error — the Tasman orphans sat silent for months. Test by asserting invariants directly with queries (“no orphans”, “no duplicates”) where the correct answer is zero rows. The absence of an error is not evidence of correct data.
🚫 Checking for orphans only after the operation, never before
Why it happens: You care about the result, so you check the end state.
The fix: A clean “after” only proves the operation added no orphans if the “before” was also clean. Capture the invariant before and after, so you can attribute any new corruption to the operation rather than inheriting old damage and missing your job’s impact.
🚫 Trusting a cascade delete without testing its blast radius
Why it happens: ON DELETE CASCADE prevents orphans, so it feels safe.
The fix: A cascade is a loaded weapon — deleting one parent can wipe every child in one statement, including records you are legally required to retain. Test that a cascade deletes exactly the intended children and nothing more, and that nothing the organisation must keep is destroyed.
10 Now You Try
Three graded exercises: spot the integrity risks, fix the migration that breaks links, then build an orphan-hunting query. Write your answer, run it for AI feedback, then compare to the model answer.
Read the schema and operation for a fictional ACC claims database below. Identify 3 referential-integrity risks that could silently corrupt the data, and name the defect type each one is (orphan, duplicate, missing constraint, cascade blast radius).
claimant(claimant_id PK, ird_number, full_name) — no UNIQUE on ird_number.claim(claim_id PK, claimant_id, status) — claimant_id is named like a key but has no foreign-key constraint defined.claim_payment(payment_id PK, claim_id, amount) — foreign key to claim with ON DELETE CASCADE.Operation: an admin tool lets staff delete a claim that was “entered by mistake”. There is no check for existing payments before deletion, and claimant records are sometimes created twice when the same person is entered by two branches.
List 3 integrity risks and the defect type of each:
Show model answer
There are at least four real risks; any three well-explained earns full marks. 1. Duplicate claimants — no UNIQUE constraint on ird_number, and the same person is entered by two branches, so one person can have two claimant_id records. Defect type: duplicate / missing UNIQUE constraint. Silent because both records are individually valid; nothing rejects the second one, and claims may scatter across both identities. 2. Orphaned claims — claimant_id on claim is named like a key but has no foreign-key constraint, so a claim can point at a claimant_id that does not exist (or is later deleted). Defect type: orphan / missing foreign key. Silent because the database never refuses the bad reference; the claim looks fine until someone tries to load the claimant. 3. Cascade blast radius — claim_payment cascades on delete from claim, and the admin tool deletes a claim with no check for existing payments. Deleting a "mistake" claim silently destroys every payment record linked to it. Defect type: cascade blast radius. Silent because the delete succeeds with no warning and the payment history — which ACC may be required to retain — is gone. Bonus: a deleted claim with an enforced FK would be refused while payments exist; here nothing stops it, compounding the cascade risk. The thread: every one of these throws no error. They are caught only by querying for orphans, duplicates, and by testing the delete's blast radius before trusting the tool.
A fictional Te Whatu Ora data-cleanup migration is meant to remove duplicate patient records, keeping the lowest patient_id for each NHI number and re-pointing all appointment rows to the kept record. The migration below has a bug that orphans appointments. Find the bug and rewrite the migration so no appointment is orphaned.
DELETE FROM patient
WHERE patient_id NOT IN (
SELECT MIN(patient_id) FROM patient GROUP BY nhi_number
);
-- Step 2: re-point appointments to the kept patient
UPDATE appointment SET patient_id = (
SELECT MIN(patient_id) FROM patient GROUP BY nhi_number
);
Identify the bug(s) and write the corrected migration order/logic:
Show model answer
There are two serious bugs, and they compound. Bug 1 — Wrong order. Step 1 deletes the duplicate patients BEFORE step 2 re-points the appointments. So at the moment of the delete, every appointment still points at a patient_id that is about to vanish. Any appointment whose patient is deleted is now an orphan, and step 2 cannot reliably fix what step 1 already broke. The re-point MUST happen before the delete. Bug 2 — The UPDATE is nonsensical. SELECT MIN(patient_id) FROM patient GROUP BY nhi_number returns one row per NHI, but the UPDATE has no correlation to the appointment's own NHI — it sets EVERY appointment to the same single value (or errors as a multi-row subquery). It does not map each appointment to ITS patient's kept record. Corrected migration — re-point first, mapping each appointment to the kept patient for its OWN NHI, then delete: -- Step 1: re-point each appointment to the surviving patient for the same NHI UPDATE appointment a SET patient_id = ( SELECT MIN(p2.patient_id) FROM patient p1 JOIN patient p2 ON p2.nhi_number = p1.nhi_number WHERE p1.patient_id = a.patient_id ); -- Step 2: now it is safe to delete the duplicates DELETE FROM patient WHERE patient_id NOT IN ( SELECT MIN(patient_id) FROM patient GROUP BY nhi_number ); -- Step 3 (the test): assert zero orphans SELECT a.appointment_id FROM appointment a LEFT JOIN patient p ON p.patient_id = a.patient_id WHERE p.patient_id IS NULL; -- must return 0 rows The lesson: re-point children before deleting parents, map each child to its own parent, and finish with an orphan-detection query as the pass gate.
On a fictional Kiwibank database with customer(customer_id PK, ird_number), account(account_id PK, customer_id), and transaction(txn_id PK, account_id, amount): write three queries that together prove integrity — (a) no orphaned accounts, (b) no orphaned transactions, (c) no duplicate customers by IRD number. For each, state the result that means “pass”.
Show model answer
(a) Orphaned accounts — accounts whose customer is missing: SELECT a.account_id, a.customer_id FROM account a LEFT JOIN customer c ON c.customer_id = a.customer_id WHERE c.customer_id IS NULL; PASS result: 0 rows. (b) Orphaned transactions — transactions whose account is missing: SELECT t.txn_id, t.account_id FROM transaction t LEFT JOIN account a ON a.account_id = t.account_id WHERE a.account_id IS NULL; PASS result: 0 rows. (c) Duplicate customers by IRD number: SELECT ird_number, COUNT(*) AS n FROM customer WHERE ird_number IS NOT NULL GROUP BY ird_number HAVING COUNT(*) > 1; PASS result: 0 rows. Why each works: (a) and (b) use a LEFT JOIN and filter for the parent key being NULL — an inner join would hide the very orphans you want. (c) groups by the value that must be unique and keeps only groups with more than one row; any row returned is a duplicate IRD. The WHERE ird_number IS NOT NULL avoids grouping all the missing IRDs together as a false duplicate. Run all three as a set after any merge, load, or cleanup. Together they assert the three core invariants: every account has a customer, every transaction has an account, and no IRD is shared. Each correct answer is zero rows.
11 Self-Check
Click each question to reveal the answer.
Q1: What is an orphaned row, and what query detects one?
A child row whose parent no longer exists — a policy whose customer was deleted. You detect it with a LEFT JOIN from the child to the parent, filtering for the parent key being NULL: every row returned is an orphan, and the passing result is zero rows.
Q2: Why is the silent-corruption defect class so dangerous?
Because it throws no error. The system keeps running while the data is wrong, so functional testing that waits for a failure never catches it — the Tasman orphans sat silent for months. The cost compounds as the corruption spreads into reports and backups. You catch it only by asserting invariants directly with queries whose correct answer is zero rows.
Q3: Why is a column named customer_id not proof of an enforced foreign key?
A name is just a naming convention — it enforces nothing. A real foreign key is a defined constraint the database checks on every write. On many NZ systems the constraint was never created or was disabled during a bulk load, so the relationship the team trusts is unprotected. You must inspect the actual constraint definitions, not the column names.
Q4: Why test a cascade delete, when it prevents orphans?
Because it is a loaded weapon. ON DELETE CASCADE stops orphans but deletes every child of a parent in one statement, with no warning — potentially wiping records the organisation is legally required to retain. Test that it deletes exactly the intended children and nothing more, and that nothing that must be kept is destroyed.
Q5: Why capture an integrity invariant both before and after an operation?
Because a clean “after” only proves the operation caused no corruption if the “before” was also clean. Without the baseline you might inherit pre-existing damage and miss your own job’s impact, or wrongly blame the operation for old orphans. The before/after pair attributes any new corruption to the operation under test.
12 Interview Prep
Real questions asked in NZ QA interviews for data-heavy roles. Read the model answers, then practise your own version.
“A merge job ran in production and now some policies have no owner. How would you have caught this in testing?”
This is an orphaned-row defect — policies pointing at a customer that the merge deleted. I would have guarded the merge with an integrity test: a LEFT JOIN from policy to customer, filtering for the customer key being NULL, run both before and after the job, with a pass criterion of zero rows. Before the merge it confirms a clean baseline; after, any returned policy_id is an orphan the job created. I would also have checked whether a foreign key was actually enforced — if it had been, the database would have refused to delete a customer who still had policies, turning this silent corruption into a loud error at the right moment. The root cause is that nobody asserted the ‘every policy has a customer’ invariant.
“What is referential integrity, and why can you not assume it is enforced?”
Referential integrity is the guarantee that every reference between tables points at something real — every customer_id on a policy matches an actual customer. It is enforced by foreign key constraints. You cannot assume it because foreign keys are often missing in practice: they cost a little performance, so on many NZ systems they were never added, or were disabled during a bulk load and never re-enabled. A column named like a key is not a key. So my first integrity question on any system is whether the foreign keys are actually defined and enforced — and if they are not, every relationship is a place orphans can form, and every one needs a test.
“How do you test for defects that never throw an error?”
By asserting invariants instead of waiting for failures. An invariant is a rule that must always hold — no orphans, no duplicate IRD numbers, every payment’s order exists. I turn each invariant into a query whose correct answer is zero rows, and I run that set after any operation that moves or deletes data: a merge, a migration, a bulk load, a cleanup job. If a query returns rows, that is my failing test, even though the application itself reported success. Silent corruption is invisible to functional testing precisely because nothing errors, so the only reliable way to find it is to interrogate the data directly and prove the rules still hold.