Database Testing · Lesson 1

SQL for Testers

The screen says “saved”. SQL is how you find out whether that is true. This lesson teaches you to query the database directly and assert on what actually persisted.

Database Testing Database & Backend — Lesson 1 of 4 ~30 min read · ~70 min with exercises

1 The Hook

A fictional NZ council, Awanui District Council, shipped a feature that let residents update their postal address online for rates notices. The tester clicked through it: typed a new address, pressed Save, saw the green “Your details have been updated” banner, and reloaded the page. The new address showed. They marked it passed.

Three weeks later the quarterly rates notices went out. A few hundred of them went to the old addresses. People who had carefully updated their details still got their bill at the house they had moved out of, and some missed the payment deadline because the notice never reached them.

The screen had not lied, exactly. The address update wrote to a customer_profile table that the web page read back from. But the rates-notice batch read from a separate billing_address table, and the update never wrote there. The web page showed the new address because it was reading the same row it had just written. The reload “proved” nothing — it just re-read the one place the data had landed. Nobody had checked the table the bills were actually generated from.

Here is the lesson hidden in that story. The tester confirmed the screen, and the screen confirmed itself. The only way to catch this defect was to step behind the UI and ask the database directly: after I save an address, which tables changed, and did every table the business depends on get the new value? That question is a SELECT. This lesson teaches you to write it.

2 The Rule

The screen renders what one query returned; it does not prove what the database stored. To verify that an action persisted correctly, query the data layer directly and assert on the actual rows — every table the action should have touched, not just the one the UI reads back. A test that only re-reads the screen is testing the screen, not the system.

3 The Analogy

Analogy

Changing your address with one government agency and assuming the rest know.

When you move house in NZ, you tell IRD your new address. The IRD website confirms it, shows it back to you, and everything looks done. But Waka Kotahi still has your old address for your licence, the electoral roll still has the old one, and your KiwiSaver provider never heard a thing. One confirmation from one agency told you nothing about the others. You only find out when a renewal notice goes to the wrong house.

A database is the same. The UI confirming a save is one agency confirming. SQL is how you ring each of the other agencies and ask directly: do you have the new address? A tester who reads the database checks every table the change should have reached, instead of trusting one screen to speak for all of them.

4 SELECT, WHERE and ORDER BY — Reading State

A SELECT reads rows from a table. It changes nothing — it is the safe, read-only verb a tester lives in. The shape is always the same: which columns, from which table, matching which condition.

SELECT customer_id, postal_address, updated_at
FROM   billing_address
WHERE  customer_id = 48213
ORDER BY updated_at DESC;

Read it as a sentence: “Give me the customer ID, postal address, and last-updated time, from the billing address table, for customer 48213, newest first.” That single query is the test the Awanui tester never ran. If billing_address still shows the old address after a save, the defect is caught before a single rates notice goes out.

The WHERE clause is where a tester does most of the work. It narrows millions of rows to the one case under test. A few patterns you will use constantly:

  • Exact match: WHERE status = 'ACTIVE' — the row must equal that value.
  • Ranges: WHERE created_at >= '2026-04-01' — everything on or after that date.
  • Sets: WHERE benefit_type IN ('JOBSEEKER','SUPPORTED_LIVING') — any of these.
  • Negation and combination: WHERE region = 'AUCKLAND' AND status != 'CLOSED'.

ORDER BY ... DESC matters more than it looks. When you save a record twice, or a process writes a history row each time, the newest row is the one that reflects the action you just performed. Sorting newest-first puts the row you care about at the top, so you assert on the right one rather than an old version.

5 JOIN — Following the Links

Real data is spread across tables that point at each other. A customer lives in one table, their KiwiSaver accounts in another, their contributions in a third. A JOIN stitches them back together along the keys that link them, so you can assert across the whole picture in one query.

SELECT c.customer_id, c.full_name, a.account_number, a.balance
FROM   customer c
JOIN   kiwisaver_account a ON a.customer_id = c.customer_id
WHERE  c.customer_id = 48213;

The ON clause is the link: every kiwisaver_account row carries a customer_id that matches a customer. A plain JOIN (an inner join) returns only rows where both sides match — a customer with no account simply will not appear.

That last point is a trap and a tool. If you want to find customers who should have an account but do not, an inner join hides them — they vanish from the result. A LEFT JOIN keeps every row from the left table and fills the right side with NULL where there is no match, so the missing accounts show up as empty. Finding rows where the right side is NULL after a LEFT JOIN is one of the most useful tester moves there is — it surfaces the things that should be linked but are not. You will see this again in Lesson 2 when hunting orphaned rows.

Pro tip: When a save is supposed to write across several tables (the Awanui case), a single LEFT JOIN from the master record out to each dependent table tells you in one query which writes happened and which silently did not. The NULL columns are your missing writes.

6 GROUP BY and COUNT — Asserting on Totals

Some defects are invisible at the single-row level and obvious in aggregate. A duplicate payment looks fine row by row — each row is a valid payment. It is only when you count payments per order that two appear where one should. GROUP BY collapses many rows into one per group, and COUNT tells you how many fell into each.

SELECT order_id, COUNT(*) AS payment_count, SUM(amount) AS total_charged
FROM   payment
WHERE  created_at >= '2026-06-01'
GROUP BY order_id
HAVING COUNT(*) > 1;

Read it: “For each order since 1 June, how many payments and what total — but only show orders with more than one payment.” The HAVING clause filters after grouping (where WHERE filters before), so it can test the count itself. Any row that comes back is an order charged more than once — the double-charge defect, found with a single query instead of a customer complaint.

This is the assertion shape behind most reconciliation. “Every order has exactly one payment” becomes “no order has a count greater than one”. “No duplicate IRD numbers” becomes “group by ird_number, having count greater than one returns zero rows”. When the expected answer is “zero rows”, a query that returns any rows at all is a failing test.

7 Reading an Unfamiliar Schema

You will constantly be dropped into a database you have never seen, on a system someone else built. Before you can write a useful query you have to read the schema — the map of tables, columns, and the keys that link them.

Three things to find first, every time:

  • The primary key: the column that uniquely identifies a row in a table — usually an id or <thing>_id. This is the handle you grab a specific record by.
  • The foreign keys: columns that point at another table’s primary key — a customer_id on a claim table means each claim belongs to a customer. These are the lines you will JOIN along.
  • The columns that carry the thing under test: for an address change, which column actually holds the address, and is there more than one address table? For a status change, which column is the status, and what are its allowed values?

A schema diagram or the database’s own catalogue (the system tables that list every table and column) gives you this. So does asking a developer to walk you through “where does this action write?” — a five-minute conversation that would have saved Awanui District Council. The goal is a mental map: when the application does X, which tables and columns change, and how are they linked.

Pro tip: When you do not know which table a save writes to, do a before-and-after. Snapshot the relevant tables, perform the action once, then compare. The tables whose rows changed are the ones the action touched — and if a table you expected to change did not, you have found a defect before writing a single targeted query.

8 The NULL Traps

NULL means “no value recorded” — not zero, not an empty string, not false. It is the absence of an answer, and it behaves unlike any real value, which is exactly why it quietly breaks tester queries.

Trap one: NULL is never equal to anything, including NULL. WHERE middle_name = NULL returns nothing — not even rows where the middle name is genuinely missing. You must write WHERE middle_name IS NULL. The = operator cannot test for absence.

Trap two: a condition silently drops NULL rows. WHERE status != 'CLOSED' looks like it returns every non-closed record. It does not return rows where status is NULL, because NULL != 'CLOSED' is not true — it is unknown. A claim with no status set vanishes from your result, and you never test it. To include them you must write WHERE status != 'CLOSED' OR status IS NULL.

Trap three: COUNT and aggregates skip NULLs. COUNT(middle_name) counts only rows where the middle name is present; COUNT(*) counts all rows. If you expect every customer to have an email and COUNT(email) is lower than COUNT(*), you have just found the rows with a missing email — a real data quality defect.

For a tester, NULL is not an annoyance to route around — it is often the defect itself. An IRD number that should always be present but is NULL for some accounts, a mandatory consent flag left empty, a foreign key that points nowhere: these are exactly the rows you are paid to find. Make NULL-hunting a habit, not an afterthought.

9 Common Mistakes

🚫 Verifying a save by reloading the same screen

Why it happens: The reload re-renders and the new value appears, so it feels like proof.
The fix: The reload re-reads the one place the write landed — the Awanui trap. Query each table the action should have written to, directly, and confirm every one of them carries the new value. The screen cannot tell you about the table it does not read from.

🚫 Using = NULL to find missing values

Why it happens: Equality is how you match every other value, so it seems natural for NULL too.
The fix: = NULL returns nothing, ever. Use IS NULL and IS NOT NULL. NULL is absence, and absence cannot be compared with = — it can only be tested for existence.

🚫 Trusting an inner JOIN to show you missing links

Why it happens: An inner join is the default and returns clean matched rows.
The fix: An inner join hides the records with no match — the very ones you often want. To find a customer with no account, or an order with no payment, use a LEFT JOIN and look for NULL on the right side. The gaps are the defects.

🚫 Asserting on one row when the defect lives in the total

Why it happens: Each individual row is valid, so spot-checking looks fine.
The fix: Duplicates and miscounts are invisible row by row. Use GROUP BY ... HAVING COUNT(*) > 1 to assert “exactly one per group”. When the expected answer is zero rows, any returned row is a failing test.

10 Now You Try

Three graded exercises: spot the broken query, fix a verification, then build a persistence-proof query. Write your answer, run it for AI feedback, then compare to the model answer.

🔍 Exercise 1 of 3 — Spot the Bug in the Query

A tester wrote the query below to find every KiwiSaver account that is not closed and has no recorded IRD number, on a fictional ANZ KiwiSaver database. It returns far fewer rows than expected and misses real problem accounts. Identify every bug and explain why each one drops rows it should return.

SELECT account_id, customer_id, ird_number, status
FROM   kiwisaver_account
WHERE  status != 'CLOSED'
AND    ird_number = NULL;

List the bugs and the corrected query:

Show model answer
There are two real bugs, and the second is fatal on its own.

Bug 1 — ird_number = NULL never matches anything. = cannot test for absence; NULL is never equal to NULL. So the whole query returns zero rows about missing IRD numbers, which is the opposite of the goal. Fix: ird_number IS NULL.

Bug 2 — status != 'CLOSED' silently drops rows where status is NULL. NULL != 'CLOSED' is unknown, not true, so any account with no status set is excluded — and an account with a missing status is exactly the kind of broken record you want to see. Fix: (status != 'CLOSED' OR status IS NULL).

Corrected query:
SELECT account_id, customer_id, ird_number, status
FROM   kiwisaver_account
WHERE  (status != 'CLOSED' OR status IS NULL)
AND    ird_number IS NULL;

The lesson: both bugs are NULL traps. = NULL finds nothing, and a plain != filter quietly removes the NULL rows that are often the defects you are hunting.
🔧 Exercise 2 of 3 — Fix the Verification

On a fictional Waka Kotahi licence-renewal system, a renewal is supposed to write to three tables: licence (new expiry date), payment (the renewal fee), and notification_queue (the confirmation email). A tester “verified” a renewal like this: “I renewed the licence, the screen showed the new expiry date, and I reloaded the licence page — it still showed the new date. Pass.” Rewrite this as a proper persistence verification: describe the queries you would run against each table and the exact assertion for each.

Write the verification — one query and one assertion per table:

Show model answer
licence table — query: SELECT expiry_date, updated_at FROM licence WHERE licence_id = :id;
              assertion: expiry_date equals the new expected expiry (e.g. exactly +1 year from the old one), and updated_at is the time of the renewal.

payment table — query: SELECT amount, status, COUNT(*) FROM payment WHERE licence_id = :id AND created_at >= :renewal_time GROUP BY amount, status;
              assertion: exactly one payment row exists for this renewal, amount equals the renewal fee, status is settled/successful — not zero rows and not two.

notification_queue table — query: SELECT recipient, type, created_at FROM notification_queue WHERE licence_id = :id AND type = 'RENEWAL_CONFIRMATION';
              assertion: exactly one queued confirmation exists, addressed to the correct holder.

Why the original was insufficient: reloading the licence page only re-reads the licence table — the one place the write that the screen shows landed. It proves nothing about payment or notification_queue. The renewal fee could have failed to post and the email could be missing, and the verification would still say "Pass". A real persistence check asserts on every table the action should touch, not just the one the UI reads back. This is the Awanui trap applied to three tables.
🏗️ Exercise 3 of 3 — Build a Persistence-Proof Query

On a fictional MSD benefit-payment database with tables client(client_id, full_name), benefit(benefit_id, client_id, benefit_type, status), and payment(payment_id, benefit_id, amount, paid_on): write a single SQL query that proves whether every active benefit has been paid at least once this month, and would surface any active benefit that has not been paid. State the result that means “pass”.

Show model answer
SQL query:
SELECT b.benefit_id, b.client_id, b.benefit_type
FROM   benefit b
LEFT JOIN payment p
       ON p.benefit_id = b.benefit_id
       AND p.paid_on >= '2026-06-01'
WHERE  b.status = 'ACTIVE'
AND    p.payment_id IS NULL;

Expected result for a PASS: zero rows. Every row that comes back is an active benefit with no payment this month — an unpaid client, which is a defect.

Which JOIN type and why: a LEFT JOIN, because an inner join would hide exactly the benefits with no matching payment — the ones you are hunting. The LEFT JOIN keeps every active benefit and leaves the payment columns NULL where no payment exists; filtering on p.payment_id IS NULL isolates the unpaid ones.

A common mistake: putting the date filter in the WHERE clause instead of the ON clause. If you write WHERE p.paid_on >= '2026-06-01', that condition removes the NULL rows (NULL fails the comparison), turning your LEFT JOIN back into an inner join and hiding the unpaid benefits. The date filter must live in the ON clause so the unmatched rows survive.

11 Self-Check

Click each question to reveal the answer.

Q1: Why does reloading the screen not prove a save persisted correctly?

Because the reload re-reads the one table the write landed in — usually the same table the UI wrote to and shows back. It says nothing about the other tables the action was supposed to touch. The Awanui case wrote to customer_profile but not billing_address; the screen looked fine and the bills still went to the wrong address. You must query every affected table directly.

Q2: Why does WHERE status != 'CLOSED' miss some rows, and how do you fix it?

It silently drops rows where status is NULL, because NULL != 'CLOSED' evaluates to unknown, not true. A record with no status set — often a defect itself — vanishes from the result. Fix it with WHERE status != 'CLOSED' OR status IS NULL so the NULL rows are included.

Q3: When would you reach for a LEFT JOIN instead of an inner JOIN?

When you want to find records that are missing a link — a customer with no account, an active benefit with no payment, an order with no settlement. An inner join hides unmatched rows; a LEFT JOIN keeps every left-side row and leaves the right side NULL where there is no match. Filtering on the right-side key being NULL surfaces the gaps.

Q4: How do you assert “exactly one payment per order” in SQL?

Group by the order and count: GROUP BY order_id HAVING COUNT(*) > 1. HAVING filters after grouping, so it can test the count itself. The expected result is zero rows — any order that comes back has more than one payment, which is the double-charge defect.

Q5: What are the first three things to find when reading an unfamiliar schema?

The primary key (the column that uniquely identifies a row), the foreign keys (columns that point at another table’s primary key — the lines you JOIN along), and the specific columns that carry the thing under test, including whether more than one table holds it. Together they give you the map of what changes when the application performs an action.

12 Interview Prep

Real questions asked in NZ QA interviews for data-heavy roles. Read the model answers, then practise your own version.

“How would you verify that an address change in the UI actually saved correctly?”

I would not trust the screen reloading — that just re-reads where the write landed. First I would find out which tables an address change is meant to write to, because in real systems it is often more than one: a profile table the UI shows, and separate tables the billing or notice batches read from. Then I would run a SELECT against each of those tables for that customer, filtering to the record under test, and assert the new address is present in every one. If a downstream table still has the old value, I have caught a defect that the UI would never reveal — exactly the kind that sends a rates notice to the wrong house.

“What is the difference between an inner join and a left join, and when does it matter to a tester?”

An inner join returns only rows that match on both sides; a left join keeps every row from the left table and fills the right with NULL where there is no match. It matters most when I am hunting for things that should be linked but are not — a customer with no account, an order with no payment, a benefit that was never paid. An inner join silently hides those, which is the worst possible behaviour when they are exactly what I am looking for. So I use a left join and filter on the right-side key being NULL to surface the gaps. I am also careful to put any filter on the right table in the ON clause, not the WHERE clause, or it collapses the left join back into an inner join.

“Why is NULL dangerous in test queries, and how do you handle it?”

NULL means ‘no value recorded’, and it does not behave like a real value. It is never equal to anything, so = NULL matches nothing — you have to use IS NULL. And a condition like status != 'CLOSED' silently drops the NULL rows, because NULL != 'CLOSED' is unknown rather than true, so records with a missing status quietly fall out of my results. That is dangerous because those missing-value rows are very often the defect I am paid to find. So I handle NULL deliberately: IS NULL to find absence, an explicit OR ... IS NULL when a negative filter should include them, and I compare COUNT(column) against COUNT(*) to count missing values.