Database Testing · Lesson 1

SQL for Testers

Q: Why does WHERE status != 'CLOSED' miss some rows, and how do you fix it?

It silently drops rows where status is NULL , because NULL != 'CLOSED' evaluates to unknown, not true. A record with no status set — often a defect itself — vanishes from the result. Fix it with WHERE status != 'CLOSED' OR status IS NULL so the NULL rows are included.

Q: When would you reach for a LEFT JOIN instead of an inner JOIN?

When you want to find records that are missing a link — a customer with no account, an active benefit with no payment, an order with no settlement. An inner join hides unmatched rows; a LEFT JOIN keeps every left-side row and leaves the right side NULL where there is no match. Filtering on the right-side key being NULL surfaces the gaps.

Q: How do you assert “exactly one payment per order” in SQL?

Group by the order and count: GROUP BY order_id HAVING COUNT(*) > 1 . HAVING filters after grouping, so it can test the count itself. The expected result is zero rows — any order that comes back has more than one payment, which is the double-charge defect.

The screen says “saved”. SQL is how you find out whether that is true. This lesson teaches you to query the database directly and assert on what actually persisted.

Database Testing Database & Backend — Lesson 1 of 4 ~30 min read · ~70 min with exercises

1 The Hook

A fictional NZ council, Awanui District Council, shipped a feature that let residents update their postal address online for rates notices. The tester clicked through it: typed a new address, pressed Save, saw the green “Your details have been updated” banner, and reloaded the page. The new address showed. They marked it passed.

Three weeks later the quarterly rates notices went out. A few hundred of them went to the old addresses. People who had carefully updated their details still got their bill at the house they had moved out of, and some missed the payment deadline because the notice never reached them.

The screen had not lied, exactly. The address update wrote to a customer_profile table that the web page read back from. But the rates-notice batch read from a separate billing_address table, and the update never wrote there. The web page showed the new address because it was reading the same row it had just written. The reload “proved” nothing — it just re-read the one place the data had landed. Nobody had checked the table the bills were actually generated from.

Here is the lesson hidden in that story. The tester confirmed the screen, and the screen confirmed itself. The only way to catch this defect was to step behind the UI and ask the database directly: after I save an address, which tables changed, and did every table the business depends on get the new value? That question is a SELECT. This lesson teaches you to write it.

2 The Rule

The screen renders what one query returned; it does not prove what the database stored. To verify that an action persisted correctly, query the data layer directly and assert on the actual rows — every table the action should have touched, not just the one the UI reads back. A test that only re-reads the screen is testing the screen, not the system.

Senior engineer insight

The hardest thing to convince teams of is that a passing UI test with no database assertion is not a test at all — it is a demo. I worked on a NZ government Oracle system where an entitlement workflow updated three tables on submit; testers had been marking those scenarios passed for two years purely from the screen. When we finally added post-submit SELECT queries, we found a trigger that had silently failed on rollover dates, leaving a status column stale in a downstream table that the payment batch read. The screen had looked right the entire time.

Once you start querying, a second discipline kicks in: you must know which database you are on. A PostgreSQL ILIKE, a SQL Server TOP, or an Oracle ROWNUM are not interchangeable, and a query that compiles on one quietly returns wrong results or errors on another. In Wellington government and banking shops you will hit all three in the same week — know your dialect.

Most common mistake: writing a WHERE clause that silently excludes NULL rows and then wondering why coverage looks low. A test that misses NULL data in a government system is not a conservative test — it is a broken one.

3 The Analogy

Analogy

Changing your address with one government agency and assuming the rest know.

When you move house in NZ, you tell Revenue NZ your new address. The Revenue NZ website confirms it, shows it back to you, and everything looks done. But TransitNZ still has your old address for your licence, the electoral roll still has the old one, and your KiwiSaver provider never heard a thing. One confirmation from one agency told you nothing about the others. You only find out when a renewal notice goes to the wrong house.

A database is the same. The UI confirming a save is one agency confirming. SQL is how you ring each of the other agencies and ask directly: do you have the new address? A tester who reads the database checks every table the change should have reached, instead of trusting one screen to speak for all of them.

4 SELECT, WHERE and ORDER BY — Reading State

A SELECT reads rows from a table. It changes nothing — it is the safe, read-only verb a tester lives in. The shape is always the same: which columns, from which table, matching which condition.

SELECT customer_id, postal_address, updated_at

FROM   billing_address

WHERE  customer_id = 48213

ORDER BY updated_at DESC;

Read it as a sentence: “Give me the customer ID, postal address, and last-updated time, from the billing address table, for customer 48213, newest first.” That single query is the test the Awanui tester never ran. If billing_address still shows the old address after a save, the defect is caught before a single rates notice goes out.

The WHERE clause is where a tester does most of the work. It narrows millions of rows to the one case under test. A few patterns you will use constantly:

Exact match: WHERE status = 'ACTIVE' — the row must equal that value.
Ranges: WHERE created_at >= '2026-04-01' — everything on or after that date.
Sets: WHERE benefit_type IN ('JOBSEEKER','SUPPORTED_LIVING') — any of these.
Negation and combination: WHERE region = 'AUCKLAND' AND status != 'CLOSED'.

ORDER BY ... DESC matters more than it looks. When you save a record twice, or a process writes a history row each time, the newest row is the one that reflects the action you just performed. Sorting newest-first puts the row you care about at the top, so you assert on the right one rather than an old version.

5 JOIN — Following the Links

Real data is spread across tables that point at each other. A customer lives in one table, their KiwiSaver accounts in another, their contributions in a third. A JOIN stitches them back together along the keys that link them, so you can assert across the whole picture in one query.

SELECT c.customer_id, c.full_name, a.account_number, a.balance

FROM   customer c

JOIN   kiwisaver_account a ON a.customer_id = c.customer_id

WHERE  c.customer_id = 48213;

The ON clause is the link: every kiwisaver_account row carries a customer_id that matches a customer. A plain JOIN (an inner join) returns only rows where both sides match — a customer with no account simply will not appear.

That last point is a trap and a tool. If you want to find customers who should have an account but do not, an inner join hides them — they vanish from the result. A LEFT JOIN keeps every row from the left table and fills the right side with NULL where there is no match, so the missing accounts show up as empty. Finding rows where the right side is NULL after a LEFT JOIN is one of the most useful tester moves there is — it surfaces the things that should be linked but are not. You will see this again in Lesson 2 when hunting orphaned rows.

Pro tip: When a save is supposed to write across several tables (the Awanui case), a single LEFT JOIN from the master record out to each dependent table tells you in one query which writes happened and which silently did not. The NULL columns are your missing writes.

6 GROUP BY and COUNT — Asserting on Totals

Some defects are invisible at the single-row level and obvious in aggregate. A duplicate payment looks fine row by row — each row is a valid payment. It is only when you count payments per order that two appear where one should. GROUP BY collapses many rows into one per group, and COUNT tells you how many fell into each.

SELECT order_id, COUNT(*) AS payment_count, SUM(amount) AS total_charged

FROM   payment

WHERE  created_at >= '2026-06-01'

GROUP BY order_id

HAVING COUNT(*) > 1;

Read it: “For each order since 1 June, how many payments and what total — but only show orders with more than one payment.” The HAVING clause filters after grouping (where WHERE filters before), so it can test the count itself. Any row that comes back is an order charged more than once — the double-charge defect, found with a single query instead of a customer complaint.

This is the assertion shape behind most reconciliation. “Every order has exactly one payment” becomes “no order has a count greater than one”. “No duplicate Revenue NZ numbers” becomes “group by ird_number, having count greater than one returns zero rows”. When the expected answer is “zero rows”, a query that returns any rows at all is a failing test.

7 Reading an Unfamiliar Schema

You will constantly be dropped into a database you have never seen, on a system someone else built. Before you can write a useful query you have to read the schema — the map of tables, columns, and the keys that link them.

Three things to find first, every time:

The primary key: the column that uniquely identifies a row in a table — usually an id or <thing>_id. This is the handle you grab a specific record by.
The foreign keys: columns that point at another table’s primary key — a customer_id on a claim table means each claim belongs to a customer. These are the lines you will JOIN along.
The columns that carry the thing under test: for an address change, which column actually holds the address, and is there more than one address table? For a status change, which column is the status, and what are its allowed values?

A schema diagram or the database’s own catalogue (the system tables that list every table and column) gives you this. So does asking a developer to walk you through “where does this action write?” — a five-minute conversation that would have saved Awanui District Council. The goal is a mental map: when the application does X, which tables and columns change, and how are they linked.

Pro tip: When you do not know which table a save writes to, do a before-and-after. Snapshot the relevant tables, perform the action once, then compare. The tables whose rows changed are the ones the action touched — and if a table you expected to change did not, you have found a defect before writing a single targeted query.

8 The NULL Traps

NULL means “no value recorded” — not zero, not an empty string, not false. It is the absence of an answer, and it behaves unlike any real value, which is exactly why it quietly breaks tester queries.

Trap one: NULL is never equal to anything, including NULL. WHERE middle_name = NULL returns nothing — not even rows where the middle name is genuinely missing. You must write WHERE middle_name IS NULL. The = operator cannot test for absence.

Trap two: a condition silently drops NULL rows. WHERE status != 'CLOSED' looks like it returns every non-closed record. It does not return rows where status is NULL, because NULL != 'CLOSED' is not true — it is unknown. A claim with no status set vanishes from your result, and you never test it. To include them you must write WHERE status != 'CLOSED' OR status IS NULL.

Trap three: COUNT and aggregates skip NULLs. COUNT(middle_name) counts only rows where the middle name is present; COUNT(*) counts all rows. If you expect every customer to have an email and COUNT(email) is lower than COUNT(*), you have just found the rows with a missing email — a real data quality defect.

For a tester, NULL is not an annoyance to route around — it is often the defect itself. An Revenue NZ number that should always be present but is NULL for some accounts, a mandatory consent flag left empty, a foreign key that points nowhere: these are exactly the rows you are paid to find. Make NULL-hunting a habit, not an afterthought.

9 Common Mistakes

🚫 Verifying a save by reloading the same screen

Why it happens: The reload re-renders and the new value appears, so it feels like proof.
The fix: The reload re-reads the one place the write landed — the Awanui trap. Query each table the action should have written to, directly, and confirm every one of them carries the new value. The screen cannot tell you about the table it does not read from.

🚫 Using = NULL to find missing values

Why it happens: Equality is how you match every other value, so it seems natural for NULL too.
The fix: = NULL returns nothing, ever. Use IS NULL and IS NOT NULL. NULL is absence, and absence cannot be compared with = — it can only be tested for existence.

🚫 Trusting an inner JOIN to show you missing links

Why it happens: An inner join is the default and returns clean matched rows.
The fix: An inner join hides the records with no match — the very ones you often want. To find a customer with no account, or an order with no payment, use a LEFT JOIN and look for NULL on the right side. The gaps are the defects.

🚫 Asserting on one row when the defect lives in the total

Why it happens: Each individual row is valid, so spot-checking looks fine.
The fix: Duplicates and miscounts are invisible row by row. Use GROUP BY ... HAVING COUNT(*) > 1 to assert “exactly one per group”. When the expected answer is zero rows, any returned row is a failing test.

From the field

On a benefits-payment system running SQL Server, the team assumed their regression suite was solid because every API test returned HTTP 200 and the response body matched the expected JSON. Nobody was querying the database. During a production release, a stored procedure bug caused a status column in a child table to be set to NULL instead of ‘PROCESSED’ when a specific benefit type crossed a financial-year boundary — a path that only exists in August and had never been tested against the raw table.

The API returned 200 because the orchestration layer read from a view that defaulted NULL status to ‘ACTIVE’. Three thousand benefit payments sat unprocessed for four days before a manual reconciliation report flagged the discrepancy. The fix was a one-line SQL patch and a new test: after every processing run, SELECT COUNT(*) FROM benefit_payment WHERE status IS NULL must return zero. That assertion has been running in CI ever since, takes 40 milliseconds, and has caught two further regressions before they reached production.

The lesson generalises: views, ORMs, and API layers can all mask bad data in the underlying tables. The only assertion that cannot be fooled is a direct SELECT against the actual column in the actual table.

10 Now You Try

Three graded exercises: spot the broken query, fix a verification, then build a persistence-proof query. Write your answer, run it for AI feedback, then compare to the model answer.

🔍 Exercise 1 of 3 — Spot the Bug in the Query

A tester wrote the query below to find every KiwiSaver account that is not closed and has no recorded Revenue NZ number, on a fictional Harbour Bank KiwiSaver database. It returns far fewer rows than expected and misses real problem accounts. Identify every bug and explain why each one drops rows it should return.

    SELECT account_id, customer_id, ird_number, status

    FROM   kiwisaver_account

    WHERE  status != 'CLOSED'

    AND    ird_number = NULL;

List the bugs and the corrected query:

Show model answer

There are two real bugs, and the second is fatal on its own.

Bug 1 — ird_number = NULL never matches anything. = cannot test for absence; NULL is never equal to NULL. So the whole query returns zero rows about missing Revenue NZ numbers, which is the opposite of the goal. Fix: ird_number IS NULL.

Bug 2 — status != 'CLOSED' silently drops rows where status is NULL. NULL != 'CLOSED' is unknown, not true, so any account with no status set is excluded — and an account with a missing status is exactly the kind of broken record you want to see. Fix: (status != 'CLOSED' OR status IS NULL).

Corrected query:
SELECT account_id, customer_id, ird_number, status
FROM   kiwisaver_account
WHERE  (status != 'CLOSED' OR status IS NULL)
AND    ird_number IS NULL;

The lesson: both bugs are NULL traps. = NULL finds nothing, and a plain != filter quietly removes the NULL rows that are often the defects you are hunting.

🔧 Exercise 2 of 3 — Fix the Verification

On a fictional TransitNZ licence-renewal system, a renewal is supposed to write to three tables: licence (new expiry date), payment (the renewal fee), and notification_queue (the confirmation email). A tester “verified” a renewal like this: “I renewed the licence, the screen showed the new expiry date, and I reloaded the licence page — it still showed the new date. Pass.” Rewrite this as a proper persistence verification: describe the queries you would run against each table and the exact assertion for each.

Write the verification — one query and one assertion per table:

Show model answer

licence table — query: SELECT expiry_date, updated_at FROM licence WHERE licence_id = :id;
              assertion: expiry_date equals the new expected expiry (e.g. exactly +1 year from the old one), and updated_at is the time of the renewal.

payment table — query: SELECT amount, status, COUNT(*) FROM payment WHERE licence_id = :id AND created_at >= :renewal_time GROUP BY amount, status;
              assertion: exactly one payment row exists for this renewal, amount equals the renewal fee, status is settled/successful — not zero rows and not two.

notification_queue table — query: SELECT recipient, type, created_at FROM notification_queue WHERE licence_id = :id AND type = 'RENEWAL_CONFIRMATION';
              assertion: exactly one queued confirmation exists, addressed to the correct holder.

Why the original was insufficient: reloading the licence page only re-reads the licence table — the one place the write that the screen shows landed. It proves nothing about payment or notification_queue. The renewal fee could have failed to post and the email could be missing, and the verification would still say "Pass". A real persistence check asserts on every table the action should touch, not just the one the UI reads back. This is the Awanui trap applied to three tables.

🏗️ Exercise 3 of 3 — Build a Persistence-Proof Query

On a fictional Benefits NZ benefit-payment database with tables client(client_id, full_name), benefit(benefit_id, client_id, benefit_type, status), and payment(payment_id, benefit_id, amount, paid_on): write a single SQL query that proves whether every active benefit has been paid at least once this month, and would surface any active benefit that has not been paid. State the result that means “pass”.

Show model answer

SQL query:
SELECT b.benefit_id, b.client_id, b.benefit_type
FROM   benefit b
LEFT JOIN payment p
       ON p.benefit_id = b.benefit_id
       AND p.paid_on >= '2026-06-01'
WHERE  b.status = 'ACTIVE'
AND    p.payment_id IS NULL;

Expected result for a PASS: zero rows. Every row that comes back is an active benefit with no payment this month — an unpaid client, which is a defect.

Which JOIN type and why: a LEFT JOIN, because an inner join would hide exactly the benefits with no matching payment — the ones you are hunting. The LEFT JOIN keeps every active benefit and leaves the payment columns NULL where no payment exists; filtering on p.payment_id IS NULL isolates the unpaid ones.

A common mistake: putting the date filter in the WHERE clause instead of the ON clause. If you write WHERE p.paid_on >= '2026-06-01', that condition removes the NULL rows (NULL fails the comparison), turning your LEFT JOIN back into an inner join and hiding the unpaid benefits. The date filter must live in the ON clause so the unmatched rows survive.

11 Self-Check

Click each question to reveal the answer.

Q1: Why does reloading the screen not prove a save persisted correctly?

Because the reload re-reads the one table the write landed in — usually the same table the UI wrote to and shows back. It says nothing about the other tables the action was supposed to touch. The Awanui case wrote to customer_profile but not billing_address; the screen looked fine and the bills still went to the wrong address. You must query every affected table directly.

Q2: Why does WHERE status != 'CLOSED' miss some rows, and how do you fix it?

It silently drops rows where status is NULL, because NULL != 'CLOSED' evaluates to unknown, not true. A record with no status set — often a defect itself — vanishes from the result. Fix it with WHERE status != 'CLOSED' OR status IS NULL so the NULL rows are included.

Q3: When would you reach for a LEFT JOIN instead of an inner JOIN?

When you want to find records that are missing a link — a customer with no account, an active benefit with no payment, an order with no settlement. An inner join hides unmatched rows; a LEFT JOIN keeps every left-side row and leaves the right side NULL where there is no match. Filtering on the right-side key being NULL surfaces the gaps.

Q4: How do you assert “exactly one payment per order” in SQL?

Group by the order and count: GROUP BY order_id HAVING COUNT(*) > 1. HAVING filters after grouping, so it can test the count itself. The expected result is zero rows — any order that comes back has more than one payment, which is the double-charge defect.

Q5: What are the first three things to find when reading an unfamiliar schema?

The primary key (the column that uniquely identifies a row), the foreign keys (columns that point at another table’s primary key — the lines you JOIN along), and the specific columns that carry the thing under test, including whether more than one table holds it. Together they give you the map of what changes when the application performs an action.

Why teams fail here

Treating a screen reload as a persistence assertion — the UI re-reads one table and the tester calls it done, leaving every downstream table unverified. This is the Awanui trap and it recurs constantly in NZ government systems where a single user action fans out to three or four tables.
Using = NULL and != 'value' without understanding NULL three-valued logic — queries that compile and return results but silently exclude the broken rows that are exactly the defects under test. In Oracle and PostgreSQL the behaviour is identical but easy to forget under pressure.
Defaulting to INNER JOIN for every multi-table query — missing the LEFT JOIN pattern that surfaces absent relationships. An active KiwiSaver account with no contribution record, or an order with no settlement, vanishes from an inner join and looks like clean data.
Checking the right data in the wrong database — NZ banking and government environments commonly run SQL Server for operational data, Oracle for legacy core systems, and PostgreSQL for newer services side by side. Testers who only know one dialect write queries that return wrong results or fail silently on the others, and sign off on incorrect assertions without realising it.

12 Interview Prep

Real questions asked in NZ QA interviews for data-heavy roles. Read the model answers, then practise your own version.

“How would you verify that an address change in the UI actually saved correctly?”

I would not trust the screen reloading — that just re-reads where the write landed. First I would find out which tables an address change is meant to write to, because in real systems it is often more than one: a profile table the UI shows, and separate tables the billing or notice batches read from. Then I would run a SELECT against each of those tables for that customer, filtering to the record under test, and assert the new address is present in every one. If a downstream table still has the old value, I have caught a defect that the UI would never reveal — exactly the kind that sends a rates notice to the wrong house.

“What is the difference between an inner join and a left join, and when does it matter to a tester?”

An inner join returns only rows that match on both sides; a left join keeps every row from the left table and fills the right with NULL where there is no match. It matters most when I am hunting for things that should be linked but are not — a customer with no account, an order with no payment, a benefit that was never paid. An inner join silently hides those, which is the worst possible behaviour when they are exactly what I am looking for. So I use a left join and filter on the right-side key being NULL to surface the gaps. I am also careful to put any filter on the right table in the ON clause, not the WHERE clause, or it collapses the left join back into an inner join.

“Why is NULL dangerous in test queries, and how do you handle it?”

NULL means ‘no value recorded’, and it does not behave like a real value. It is never equal to anything, so = NULL matches nothing — you have to use IS NULL. And a condition like status != 'CLOSED' silently drops the NULL rows, because NULL != 'CLOSED' is unknown rather than true, so records with a missing status quietly fall out of my results. That is dangerous because those missing-value rows are very often the defect I am paid to find. So I handle NULL deliberately: IS NULL to find absence, an explicit OR ... IS NULL when a negative filter should include them, and I compare COUNT(column) against COUNT(*) to count missing values.

Key takeaway

A test that trusts the screen is testing the screen; the only assertion that cannot lie is a SELECT against the actual column in the actual table, run after every action that is supposed to change it.

← Database & Backend Testing Next: Data & Referential Integrity →

SQL for Testers

1 The Hook

2 The Rule

3 The Analogy

4 SELECT, WHERE and ORDER BY — Reading State

5 JOIN — Following the Links

6 GROUP BY and COUNT — Asserting on Totals

7 Reading an Unfamiliar Schema

8 The NULL Traps

9 Common Mistakes

10 Now You Try

11 Self-Check

Related techniques

12 Interview Prep