Hybrid · Behaviour + Partial Internals

Grey-Box Testing

Test through the UI like a real user, but design your cases — and check your results — using partial knowledge of what sits underneath: the database schema, the API contracts, the architecture, the logs. You see enough of the inside to test the right things, and to know whether the system actually did them.

Junior · Senior · ISTQB CTFL v4.0, section 4.1: Test techniques overview

1 The Hook

A tester at a Kiwibank-style retail bank runs a transfer test. They log in, move $250 from the everyday account to the savings account, and the screen says “Transfer successful.” Green tick. Pass. They move on.

Two weeks later, reconciliation fails. The $250 left the everyday account but never arrived in savings — the credit leg of the transaction silently rolled back, yet the UI had already optimistically shown success. The money sat in limbo. A pure black-box test, watching only the screen, could never have caught it: the screen lied.

A grey-box tester would have done one more thing. After the “success” message, they would have run a query against the ledger table and confirmed two rows: a debit of $250 against the source account and a matching credit of $250 against the destination, both tied to the same transaction reference, both committed. The message is the claim; the ledger row is the proof. Knowing the schema turns a hopeful pass into a verified one.

2 The Rule

Drive the system the way a user does, but design and verify with partial knowledge of the inside — check the database row, the API response, and the log line, not just the message on the screen.

3 The Analogy


A restaurant health inspector with a kitchen pass.

A food critic (black box) only ever tastes the plate that arrives at the table. A line cook (white box) knows every recipe and every pot. The health inspector sits between them: they order from the menu like any diner, but they also hold a pass that lets them step into the kitchen, open the fridge, and read the temperature log. They do not rebuild the kitchen — they just check that the fridge is actually at 4°C and that the chicken was logged as cooked to temperature.

Grey-box testing is the inspector. You experience the system as a customer through the UI, but you carry a pass into the database, the API layer, and the logs — just enough internal access to confirm the meal was made the way the menu claimed, not to cook it yourself.

What it is

Grey-box testing (also spelled gray-box) is a hybrid. You test the application’s behaviour from the outside — through the UI or the public API — the way black-box techniques do, but you bring partial knowledge of the internals to the table. You might know the database schema, the API contracts, the message queues, the service architecture, or where the logs land. You do not have, or do not need, full source-code access the way white-box techniques require.

That partial knowledge does two jobs. First, it helps you design better cases — you know which fields map to which columns, which endpoints fire behind a button, and where a transaction spans two services. Second, it lets you verify internal state — you confirm the system genuinely did what it claimed, rather than trusting a UI message that might be wrong.

Where it sits between black and white box

The three approaches are best understood as a spectrum of how much of the inside you can see:

  • Black box: you know the specification and the inputs/outputs. You judge purely by observable behaviour. No internal knowledge.
  • Grey box: you know the specification and some of the structure — schema, contracts, architecture, logs. You test through the outside but verify on the inside.
  • White box: you have the source and design tests against the code structure — statements, branches, paths. Coverage is measured against the code itself.

Grey box is not “black box plus a peek at the code.” It is a distinct stance: you keep the user’s viewpoint as your test driver, and you use internal knowledge only as a design aid and an oracle for checking results.

Sources of internal knowledge

What counts as the “grey” in grey box? In practice it is one or more of these:

  • Database schema: tables, columns, constraints, and relationships. Lets you assert that a row was created, updated, or correctly left alone.
  • API contracts: request/response shapes, status codes, error formats (often an OpenAPI or similar spec). Lets you trace which calls a UI action triggers and assert on the payloads.
  • Architecture: which services, queues, and caches the request flows through. Lets you find the gaps between components where data can get lost.
  • Logs and traces: application logs, audit trails, distributed traces. Lets you confirm an event was recorded, an error was handled, and a request reached the service you expected.

Verifying state, not just the screen

The defining habit of a grey-box tester is the extra assertion after the visible result. The UI says one thing; you go and check whether the system agrees with itself underneath. Three common moves:

  • Database asserts: after an action, run a query to confirm the expected rows exist with the expected values — and that nothing unexpected changed.
  • Log checks: confirm the action wrote the audit or event log you expected, with the right user, timestamp, and outcome.
  • API tracing: watch the network or service calls behind a UI action to confirm the right endpoint was hit with the right payload and returned the right status.
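As a minimal sketch of the log-check move, the snippet below scans an audit log for exactly one entry matching a transfer. It assumes, purely for illustration, that the log is written as JSON lines with hypothetical fields `event`, `reference`, `user`, `amount_cents`, and `outcome`; a real audit format will differ and should be taken from the system's documented audit events.

```python
import json


def find_audit_entries(log_lines, event, reference):
    """Return parsed log records matching an event name and transaction reference."""
    entries = []
    for line in log_lines:
        line = line.strip()
        if not line:
            continue
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip non-JSON noise such as plain-text warnings
        if not isinstance(record, dict):
            continue
        if record.get("event") == event and record.get("reference") == reference:
            entries.append(record)
    return entries


def assert_audit_entry(log_lines, event, reference, user, amount_cents):
    """Assert exactly one matching entry exists with the expected user, amount, and outcome."""
    entries = find_audit_entries(log_lines, event, reference)
    assert len(entries) == 1, f"expected exactly one {event} entry, found {len(entries)}"
    entry = entries[0]
    assert entry["user"] == user, f"wrong user: {entry['user']}"
    assert entry["amount_cents"] == amount_cents, f"wrong amount: {entry['amount_cents']}"
    assert entry["outcome"] == "SUCCESS", f"wrong outcome: {entry['outcome']}"
    return entry


# Hypothetical sample log: field names and values are illustrative only.
SAMPLE_LOG = [
    '{"event": "TRANSFER", "reference": "TX-1001", "user": "aroha", '
    '"amount_cents": 25000, "outcome": "SUCCESS", "ts": "2024-05-01T09:30:00Z"}',
    "WARN cache miss for session 42",
    '{"event": "LOGIN", "reference": null, "user": "aroha", "outcome": "SUCCESS"}',
]
```

Calling `assert_audit_entry(SAMPLE_LOG, "TRANSFER", "TX-1001", "aroha", 25000)` passes on the sample above and fails loudly if the entry is missing, duplicated, or records the wrong user or amount.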

Real-world NZ Example: a Kiwibank-style ledger transfer

You transfer $250 between two of your own accounts and the screen shows “Transfer successful.” A black-box test stops there. A grey-box test continues:

  • DB assert: query the ledger table for the transaction reference — expect a debit row of −$250 on the source account and a credit row of +$250 on the destination, both committed and both linked to the same reference.
  • Balance check: confirm the source balance dropped by exactly $250 and the destination rose by exactly $250, so the totals still reconcile.
  • Log check: confirm an audit entry recorded the transfer with the correct user, amount, and timestamp.

The bug to fear is a half-committed transaction: money leaves one account but the credit leg fails and the UI still says success. Only the ledger row, not the screen, proves the transfer truly happened.
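The ledger assertion described above might look like the following sketch. The `ledger` table and its columns (`reference`, `account_id`, `amount_cents`, `status`) are assumptions made for illustration, and an in-memory SQLite database stands in for the bank's real store so the example runs on its own.

```python
import sqlite3


def make_ledger():
    """Create an in-memory ledger with a hypothetical schema (illustration only)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        """CREATE TABLE ledger (
               id INTEGER PRIMARY KEY,
               reference TEXT NOT NULL,
               account_id TEXT NOT NULL,
               amount_cents INTEGER NOT NULL,
               status TEXT NOT NULL
           )"""
    )
    return conn


def check_transfer(conn, reference, source, destination, amount_cents):
    """Return a list of problems; an empty list means both legs are present and committed."""
    rows = conn.execute(
        "SELECT account_id, amount_cents, status FROM ledger WHERE reference = ?",
        (reference,),
    ).fetchall()
    debit = (source, -amount_cents, "COMMITTED")
    credit = (destination, amount_cents, "COMMITTED")
    problems = []
    if rows.count(debit) != 1:
        problems.append("missing or duplicated committed debit leg")
    if rows.count(credit) != 1:
        problems.append("missing or duplicated committed credit leg")
    if len(rows) != 2:
        problems.append(f"expected 2 ledger rows for {reference}, found {len(rows)}")
    committed_total = sum(amount for _, amount, status in rows if status == "COMMITTED")
    if committed_total != 0:
        problems.append(f"committed legs net to {committed_total} cents, expected 0")
    return problems
```

A half-committed transfer, where the credit row exists but is not committed, comes back with a credit-leg problem and a non-zero net, even though the screen reported success. Amounts are held in integer cents to avoid floating-point rounding in the comparison.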

Worked example

A NZ retailer’s checkout applies a 10% loyalty discount when a signed-in member completes an order. You place an order through the UI and the confirmation page shows the discounted total. Here is the grey-box test set, layering an internal assertion onto each outside-in step.

Loyalty discount checkout — grey-box test cases
| UI action (black box) | Internal check (grey box) | Expected |
| Add $100 of items, apply member discount | Query order_lines sum + orders.discount column | Discount = $10, total = $90 |
| Confirmation page shows “Order placed” | Assert one row in orders with status CONFIRMED | Exactly one confirmed row |
| Click “Pay now” | Trace the call to POST /payments and its 2xx response | Payment endpoint returns 201 |
| Apply discount as a signed-out guest | Assert orders.discount = 0 and a denied entry in the log | No discount; denial logged |

Why each row needs both columns: the UI column tells you what a user would see; the internal column tells you what the system actually recorded. A discount that displays correctly but never reaches the orders.discount column will under-charge or over-charge once the visible figure and the stored figure drift apart. Grey box catches that drift.
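The first row of the table can be sketched as a check that compares three figures: the discount recomputed from the order lines, the discount and total stored on the order, and the total the confirmation page displayed. The column names (`discount_cents`, `total_cents`, `price_cents`, `quantity`) and the round-down rule are assumptions for illustration; the source only names the `orders` and `order_lines` tables and the `orders.discount` column.

```python
import sqlite3


def make_shop_db():
    """Build an in-memory database with a hypothetical orders / order_lines schema."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE orders (
            id INTEGER PRIMARY KEY,
            discount_cents INTEGER NOT NULL,
            total_cents INTEGER NOT NULL,
            status TEXT NOT NULL
        );
        CREATE TABLE order_lines (
            order_id INTEGER NOT NULL,
            price_cents INTEGER NOT NULL,
            quantity INTEGER NOT NULL
        );
        """
    )
    return conn


def check_discount(conn, order_id, displayed_total_cents, rate_percent=10):
    """Compare recomputed, stored, and displayed figures; return a list of mismatches."""
    subtotal = conn.execute(
        "SELECT COALESCE(SUM(price_cents * quantity), 0) "
        "FROM order_lines WHERE order_id = ?",
        (order_id,),
    ).fetchone()[0]
    row = conn.execute(
        "SELECT discount_cents, total_cents FROM orders WHERE id = ?", (order_id,)
    ).fetchone()
    if row is None:
        return [f"no orders row for order {order_id}"]
    stored_discount, stored_total = row
    # Assumed rounding rule: discount is rounded down to whole cents.
    expected_discount = subtotal * rate_percent // 100
    problems = []
    if stored_discount != expected_discount:
        problems.append(
            f"stored discount {stored_discount} != expected {expected_discount}"
        )
    if stored_total != subtotal - stored_discount:
        problems.append("stored total does not equal subtotal minus stored discount")
    if stored_total != displayed_total_cents:
        problems.append(
            f"displayed total {displayed_total_cents} != stored total {stored_total}"
        )
    return problems
```

For a member order with $100 of lines, a stored discount of 1000 cents, and a stored and displayed total of 9000 cents, the function returns an empty list. If the page shows $90 but the order row stores no discount and a $100 total, it reports both the discount mismatch and the displayed-versus-stored drift.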

ISTQB mapping

ISTQB groups test techniques into black-box, white-box, and experience-based families. Grey box is not a separate syllabus category — it is a practical approach that combines black-box test design with structural knowledge used as an oracle. The mapping below shows the pieces a grey-box tester draws on.

ISTQB CTFL v4.0 reference
| Syllabus ref | Topic | Level |
| 4.1 | Test techniques overview (black-box, white-box, experience-based) | CTFL Foundation |
| 4.2 | Black-box techniques: the design basis for grey-box cases | CTFL Foundation |
| 4.3 | White-box techniques: the structural knowledge a grey-box tester borrows | CTFL Foundation |

Common mistakes

✗ Trusting the UI message as proof

A “success” toast is a claim, not evidence. If you stop at the screen you are doing black box, not grey box. Add the database or log assertion that confirms the system actually did it.

✗ Drifting into white-box testing

Grey box uses partial knowledge to design and verify. The moment you start writing tests against private functions and chasing code coverage, you have crossed into white box — a different goal with different tooling.

✗ Asserting on internals that are not contracted

If you assert on a column or log format that is purely incidental and may change without notice, your tests become brittle. Anchor internal checks to stable contracts — documented schema, published API responses, defined audit events.
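One way to keep an API assertion anchored to the contract is to check only the fields the contract names and ignore everything else in the payload. The sketch below uses a hand-written, hypothetical contract for a payment response; in practice the field list would come from the published API specification rather than from this example.

```python
# Hypothetical contract: field names and types are illustrative, not a real API.
PAYMENT_CONTRACT = {
    "payment_id": str,
    "status": str,
    "amount_cents": int,
}


def contract_violations(payload, contract):
    """Return contract violations for a response payload.

    Only contracted fields are checked; extra fields in the payload are
    ignored on purpose, so incidental additions do not break the test.
    """
    problems = []
    for field, expected_type in contract.items():
        if field not in payload:
            problems.append(f"missing field: {field}")
            continue
        value = payload[field]
        # Exact type match, so that True/False is not accepted as an int.
        if type(value) is not expected_type:
            problems.append(
                f"wrong type for {field}: expected {expected_type.__name__}, "
                f"got {type(value).__name__}"
            )
    return problems
```

A response that carries an extra, uncontracted field such as a trace id still passes, while a missing `status` or an `amount_cents` sent as a string is reported.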

✗ Checking only what changed, not what should not have

A transfer that credits the right account but also touches an unrelated row is still a bug. Assert the expected change and that nothing else moved — especially balances, statuses, and totals that must reconcile.
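A simple way to assert that nothing else moved is to snapshot the relevant state before the action, snapshot it again afterwards, and compare the full set of differences with the changes you expected. The sketch below assumes a hypothetical `accounts` table with `account_id` and `balance_cents` columns.

```python
import sqlite3


def snapshot_balances(conn: sqlite3.Connection) -> dict:
    """Capture every account balance as {account_id: balance_cents}."""
    return dict(conn.execute("SELECT account_id, balance_cents FROM accounts"))


def balance_deltas(before: dict, after: dict) -> dict:
    """Return {account_id: change} for every account whose balance changed."""
    deltas = {}
    for account in before.keys() | after.keys():
        change = after.get(account, 0) - before.get(account, 0)
        if change != 0:
            deltas[account] = change
    return deltas


def unexpected_changes(before: dict, after: dict, expected: dict) -> dict:
    """Return accounts whose actual change differs from the expected change.

    Covers both directions: an account that moved when it should not have,
    and an account that should have moved but did not.
    """
    actual = balance_deltas(before, after)
    return {
        account: actual.get(account, 0)
        for account in actual.keys() | expected.keys()
        if actual.get(account, 0) != expected.get(account, 0)
    }
```

For the $250 transfer, the expected mapping is `{"everyday": -25000, "savings": 25000}`; any other account appearing in the result, even with a one-cent change, is a defect. On a shared database a full-table snapshot will also pick up other users' activity, so the query would need narrowing to the accounts owned by the test.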

✗ Letting test data and prod-like state collide

Querying a shared database to verify state can give false passes if another run left rows behind. Scope your asserts to the transaction reference or test account you created, not to a broad table count.
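The difference between a broad assert and a scoped one can be sketched as follows. The example assumes a hypothetical `ledger` table with `account_id` and `amount_cents` columns, and that each test run creates its own uniquely named test account; both are illustrative choices rather than features of any particular system.

```python
import sqlite3
import uuid


def new_test_account_id() -> str:
    """Generate a unique, recognisable account id for this test run (hypothetical format)."""
    return f"TEST-{uuid.uuid4().hex[:12]}"


def broad_credit_count(conn: sqlite3.Connection, amount_cents: int) -> int:
    """Fragile: counts every credit of this amount, including rows left by other runs."""
    return conn.execute(
        "SELECT COUNT(*) FROM ledger WHERE amount_cents = ?", (amount_cents,)
    ).fetchone()[0]


def scoped_credit_count(conn: sqlite3.Connection, account_id: str, amount_cents: int) -> int:
    """Robust: counts only credits on the account this test run created."""
    return conn.execute(
        "SELECT COUNT(*) FROM ledger WHERE account_id = ? AND amount_cents = ?",
        (account_id, amount_cents),
    ).fetchone()[0]
```

If an earlier run left a 25000-cent credit behind and the current run's credit was never written, the broad count still returns 1 and the test passes falsely, while the scoped count returns 0 and the test fails as it should.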

4 Now You Try

Three graded exercises: spot, fix, then build. Write your answer, then compare it to the model answer.

🔍 Exercise 1 of 3 · Spot: classify the approach

For each of the four test descriptions below, say whether it is black box, grey box, or white box, and give a one-line reason. (1) Place an order and confirm the total on screen. (2) Place an order, then query the orders table to confirm the stored discount matches the displayed one. (3) Write a unit test that forces an exception branch inside the discount calculator. (4) Submit a transfer, then check the audit log recorded the right amount and user.

Model answer
1. On-screen total only — BLACK BOX. Judged purely by observable UI output, no internal knowledge used.

2. Order + DB query for the stored discount — GREY BOX. Driven through the UI like a user, but verified with knowledge of the schema (the orders table / discount column).

3. Unit test forcing an exception branch — WHITE BOX. Designed against the code structure to exercise a specific branch; needs source access and measures coverage.

4. Transfer + audit-log check — GREY BOX. The action is outside-in, but the verification uses internal knowledge of where and how the event is logged.

The dividing line: grey box drives from the outside and verifies on the inside; white box designs the test from the code structure itself.

🔧 Exercise 2 of 3 · Fix: repair a weak transfer test

A tester wrote the test below for a Kiwibank-style $250 transfer. It is too weak: it trusts the UI, verifies only one side of the ledger, and never checks that totals reconcile. Rewrite it as a proper grey-box test with the internal assertions a senior would expect.

Weak test:
1. Transfer $250 from everyday to savings.
2. Confirm the screen says “Transfer successful”.
3. Query the savings account for a +$250 credit row.
4. Pass if the credit row exists.

Rewrite as a proper grey-box test:

Model answer
Proper grey-box test for a $250 transfer:

Step 1: Record the starting balances of both accounts, then transfer $250 from everyday to savings through the UI.
Step 2: Query the ledger for the transaction reference and assert BOTH legs: a debit row of −$250 on the everyday account AND a credit row of +$250 on the savings account, both committed and sharing the same reference.
Step 3: Assert the totals reconcile — everyday dropped by exactly $250, savings rose by exactly $250, and no other account moved.
Step 4: Confirm an audit-log entry recorded the transfer with the correct user, amount, and timestamp. Only then treat the on-screen "success" message as confirmed.

What was missing in the original:
- It trusted the UI "success" message instead of treating it as a claim.
- It checked only the credit leg — a half-committed transaction (money left but never arrived) would still pass.
- It never confirmed the totals reconcile, so it could not catch money landing in the wrong place or being double-counted.

🏗️ Exercise 3 of 3 · Build: design grey-box checks for a webhook flow

A NZ SaaS billing system sends an invoice email after a customer upgrades their plan in the UI. Behind the scenes it calls POST /subscriptions, writes a row to the subscriptions table, and fires a webhook to an email service that logs an EMAIL_SENT event. Design a grey-box test: name the outside-in action and the internal assertions across the API, database, and logs. Note what you would check that the UI alone cannot show.

Model answer
Grey-box test for the plan-upgrade billing flow:

Outside-in action: Sign in as a customer and upgrade the plan through the UI; the confirmation page shows the new plan.

API assertion: Trace the POST /subscriptions call behind the upgrade button — assert it was sent with the correct plan id and customer id, and returned a 201 with the expected response body.

Database assertion: Query the subscriptions table for the customer — assert exactly one active row with the new plan, the correct effective date, and the previous plan correctly closed off (no two active rows).

Log / webhook assertion: Confirm the email service logged an EMAIL_SENT event tied to this customer and invoice, proving the webhook fired and was accepted — not just queued.

What the UI alone cannot show: whether the subscription row was actually written (the UI can show success optimistically), whether the old plan was closed cleanly, and whether the invoice email webhook actually reached the email service. A senior would also assert that no duplicate subscription row or duplicate EMAIL_SENT event was created on a retry.
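The database and log assertions in this answer, including the duplicate check on retry, could be combined in a sketch like the one below. It works on rows and log events that the test has already fetched and parsed; the dictionary keys (`customer_id`, `plan`, `status`, `event`, `invoice_id`) are hypothetical names chosen for illustration.

```python
from collections import Counter


def check_upgrade(subscription_rows, log_events, customer_id, new_plan, invoice_id):
    """Return problems found in the stored subscription state and the email log.

    subscription_rows: dicts with hypothetical keys customer_id, plan, status.
    log_events: dicts with hypothetical keys event, invoice_id.
    """
    problems = []

    # Exactly one active row for the customer, and it must be the new plan.
    active = [
        row
        for row in subscription_rows
        if row.get("customer_id") == customer_id and row.get("status") == "ACTIVE"
    ]
    if len(active) != 1:
        problems.append(f"expected 1 active subscription, found {len(active)}")
    elif active[0].get("plan") != new_plan:
        problems.append(f"active plan is {active[0].get('plan')}, expected {new_plan}")

    # Exactly one EMAIL_SENT event for this invoice: zero means the webhook
    # never landed, more than one means a retry produced a duplicate email.
    sent = Counter(
        event.get("invoice_id")
        for event in log_events
        if event.get("event") == "EMAIL_SENT"
    )
    if sent[invoice_id] != 1:
        problems.append(
            f"expected 1 EMAIL_SENT for {invoice_id}, found {sent[invoice_id]}"
        )
    return problems
```

A clean upgrade returns an empty list. A retried request that leaves two active rows and two EMAIL_SENT events returns two problems, neither of which would be visible on the confirmation page.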

Self-Check


Q1: In one sentence, what makes grey-box testing different from black-box testing?

Both drive the system from the outside like a user, but grey box adds partial knowledge of the internals — schema, API contracts, logs — to design smarter cases and to verify internal state rather than trusting the UI message alone.

Q2: Name three sources of internal knowledge a grey-box tester typically uses.

Any three of: the database schema (tables, columns, constraints), API contracts (request/response shapes and status codes), the architecture (services, queues, caches), and logs or traces (audit trails, distributed traces). These are partial, contracted views — not full source access.

Q3: A transfer screen says “success” but only the debit leg committed. Which approach catches this, and how?

Grey box. After the UI message you query the ledger for the transaction reference and assert both legs — the debit and the matching credit — plus that the totals reconcile. The screen lied; the database row is the proof.

Q4: When does a grey-box test slip into being a white-box test?

When you stop driving from the outside and start designing tests against the code structure itself — targeting private functions, specific branches, or statement/path coverage. At that point you need full source access and your goal is structural coverage, not behaviour verification.

Q5: Why should internal assertions be anchored to contracts rather than incidental details?

Because asserting on an undocumented column name or a log format that may change without notice makes tests brittle — they break on harmless internal refactors. Anchoring to stable, documented schema, published API responses, or defined audit events keeps the checks meaningful and durable.

Interview Prep

“What is grey-box testing, and when would you choose it over black box?”

Grey box drives the system from the outside like a user but uses partial knowledge of the internals — schema, API contracts, logs — to design cases and verify state. I choose it whenever the visible result and the stored result can diverge: payments, ledgers, subscriptions, stock, audit trails. A black-box test trusts the “success” message; grey box confirms the database row, the API response, and the log line agree with it.

“Give a concrete example of an internal assertion you would add to a UI test.”

For a bank transfer, after the “Transfer successful” message I query the ledger for the transaction reference and assert both legs — the debit on the source and the matching credit on the destination — plus that the totals reconcile and an audit entry was written. That catches a half-committed transaction where money leaves one account but never arrives, which a screen-only test cannot see.

“How do you keep grey-box tests from becoming brittle?”

Anchor internal assertions to stable contracts — documented schema, published API responses, defined audit events — not incidental column names or log formats that may change on a refactor. I also scope database checks to the specific transaction reference or test account I created, so a leftover row from another run cannot cause a false pass or fail.

Grey box builds on black-box techniques for case design and borrows from white-box coverage for structural knowledge — sitting deliberately between the two.

The internal checks themselves draw on API testing for tracing calls and asserting on payloads, and on database testing for writing the queries that verify rows and reconcile totals.

Where grey box shines: any flow where the visible result and the stored result can diverge, such as payments and ledgers, subscriptions, stock levels, audit trails, and multi-service transactions. If a “success” message could be wrong, grey box is the technique that confirms or refutes it.