Event-Driven & Async Testing
Kafka and async messaging are the backbone of NZ fintech, healthtech, and government integration platforms. Unlike REST APIs, failures in async systems are invisible — messages go nowhere and nobody knows. This lesson teaches you to test the full pipeline, from producer to consumer to dead letter queue.
1 The Hook
An Auckland KiwiSaver administration platform processes employer payroll contribution events through Apache Kafka. An employer submits payroll on the 20th. The Kafka producer fires successfully — the event lands in the topic and the producer logs a confirmation. The employer's system shows "submitted."
Fourteen days later, during the monthly fund reconciliation, the operations team notices 847 employee contribution records did not update. The Kafka consumer had failed silently on a malformed event caused by a schema version mismatch introduced in a hotfix three days earlier. The consumer threw a deserialization exception, moved the message to a dead letter queue, and continued without alerting anyone. The DLQ was not monitored.
847 employees missed a fortnight of KiwiSaver contributions. The fund administrator spent three weeks in manual remediation, two weeks in customer communications, and filed a Serious Incident Report with the Financial Markets Authority. The root cause was not a complex bug — it was a schema change that was never tested against existing consumers, and a DLQ that was never wired to an alert.
2 The Rule
"In synchronous systems, failure is visible — an API returns 500. In async systems, failure is invisible — the message disappears and nobody knows. Test the whole pipeline: producer, schema, consumer, DLQ, and monitoring."
A test that verifies the producer sent a message and stops there has tested the least important part of the system. The producer sending is table stakes. What matters is whether the right message arrived at the right consumer, was processed correctly, and produced the right side effects — and what happened when it did not.
3 The Analogy
A Kafka pipeline is a postal system, not a phone call.
When you make a phone call, you know immediately if it connected. The conversation is synchronous: you speak, they respond, the interaction is complete. Posting a letter is asynchronous: you hand the letter to the courier (producer), it goes into the postal system (Kafka topic), a delivery person picks it up (consumer), and eventually (if nothing goes wrong) it arrives and is acted on. Testing the phone call means checking the conversation. Testing the postal system means checking every step: was the right address on the envelope? Could the courier read the address? Did the recipient know what to do with the letter? What happened if the letter was unreadable — was it returned, discarded, or silently left in a pile?
4 How Async Systems Differ from REST
Understanding the differences drives every testing decision you make with Kafka and async messaging.
| Dimension | REST / Synchronous | Kafka / Async |
|---|---|---|
| Failure visibility | HTTP 4xx/5xx — immediate, visible | Silent — consumer exception lands in DLQ, caller never notified |
| Timing | Request → response in same thread, same millisecond | Event produced → consumed seconds, minutes, or hours later |
| Coupling | Producer and consumer must be running simultaneously | Producer and consumer are decoupled — consumer can be offline |
| Ordering | Call sequence is deterministic | Only guaranteed within a partition; multi-partition = no global order guarantee |
| Delivery | Exactly-once (one request, one response) | At-least-once by default — consumers must be idempotent |
| Schema versioning | Usually in URL or HTTP Accept header | In Schema Registry (Confluent Avro or JSON Schema) |
Each of these differences is a test dimension. At-least-once delivery means duplicates are possible — you must test that double-processing a KiwiSaver contribution doesn't credit twice. No global ordering means you must test out-of-order arrival. Silent failure means your DLQ monitoring is part of the system under test.
5 Schema Registry and Avro
Kafka messages are bytes. Without a contract, a producer can send anything and consumers silently break. The Confluent Schema Registry enforces a shared contract between producers and consumers, with versioning and compatibility rules.
Avro schema example: KiwiSaver contribution event
Breaking vs non-breaking changes
The Schema Registry enforces compatibility rules. Test both directions when changing a schema:
| Change type | Breaking? | Test scenario |
|---|---|---|
| Add optional field with default | No — backward compatible | Old consumer ignores new field; new consumer uses default if absent |
| Remove required field | Yes | Old consumer fails to deserialize; test old consumer against new schema |
Change field type (string → int) | Yes | Consumer throws ClassCastException; verify DLQ receives the message |
| Add new enum value | Depends on consumer | Old consumer may throw on unknown enum symbol; test explicitly |
| Rename field | Yes | Old consumers read null for the renamed field; test data integrity |
Schema compatibility policy: Set your Schema Registry to BACKWARD_TRANSITIVE compatibility for production topics. This means every new schema version must be readable by all previous consumer versions — the discipline that prevents the KiwiSaver DLQ incident in this lesson's hook.
6 Producer Testing
Producer tests verify that your application publishes the right events, with the right schema, to the right topic, under all conditions.
Using Testcontainers for real Kafka in CI
What to test on the producer side
- Happy path: valid input → correct event on correct topic with correct schema version
- Validation rejection: invalid input → no event produced, appropriate exception thrown
- Partition key: events for the same member always land on the same partition (ordering guarantee)
- Schema registration: producer registers schema on first publish; subsequent publishes use cached schema ID
- Producer failure handling: what happens if Kafka is unavailable — does the producer retry, queue locally, or surface an error to the caller?
7 Consumer Testing
Consumer tests verify that your application processes incoming events correctly — including the invariant that processing the same event twice does not corrupt data.
Idempotency: the most important consumer test
Kafka guarantees at-least-once delivery. This means a consumer may receive the same event more than once — during a rebalance, a consumer restart, or a broker failover. Your consumer must be idempotent: processing the same message twice must produce the same result as processing it once.
Consumer test checklist
| Test scenario | What you are verifying |
|---|---|
| Valid event → correct state change | Core business logic: event processed, database updated correctly |
| Same event twice → no duplication | Idempotency key or upsert logic prevents double-processing |
| Out-of-order events | Consumer handles event N+1 arriving before event N (if partition key is consistent, this should not happen; if not, verify graceful handling) |
| Schema version mismatch | Consumer receives old-version message → should either deserialize with defaults or route to DLQ |
| Malformed payload (poison pill) | Consumer does not crash; message routed to DLQ; offset committed so processing continues |
| Consumer group rebalance | No messages lost or double-processed during rebalance (requires integration test) |
8 Dead Letter Queues and Poison Pill Testing
A dead letter queue (DLQ) is a Kafka topic where messages go when a consumer cannot process them. Without a DLQ, a poison pill message (one the consumer cannot deserialize or process) blocks the entire partition — the consumer cannot advance the offset, and all subsequent messages are stuck.
DLQ test scenarios
kiwisaver.contributions.dlq topic, NOT that the consumer crashes or hangs.9 End-to-End Saga Testing
A saga is a business transaction that spans multiple services via events. Each step publishes an event that triggers the next service. If any step fails, compensating events roll back the preceding steps.
NZ example: IRD tax refund saga
When a taxpayer's return is assessed and a refund is due, a chain of events fires across three services:
- Assessment Service publishes
ReturnAssessedevent (refund: $2,340 NZD) - Payment Service consumes it, initiates bank transfer, publishes
PaymentInitiated - Notification Service consumes it, sends SMS to taxpayer, publishes
NotificationSent - Ledger Service consumes
PaymentInitiated, records the debit, publishesLedgerUpdated
Saga tests must cover both the happy path and compensation flows:
| Test scenario | Expected outcome |
|---|---|
| Happy path: valid return, bank details on file | All 4 events fire in sequence; refund arrives; SMS sent; ledger updated |
| Bank transfer fails (account closed) | Payment Service publishes PaymentFailed; Assessment Service compensates, marks return as "refund-pending-manual-review" |
| Notification Service is down during saga | Payment saga completes; notification retried later; no duplication of payment |
| Ledger Service times out | Retry with backoff; idempotency key prevents double-ledger entry |
| Duplicate ReturnAssessed event | Payment Service recognises duplicate (idempotency key on return ID), discards; no double refund |
10 AsyncAPI: Event Contracts
OpenAPI documents REST APIs. AsyncAPI documents event-driven APIs — what topics exist, what schemas they carry, and which services publish and subscribe. It is the async equivalent of a Pact consumer contract.
AsyncAPI enables:
- Contract-first development: define the event schema before writing producer or consumer code
- Consumer-driven contract tests: consumers publish their expected schema; producers must not introduce breaking changes
- Documentation: auto-generated human-readable docs showing all topics, schemas, and publishers/subscribers
- Schema drift detection: CI compares the AsyncAPI spec against the Schema Registry; fails if they diverge
11 Common Mistakes
✗ Only testing the producer — not the consumer
The producer firing successfully is the least important verification. What matters is whether the consumer received, deserialised, and correctly processed the event. Many teams have 100% producer test coverage and 0% consumer test coverage. Always write consumer tests first.
✗ Not testing for duplicate message processing
Kafka at-least-once delivery is not just a theoretical edge case. Consumer restarts, rebalances, and broker failovers all trigger duplicate delivery. Idempotency must be tested explicitly. A single test processing the same event twice will catch an entire class of production data corruption bugs.
✗ Using an in-memory Kafka mock instead of a real broker
EmbeddedKafka and most Kafka mocks do not reproduce broker behaviours: partition rebalancing, consumer group coordination, schema registry enforcement, or DLQ routing. Use Testcontainers with a real Kafka image in CI. The extra 30 seconds of startup time catches an entire class of bugs that mocks hide.
✗ Never testing what happens when the DLQ fills up
A DLQ that nobody monitors is a ticking clock. Test that your monitoring fires when message count in the DLQ topic crosses a threshold, and that the alert reaches the team within the SLA window. The absence of a DLQ alert is not safety — it is ignorance of failure.
✗ Making breaking schema changes without testing backward compatibility
Renaming a field, removing a field, or changing a field type is a breaking change. Old consumers that have not been updated will fail to deserialise the new schema. Before deploying a schema change, test the new producer schema against the old consumer code — in CI, not in production.
12 Now You Try
Three AI-graded exercises. Apply event-driven testing concepts to NZ scenarios.
Design a DLQ test strategy for the KiwiSaver contribution Kafka pipeline described in the hook. Your strategy must cover:
- What events should be routed to the DLQ, and under what conditions
- What metadata the DLQ message should contain for remediation teams
- How to test that the partition continues processing after a poison pill
- How to test the monitoring alert (what triggers it, who is notified, within what SLA)
- The manual remediation process for DLQ messages (and how to test it)
You are reviewing a pull request that modifies the Avro schema for Te Whatu Ora patient health events. Below is the current (v2) schema and the proposed (v3) schema. Identify every breaking change, explain why it is breaking, and propose a safe migration path for each.
// v2 (current) // v3 (proposed)
{ "name": "patientId", { "name": "patientNHI", ← renamed
"type": "string" } "type": "string" }
{ "name": "nhsNumber", { "name": "nhiNumber", ← renamed
"type": ["null","string"] } "type": "string" } ← null removed
{ "name": "admissionDate", { "name": "admissionDate",
"type": "string" } "type": "long" } ← type changed
{ "name": "ward", ← new field (no default)
"type": "string" }
You are writing tests for the Payment Service in the IRD tax refund saga (described in section 9). The service consumes ReturnAssessed events and initiates bank transfers. Write three test cases:
- Happy path: first-time receipt of a valid event
- Idempotency: same event received twice (simulate Kafka at-least-once delivery)
- Compensation: bank transfer API returns "account closed" — verify the rollback event is published
Use the Given/When/Then format. You do not need to write real code — describe what each step sets up, what action is taken, and what assertions are made.
13 Self-Check
Click each question to reveal the answer.
Why must Kafka consumers be idempotent, and how do you implement idempotency?
Kafka guarantees at-least-once delivery — the same message may be delivered more than once during consumer restarts, rebalances, or broker failovers. If a consumer processes the same event twice without idempotency protection, it may corrupt data (e.g. credit a bank account twice). Implement idempotency using a unique event ID (idempotency key): before processing, check if the event ID has already been processed; if so, skip it. Alternatively, use upsert (INSERT ON CONFLICT DO NOTHING) database operations with the event ID as the primary key.
What is a poison pill message, and what should happen when a consumer receives one?
A poison pill is a message the consumer cannot deserialise or process — typically due to a schema mismatch, corrupt bytes, or an unexpected payload format. Without handling, the consumer cannot advance the partition offset and all subsequent messages are blocked. The consumer should catch the deserialisation exception, route the poison pill to a dead letter queue topic, commit the offset to unblock the partition, and continue processing the next message. The DLQ must be monitored for manual remediation.
What is the difference between at-least-once and exactly-once delivery in Kafka?
At-least-once delivery (the default) means every message is delivered at minimum once, but duplicates are possible. Exactly-once delivery (available with Kafka transactions and idempotent producers) means each message is processed precisely once, even in the event of retries or failures. Exactly-once has higher latency and complexity. Most NZ teams use at-least-once with idempotent consumers — the consumer handles duplicates — rather than paying the performance cost of exactly-once at the broker level.
What is an AsyncAPI document, and how does it relate to schema testing?
AsyncAPI is an open specification (similar to OpenAPI for REST) that documents event-driven APIs: which topics exist, what schema each message uses, and which services are publishers or subscribers. In a CI pipeline, AsyncAPI can be used to validate that the actual schema in the Schema Registry matches the AsyncAPI specification — catching schema drift before it reaches production. It also enables consumer-driven contract tests, where consumers publish their required schema and producers must not break it.
Name three Kafka test scenarios that a Testcontainers-based integration test covers that an in-memory Kafka mock cannot.
1. Consumer group rebalancing — only a real Kafka broker manages consumer group coordination; mocks do not reproduce partition rebalancing behaviour. 2. Schema Registry enforcement — a real Confluent Schema Registry validates Avro schema compatibility; most mocks bypass schema validation entirely. 3. DLQ routing configuration — the error handler that routes poison pills to the DLQ topic is a Kafka Streams or Spring Kafka configuration that only takes effect against a real broker. Mocks typically call the consumer directly, skipping this layer.
14 ISTQB Mapping
ISTQB CTAL-TA (Certified Tester Advanced Level — Test Analyst)
- Chapter 3 — Structure-Based Testing: Integration testing patterns, interface testing, system integration
- Chapter 4 — Defect-Based Testing: Fault injection, error handling verification
ISTQB CTAL-DevOps (Advanced DevOps Tester)
- Chapter 2 — Continuous Testing: Async system test automation, integration testing in CI pipelines
- Chapter 5 — Test Environments: Containerised test environments (Testcontainers), infrastructure-as-code for test brokers
ISTQB CT-TAS (Test Automation Strategies)
- Test isolation patterns, test data management for event-driven systems, contract testing strategies