20 min read · 9 self-checks · Updated June 2026

Structural / Integration · CTAL-TA

API Mocking & Stubbing

Replace external APIs with controlled mocks so tests run fast, offline, and without dependencies. Test error scenarios (500s, timeouts) that are difficult to trigger on real services. A core technique for test isolation and speed.

Senior ISTQB CTAL-TA

1 The Hook

A team building a RealMe-linked onboarding flow needs to test what happens when the identity-verification service is slow or down. They cannot make the real RealMe service return a 500 on demand — it is a government service that mostly works, and deliberately breaking it is not on the table. So they skip those tests. “We’ll deal with outages if they happen.”

They happen. One morning the identity service times out under load. The onboarding screen, never tested against a slow upstream, hangs forever on a blank spinner. Users tap retry, which fires duplicate verification requests, which makes the overload worse. The one scenario nobody could reproduce in testing was the one that took the product down.

The problem was never the code — it was the dependency. You cannot reliably make a real external service fail, time out, or return malformed JSON when you want it to. So you replace it with a stand-in you control: a mock that returns exactly the 500, the 10-second delay, or the broken payload you need, on demand, every time. That is the entire reason mocking exists.

💬

Senior Engineer Insight

The failure mode nobody warns you about: mocks don't break, they lie. Your team writes a WireMock stub when the feature ships, every pipeline run goes green for six months, and then production silently breaks because Windcave quietly changed a response field. The mock never noticed. I have seen this exact pattern on three separate NZ payments integrations — green mocks, broken production, embarrassed go-live. The fix is not better mocks; it is treating verify() calls as mandatory, not optional, and running even a single real API call nightly so drift surfaces before the release weekend. Most teams configure the stubs and forget the verification half entirely. That is the half that actually catches regressions.

Senior engineer insight

The real value of mocking is not what it tests — it is what it forces you to think about. Every time you set up a stub, you have to decide: what should my code actually do if the upstream returns this? Teams that skip mocking do not just skip the test; they skip the design conversation entirely. I have seen NZ government API integrations go live with zero error-handling because nobody ever asked “what happens if RealMe returns a 503?” — not because they forgot, but because they never needed to answer the question to get a green build. Mocking makes you answer it before you ship.

Most common mistake: treating mocking as a testing convenience rather than a design tool — setting up a single happy-path stub, ticking the “covered” box, and never interrogating how the code behaves under failure.

From the field

On a Wellington microservices project, three teams each owned a different service: frontend, booking engine, and a legacy accommodation-availability backend. The availability backend was notoriously unreliable in the shared dev environment — it would go down mid-morning almost daily, blocking the other two teams. Instead of fighting the environment, the booking team introduced WireMock as a permanent fixture in their pipeline, mocking all calls to the availability service. Within a week they had stubs for every response shape in the API docs, plus a timeout and a 503 scenario they had been wanting to test for months. The lesson that generalises: shared dev environments are a coordination tax — the teams that eliminated the dependency entirely through mocking shipped faster and had cleaner error-handling than those that kept waiting for the shared environment to stabilise.

2 The Rule

Replace an external dependency you do not control with a test double you do — so you can make it succeed, fail, stall, or return rubbish on demand, and test how your code copes.

3 The Analogy

A flight simulator for pilots.

A Pacific Air pilot does not learn to handle an engine failure by waiting for a real engine to fail mid-flight over the Tasman. They train in a simulator, where an instructor can trigger an engine fire, a hydraulics loss, or a sudden crosswind at the press of a button: safely, repeatably, as often as needed.

A mock is the simulator for your code. The real upstream service (the real aircraft) mostly behaves, and you cannot order it to catch fire on cue. The mock lets you stage the engine fire — the 500, the timeout, the malformed response — so your code can rehearse its emergency procedures before it meets the real thing in production.

What is API mocking & stubbing?

Mocking and stubbing are related techniques for replacing real external APIs with test doubles:

Stub: A fake implementation that returns predetermined responses. Used to isolate code under test from external dependencies.
Mock: A stub that also records how it was called, allowing you to verify the calling code interacted with it correctly. Used to check behavior.

In practice, the terms are often used interchangeably. The key benefit: your tests run without making actual network calls, without waiting for real services, and without needing test data on external systems.
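The distinction is easier to see in code. The following is a minimal hand-rolled sketch using only the JDK, not a WireMock example; `PaymentGateway`, `Checkout`, and the account values are hypothetical names invented for illustration. The stub can only support assertions on the outcome, while the mock also records each call so the test can check the interaction.

```java
import java.util.ArrayList;
import java.util.List;

class StubVsMockDemo {
    // Hypothetical port the checkout code depends on (not from any real SDK).
    interface PaymentGateway {
        String charge(String accountId, int amountCents);
    }

    // Stub: returns a canned answer and remembers nothing.
    static class StubGateway implements PaymentGateway {
        public String charge(String accountId, int amountCents) {
            return "approved";
        }
    }

    // Mock: returns a canned answer AND records every call for later verification.
    static class RecordingGateway implements PaymentGateway {
        final List<String> calls = new ArrayList<>();

        public String charge(String accountId, int amountCents) {
            calls.add(accountId + ":" + amountCents);
            return "approved";
        }
    }

    // Hypothetical code under test.
    static class Checkout {
        private final PaymentGateway gateway;

        Checkout(PaymentGateway gateway) {
            this.gateway = gateway;
        }

        boolean pay(String accountId, int amountCents) {
            return "approved".equals(gateway.charge(accountId, amountCents));
        }
    }

    public static void main(String[] args) {
        // With a stub, the test can only assert on the outcome.
        System.out.println(new Checkout(new StubGateway()).pay("acc-1", 5000)); // prints true

        // With a mock, the test can also assert on how the dependency was called.
        RecordingGateway mock = new RecordingGateway();
        new Checkout(mock).pay("acc-1", 5000);
        System.out.println(mock.calls); // prints [acc-1:5000]
    }
}
```

Tools such as WireMock apply the same idea at the HTTP boundary: the stub mapping supplies the canned response, and the request journal plays the role of the `calls` list.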

Why mock APIs? Real API calls are slow (100ms-1s), flaky (network failures, rate limits), and hard to drive into failure states: making an API return a 500, time out, or send malformed JSON is expensive to set up and risky to trigger on production systems. Mocks solve all three problems.

Common use cases

Testing without external dependencies: Your code calls a payment processor (Stripe, PayPal). You can't call the real API in tests. Mock it.
Third-party API unavailability: An API is down for maintenance, or you're developing offline. Tests need to continue. Use mocks.
Testing error scenarios: What happens if the payment API returns 500? What if the request times out? Mocks let you return these errors reliably.
Developing faster: No backend implemented yet. Stub the API so frontend tests and development can proceed in parallel.
Performance and load testing: Real APIs have rate limits and can't handle thousands of concurrent requests. Mocks can.
Data isolation: You don't want test data polluting real systems. Mocks ensure no writes to production databases.

Tools for API mocking

API mocking tools — capabilities and fit

Tool	Type	Key strength	Best for
WireMock	Java library + standalone server	Powerful matching (URL, headers, body), request verification, stubbing API, widely used in enterprises	Integration tests, functional tests with external API dependencies, teams using JVM languages
Mountebank	Standalone server (JavaScript/Node)	Protocol-agnostic (HTTP, HTTPS, TCP, SMTP), imposters (multiple mocks), cross-platform	Multi-protocol testing, CI/CD pipelines, teams wanting lightweight open source
Prism	Standalone server (JavaScript/Node)	Reads OpenAPI/Swagger specs, generates mocks automatically, example-based responses	Frontend teams, contract-driven development, generating mocks before backend is ready
json-server	CLI tool (Node.js)	Minimal setup, watches JSON file, CRUD operations, perfect for learning	Quick prototypes, small projects, developers wanting zero configuration

Mocking approaches: record/playback, rule-based, stateful

Record/playback

Capture real API responses, then replay them in tests. Useful when you already have a real API and want to record example responses.

Tradeoff: If the real API response changes, your recorded response is stale. You need to re-record periodically.

Rule-based mocking

Define rules: "if you see a request with URL matching /orders/[0-9]+, return this response." Most flexible and recommended for testing.
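To make the idea concrete, here is a small JDK-only sketch of a rule table, assuming a "first matching rule wins" policy and a 404 for unmatched requests. Both choices are assumptions of this sketch; real tools have their own precedence rules (WireMock, for example, supports explicit priorities). The class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

class RuleBasedMockDemo {
    // Canned response returned when a rule matches.
    static class Response {
        final int status;
        final String body;

        Response(int status, String body) {
            this.status = status;
            this.body = body;
        }
    }

    // One rule: HTTP method + URL regex -> canned response.
    static class Rule {
        final String method;
        final Pattern path;
        final Response response;

        Rule(String method, Pattern path, Response response) {
            this.method = method;
            this.path = path;
            this.response = response;
        }

        boolean matches(String reqMethod, String reqPath) {
            return method.equals(reqMethod) && path.matcher(reqPath).matches();
        }
    }

    static class RuleBasedMock {
        private final List<Rule> rules = new ArrayList<>();

        RuleBasedMock when(String method, String pathRegex, int status, String body) {
            rules.add(new Rule(method, Pattern.compile(pathRegex), new Response(status, body)));
            return this;
        }

        // First matching rule wins; unmatched requests get a 404 (sketch policy).
        Response handle(String method, String path) {
            for (Rule rule : rules) {
                if (rule.matches(method, path)) {
                    return rule.response;
                }
            }
            return new Response(404, "{\"error\": \"no stub matched\"}");
        }
    }

    public static void main(String[] args) {
        RuleBasedMock mock = new RuleBasedMock()
                .when("GET", "/orders/[0-9]+", 200, "{\"id\": 42, \"status\": \"shipped\"}");

        System.out.println(mock.handle("GET", "/orders/42").status);  // prints 200
        System.out.println(mock.handle("GET", "/orders/abc").status); // prints 404
    }
}
```

Returning an explicit 404 for unmatched requests is a deliberate choice here: a test that hits an unstubbed URL fails loudly instead of silently receiving a default success.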

Stateful mocking

The mock remembers state across requests. "POST /orders creates an order, GET /orders/123 returns the order you just created, DELETE /orders/123 removes it." Simulates real API behavior more closely.
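The POST/GET/DELETE sequence above can be sketched as a small in-memory fake. This is an illustrative JDK-only example, not the API of any mocking tool; the `FakeOrdersApi` name, the starting ID of 123, and the status codes chosen are assumptions made to mirror the description.

```java
import java.util.HashMap;
import java.util.Map;

class StatefulMockDemo {
    static class Response {
        final int status;
        final String body;

        Response(int status, String body) {
            this.status = status;
            this.body = body;
        }
    }

    // Hypothetical stateful fake: remembers orders between requests.
    static class FakeOrdersApi {
        private final Map<Integer, String> orders = new HashMap<>();
        private int nextId = 123;

        Response handle(String method, String path, String body) {
            if (method.equals("POST") && path.equals("/orders")) {
                int id = nextId++;
                orders.put(id, body);
                return new Response(201, "{\"id\": " + id + "}");
            }
            if (path.startsWith("/orders/")) {
                int id;
                try {
                    id = Integer.parseInt(path.substring("/orders/".length()));
                } catch (NumberFormatException e) {
                    return new Response(404, "");
                }
                if (method.equals("GET")) {
                    String stored = orders.get(id);
                    return stored == null ? new Response(404, "") : new Response(200, stored);
                }
                if (method.equals("DELETE")) {
                    return orders.remove(id) == null ? new Response(404, "") : new Response(204, "");
                }
            }
            return new Response(404, "");
        }

        // Call before each test so state never leaks between tests.
        void reset() {
            orders.clear();
            nextId = 123;
        }
    }

    public static void main(String[] args) {
        FakeOrdersApi api = new FakeOrdersApi();
        System.out.println(api.handle("POST", "/orders", "{\"item\": \"kayak\"}").body); // prints {"id": 123}
        System.out.println(api.handle("GET", "/orders/123", "").status);    // prints 200
        System.out.println(api.handle("DELETE", "/orders/123", "").status); // prints 204
        System.out.println(api.handle("GET", "/orders/123", "").status);    // prints 404
    }
}
```

The `reset()` method matters as much as the handlers: a stateful fake without a reset makes test results depend on execution order.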

Setting up mocks: request matching and response definition

Basic request matching

A mock needs to know: when you see this request, return that response. Matching can be based on:

URL path and method: POST /api/payments
Query parameters: GET /api/users?status=active
Request headers: Authorization: Bearer token123
Request body: {"email": "test@example.com", ...}
Regular expressions or wildcards: match a range of requests

Defining responses

Once a request matches, the mock returns a response:

// WireMock example
stubFor(post(urlEqualTo("/api/payments"))
  .withHeader("Content-Type", containing("application/json"))
  // JSONPath filter expression: matches only when the body has amount > 0
  .withRequestBody(matchingJsonPath("$[?(@.amount > 0)]"))
  .willReturn(aResponse()
    .withStatus(200)
    .withHeader("Content-Type", "application/json")
    .withBody("{\"transactionId\": \"tx-123\", \"status\": \"approved\"}"))
);

Dynamic responses

Response body can include values from the request:

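// Mountebank example: echo back the email from the request body.
// The "copy" behavior (Mountebank 2.x syntax) fills in the ${email} token.
{
  "stubs": [{
    "predicates": [{"equals": {"path": "/api/users"}}],
    "responses": [{
      "is": {
        "statusCode": 200,
        "body": "{\"id\": 1, \"email\": \"${email}\"}"
      },
      "behaviors": [{
        "copy": {
          "from": "body",
          "into": "${email}",
          "using": {"method": "jsonpath", "selector": "$.email"}
        }
      }]
    }]
  }]
}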

Error scenario testing: 400, 401, 403, 500

The key value of mocking: test what happens when APIs fail.

4xx errors (client errors)

400 Bad Request: Invalid input. Test that your code validates inputs before calling the API and handles the error gracefully.

401 Unauthorized: Missing or invalid authentication. Test that your code re-authenticates or prompts the user to log in again.

403 Forbidden: User lacks permission. Test that your code shows a helpful error message, not a blank screen.

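// Test: invalid order amount returns 400
stubFor(post(urlEqualTo("/api/orders"))
  // JSONPath filter expression: matches only when the body has amount < 0
  .withRequestBody(matchingJsonPath("$[?(@.amount < 0)]"))
  .willReturn(aResponse()
    .withStatus(400)
    .withBody("{\"error\": \"Amount must be positive\"}")));

// Your test (JUnit 5; assumes chargeOrder surfaces the API error message in an exception)
RuntimeException error = assertThrows(RuntimeException.class, () -> chargeOrder(-10));
assertTrue(error.getMessage().contains("Amount must be positive"));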

5xx errors (server errors)

500 Internal Server Error: Something broke on the server. Test that your code retries, logs the error, and either falls back or shows a user-friendly error.

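503 Service Unavailable: The server is temporarily unable to handle the request, typically because of overload or maintenance. Test graceful degradation, including whether your code respects a Retry-After header if the service sends one.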
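What "retries, then falls back" means in a test can be sketched with a scripted test double. This is a JDK-only illustration under assumed names (`Upstream`, `ScriptedUpstream`, `fetchWithRetry`); the retry policy shown (retry 5xx, never retry 4xx, no backoff) is a simplifying assumption, and production code would normally add a delay between attempts.

```java
import java.util.ArrayDeque;
import java.util.Arrays;
import java.util.Deque;

class RetryOn5xxDemo {
    // Hypothetical dependency: one call returns one HTTP status code.
    interface Upstream {
        int call();
    }

    // Scripted double: returns the given statuses in order, repeats the last one,
    // and counts calls so the test can verify the retry behaviour.
    // Requires at least one status.
    static class ScriptedUpstream implements Upstream {
        private final Deque<Integer> statuses;
        int calls = 0;

        ScriptedUpstream(Integer... statuses) {
            this.statuses = new ArrayDeque<>(Arrays.asList(statuses));
        }

        public int call() {
            calls++;
            return statuses.size() > 1 ? statuses.poll() : statuses.peek();
        }
    }

    // Hypothetical client policy: retry server errors up to maxAttempts,
    // treat 4xx as final because repeating a bad request cannot succeed.
    static String fetchWithRetry(Upstream upstream, int maxAttempts) {
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            int status = upstream.call();
            if (status < 500) {
                return status < 400 ? "ok" : "rejected";
            }
        }
        return "unavailable"; // fallback once retries are exhausted
    }

    public static void main(String[] args) {
        ScriptedUpstream flaky = new ScriptedUpstream(500, 503, 200);
        System.out.println(fetchWithRetry(flaky, 3) + " after " + flaky.calls + " calls"); // prints ok after 3 calls

        ScriptedUpstream down = new ScriptedUpstream(503);
        System.out.println(fetchWithRetry(down, 3) + " after " + down.calls + " calls"); // prints unavailable after 3 calls
    }
}
```

Counting calls is what turns this from a stub into a mock: the test can assert both the final outcome and that the client neither gave up early nor retried a 4xx.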

Timeout and connection failures

Mock timeouts by delaying the response:

// Mountebank: delay response by 5 seconds (Mountebank 2.x "wait" behavior)
{
  "stubs": [{
    "responses": [{
      "is": {...},
      "behaviors": [{"wait": 5000}]
    }]
  }]
}
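A delayed response is only useful if the client actually enforces a timeout. The sketch below shows both halves using only the JDK: a local stand-in server that stalls, and a client that gives up. It is not a Mountebank or WireMock example; the `/api/charge` path, the `"timed-out"` fallback value, and the method names are assumptions for illustration.

```java
import com.sun.net.httpserver.HttpServer;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.InetSocketAddress;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.net.http.HttpTimeoutException;
import java.nio.charset.StandardCharsets;
import java.time.Duration;

class SlowUpstreamDemo {
    // Starts a local stand-in on a free port that waits delayMillis before answering 200.
    static HttpServer startSlowServer(long delayMillis) {
        try {
            HttpServer server = HttpServer.create(new InetSocketAddress("127.0.0.1", 0), 0);
            server.createContext("/api/charge", exchange -> {
                try {
                    Thread.sleep(delayMillis);
                    byte[] body = "{\"status\": \"approved\"}".getBytes(StandardCharsets.UTF_8);
                    exchange.sendResponseHeaders(200, body.length);
                    exchange.getResponseBody().write(body);
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                } finally {
                    exchange.close();
                }
            });
            server.start();
            return server;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Hypothetical client under test: gives up after timeoutMillis and reports a fallback value.
    static String charge(int port, long timeoutMillis) {
        HttpRequest request = HttpRequest
                .newBuilder(URI.create("http://127.0.0.1:" + port + "/api/charge"))
                .timeout(Duration.ofMillis(timeoutMillis))
                .POST(HttpRequest.BodyPublishers.ofString("{\"amount\": 100}"))
                .build();
        try {
            return HttpClient.newHttpClient()
                    .send(request, HttpResponse.BodyHandlers.ofString())
                    .body();
        } catch (HttpTimeoutException e) {
            return "timed-out";
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        HttpServer slow = startSlowServer(800);
        try {
            // The server stalls for 800 ms; the client only waits 200 ms.
            System.out.println(charge(slow.getAddress().getPort(), 200));
        } finally {
            slow.stop(0);
        }
    }
}
```

Keeping the delay under a second keeps the suite fast; the point is that the delay exceeds the client timeout, not that it matches a real outage.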

State management: sequential and conditional responses

Sequential responses

Return different responses on successive calls to the same URL. Useful for simulating pagination or polling.

// First call returns status="pending", second returns status="complete"
stubFor(post(urlEqualTo("/api/jobs/check"))
  .inScenario("job-completion")
  .whenScenarioStateIs(Scenario.STARTED)
  .willReturn(aResponse().withBody("{\"status\": \"pending\"}"))
  .willSetStateTo("complete"));

stubFor(post(urlEqualTo("/api/jobs/check"))
  .inScenario("job-completion")
  .whenScenarioStateIs("complete")
  .willReturn(aResponse().withBody("{\"status\": \"complete\"}")));

Conditional responses

Return different responses based on request content:

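// If user is admin, return all data. Otherwise, return filtered data.
// urlPathEqualTo matches the path only, so the query string can be matched separately.
stubFor(get(urlPathEqualTo("/api/users"))
  .atPriority(1)  // checked first, so admin requests never fall through to the default
  .withQueryParam("role", equalTo("admin"))
  .willReturn(aResponse().withBody("{\"users\": [...]}")));

stubFor(get(urlPathEqualTo("/api/users"))
  .atPriority(5)  // lower priority: the default for everyone else
  .willReturn(aResponse().withBody("{\"users\": [{...limited...}]}")));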

Resetting state between tests

If mocks are stateful, reset them before each test:

@BeforeEach
void resetMocks() {
  // Clears stub mappings and the request journal
  WireMock.reset();
  // Also return every scenario to its starting state if using scenarios
  WireMock.resetAllScenarios();
}

Verification: checking requests were made correctly

Beyond stubbing responses, mocks can verify the calling code made the right requests.

Verify a request was made

// Verify that chargeCard called the payment API
verify(postRequestedFor(urlEqualTo("/api/payments")));

// Verify with specific body content (value matchers compare against strings)
verify(postRequestedFor(urlEqualTo("/api/payments"))
  .withRequestBody(matchingJsonPath("$.amount", equalTo("50"))));

Verify call count and ordering

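// Verify retry logic: API called 3 times
verify(3, postRequestedFor(urlEqualTo("/api/payments")));

// Verify ordering: auth call before payment call.
// WireMock has no built-in ordering assertion, so compare logged timestamps
// (LoggedRequest is com.github.tomakehurst.wiremock.verification.LoggedRequest).
LoggedRequest auth = findAll(postRequestedFor(urlEqualTo("/api/auth"))).get(0);
LoggedRequest payment = findAll(postRequestedFor(urlEqualTo("/api/payments"))).get(0);
assertFalse(auth.getLoggedDate().after(payment.getLoggedDate()));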

Worked example: mocking a payment processor

Scenario: Your checkout flow calls a payment API (Stripe). You want tests to verify correct retry behavior on failure.

// Test setup with WireMock
@BeforeEach
void setup() {
  WireMock.reset();
}

@Test
void chargeOrderRetries_onServerError() {
  // First call returns 500
  stubFor(post(urlEqualTo("/api/charge"))
    .inScenario("retry-test")
    .whenScenarioStateIs(Scenario.STARTED)
    .willReturn(aResponse().withStatus(500))
    .willSetStateTo("retry"));

  // Second call returns 200
  stubFor(post(urlEqualTo("/api/charge"))
    .inScenario("retry-test")
    .whenScenarioStateIs("retry")
    .willReturn(aResponse()
      .withStatus(200)
      .withBody("{\"transactionId\": \"tx-456\", \"status\": \"approved\"}")));

  // Execute: chargeOrder should retry and succeed
  Order order = chargeOrder(100);
  assertEquals("approved", order.paymentStatus);

  // Verify: payment endpoint was called twice
  verify(2, postRequestedFor(urlEqualTo("/api/charge")));
}

@Test
void chargeOrder_handlesTimeout() {
  // Mock timeout by delaying response
  stubFor(post(urlEqualTo("/api/charge"))
    .willReturn(aResponse()
      .withFixedDelay(10000)  // 10 seconds
      .withBody("{\"status\": \"timeout\"}")));

  // Expect code to throw TimeoutException
  assertThrows(TimeoutException.class, () -> {
    chargeOrder(100);
  });
}

Integration with test frameworks

Most tools integrate seamlessly with test runners:

WireMock: JUnit 4 rule and JUnit 5 extension, Testcontainers support for Docker integration
Mountebank: Can start/stop via CLI in test setup, Docker container support
Prism: CLI-based, start in CI/CD before test suite runs

Context guide

How the right level of API mocking and stubbing effort changes based on project context.

Context	Priority	Why
NZ government API integration (Revenue NZ, RealMe, CoverNZ, Benefits NZ)	Essential	Sandboxes rarely return realistic failure modes. Mocking is the only way to test 401 token expiry, 503 maintenance windows, and malformed payloads that these services genuinely produce in production.
Payment gateway integration (Windcave, Stripe, Fiserv)	Essential	Real API calls in tests cost money, risk charging real cards, and cannot reliably produce declined-card or timeout scenarios. Mock every error path that the gateway contract documents.
Microservices with shared dev environment (e.g. Spark or Harbour Bank internal services)	High use	Shared environments introduce coordination tax — one downstream service going down blocks all teams. WireMock stubs remove the dependency so teams can develop and test in parallel without waiting for upstream stability.
High-volume CI/CD pipeline (e.g. CloudBooks, ListRight internal services)	High use	Real API round-trips add 100 ms–1 s per call; at hundreds of test runs per day this destroys pipeline throughput. Mocks keep integration tests in the millisecond range and prevent rate-limiting from external providers.
Internal service with a reliable, fully-featured sandbox (e.g. TransitNZ test environment)	Medium	Use the real sandbox for happy-path and standard error paths where it supports them; add targeted mocks only for scenarios the sandbox cannot produce on demand. Avoids maintaining a duplicate you must keep in sync.
End-to-end smoke tests run before or after a production release	Low	The purpose of a production smoke test is to confirm the live integration works. Using a mock here defeats the point entirely — you need the real service to prove the deployment is healthy.

Trade-offs

What you gain and what you give up when you choose API mocking and stubbing.

Advantage	Disadvantage	Use instead when…
Full control over every response — trigger 500s, timeouts, and malformed payloads on demand. Error paths that are impossible to reproduce with a real service become fully testable.	Mock drift: the real API changes a field, adds a required header, or deprecates an endpoint — and every mock-based test keeps passing. You discover the gap in production, not in CI.	The downstream service has a fully-featured sandbox that supports realistic failure modes — use the real thing to avoid the maintenance burden of keeping a mock in sync.
Speed and isolation. Mocks run in milliseconds with no network, no rate limits, and no cost per call — enabling hundreds of integration test runs per day without throttling or charge accumulation.	Optimistic mocks give false confidence. Developers often stub only the happy path, leaving error-handling code completely untested. Passing mock suites can coexist with broken production error handling for months.	You are running pre-release or post-release smoke tests — the point is to confirm the live integration is healthy, which a mock cannot do by definition.
Parallel development. Frontend and other consumers can build and test against a mock before the real backend exists, removing the sequential dependency that typically blocks team velocity.	Maintenance overhead. Mocks are code — they need updating when the real API evolves, they need versioning, and they need contract tests to catch drift. On large codebases the mock suite can become a second codebase to maintain.	The behaviour you want to verify is inside the third-party service itself (e.g. whether Windcave correctly declines an expired card) — a mock tests your code's reaction to responses, not the gateway's internal logic.
No side effects. Mock-based tests never write to production databases, send real emails, charge real payment instruments, or trigger SMS notifications — making them safe to run in any environment at any time.	Mocking the wrong boundary. Stubbing your own database layer or internal services tests the mock instead of your code, couples tests to implementation details, and hides real serialisation and transaction bugs.	The mock setup is becoming more complex than the code under test — a 300-line stateful stub for a 20-line function signals the test boundary or the architecture is wrong and needs revisiting before adding more mocks.

Enterprise reality

How API mocking changes when 200–300 developers across 10+ squads are shipping in parallel in a NZ enterprise.

Mock libraries and stub configurations are centralised in a shared test-infrastructure repository — no squad owns them independently. At Revenue NZ, where multiple programme teams integrate against the same tax APIs, a single team owning and versioning the WireMock stub library prevents 12 squads each hand-crafting incompatible fakes of the same endpoint, which is the default failure mode at this scale.
Privacy Act 2020 and NZISM requirements mean production-like response payloads used in mocks must be fully synthetic — no sampled or anonymised real customer data, even in a test harness. On Harbour Bank programmes this is enforced via automated PII scanning on any fixture file committed to the stub repository; a real Revenue NZ number or bank account number in a mock response payload is a reportable incident.
Contract testing (Pact broker) is mandatory, not optional. With 10+ squads each maintaining stubs for shared internal services, mock drift is guaranteed without a formal consumer-driven contract pipeline. The Pact broker becomes the source of truth for what each team's stub must return — any stub that violates the published contract fails the pipeline before it reaches integration testing.
WireMock and Mountebank instances run as persistent shared services in the lower environments rather than being spun up per-test. At CloudBooks's Auckland engineering organisation this pattern cuts pipeline initialisation overhead from 30–60 seconds per suite to under 2 seconds, which matters when 200 developers are each triggering 5–10 pipeline runs per day. A dedicated platform engineering team owns availability, versioning, and stub promotion across dev, SIT, and UAT.

◆ What I would do

Professional judgment — when to reach for API mocking and stubbing, when to skip it, and what to watch for.

Situation

Testing a Benefits NZ Benefits Online portal that calls the Revenue NZ income-verification API. The Revenue NZ sandbox exists but returns only a success response; it cannot be driven into a 503, a token-expiry 401, or a malformed payload.

I would

Set up a WireMock stub server for the Revenue NZ API and create four stubs: success (200 with income payload), token expiry (401 so I can verify the portal re-authenticates automatically), maintenance window (503 so I can verify graceful degradation and a user-friendly message), and timeout (5-second delay to verify the portal does not hang the user). I would run these in every CI build, then run a single real sandbox call nightly to confirm the happy-path response shape has not drifted. This gives complete coverage of the failure modes that will genuinely affect Kiwis using the portal — modes the sandbox can never deliver on demand.

Situation

A TeleNZ billing microservice where a developer has already written WireMock stubs for all downstream calls. The stubs only return 200. I have been asked to review the test coverage before the sprint ends.

I would

Treat the existing stubs as a starting point, not a finish line. I would audit the downstream API contracts for every response code documented — typically 400, 401, 403, 422, 429, 500, and 503 — and add a stub and an assertion for each. I would also add a verify() call for every critical outbound request to confirm the billing service is sending the correct amount, account number, and idempotency key. The developer's stubs prove the happy path works; my additions prove the error paths do too. I would flag the gap in a code review comment rather than quietly fixing it, so the team builds the habit of stubbing error paths from the start rather than treating them as optional extras.

Situation

A Pacific Air ancillaries team asks whether they should mock the Windcave payment gateway in their end-to-end regression suite that runs against the production environment two hours before each go-live.

I would

Not mock here — and I would explain why. The purpose of this suite is to verify that the live deployment is healthy and that the real Windcave integration is responding correctly after the release. A mock would prove the application code is alive but would tell you nothing about whether the gateway credentials rotated correctly, whether the new TLS certificate is trusted, or whether a Windcave maintenance window coincides with the release. Use Windcave's test card numbers against the real gateway in their staging mode. Save mocking for the unit and integration suites that run in CI; keep the pre-release smoke test clean of any test doubles so it actually answers the question it is supposed to answer.

The bottom line: Mock to test how your code behaves under conditions you cannot produce on demand — not as a blanket substitute for real API calls. A mock that only returns 200 is not test coverage; it is a politely disguised skip.

Best practices and anti-patterns

Don't over-mock. Mock external services (third-party APIs, payment processors). Don't mock your own code. If you're mocking database calls, something is wrong with your architecture.

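Keep mocks close to reality. Model every stubbed response, including errors, on what the real API actually returns: same status codes, headers, and body shape. Return failures deterministically in the tests that target error handling, but if the real API returns 5xx errors 1% of the time, don't make 500 the default response for every stub.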
Use contract testing alongside mocks. Mocks can become stale if the real API changes. Use tools like Pact (contract testing) to verify your mock matches the real API schema.
Test error handling explicitly. Don't just test the happy path. Have dedicated tests for 400, 401, 403, 500, and timeout scenarios.
Reset state between tests. If your mock is stateful, reset it before each test. Otherwise, test order affects results (brittle).
Document mock setup in test comments. When a test uses unusual mocking (e.g. "this call returns 500 on second attempt"), leave a comment explaining why.
Run integration tests against real APIs periodically. Mocks are great for speed, but periodically (nightly, before release) run a test suite against the real API to catch changes.

4 Industry Reality

🏭 What you actually encounter on the job

The mock library version your team pinned two years ago is now incompatible with the upgraded test runner. You spend half a day on a version conflict before writing a single test scenario.
The third-party payment gateway (Windcave, Stripe, Fiserv) changed an undocumented response field three months ago. Your mocks still return the old shape, so all your mock-based tests pass while production subtly breaks. You only find this during a manual regression before go-live.
Legacy codebases at NZ banks and telcos often have service calls scattered directly through business logic with no dependency-injection seam. You cannot swap in a mock without refactoring the code first — and refactoring requires sign-off. Senior testers document this as a coverage gap, flag it for the next sprint, and work around it with targeted integration tests on a shared dev environment.
Teams building against the New Zealand government APIs (RealMe, Revenue NZ, CoverNZ) rarely have sandbox environments with realistic failure modes. Mocking these is not optional — it is the only way to test sad paths. The mock schema is usually hand-crafted from reading API docs because no contract file is published.
On fast-moving squads, mocks are often written by the developer alongside the feature code. By the time the tester arrives, the mock already exists — and may have been written optimistically (always returns 200). Part of the senior tester’s job is auditing those mocks and adding the error stubs the developer did not think to include.

5 When to Use It — and When Not To

⚡ Decision guide

✓ Use it when

You need to trigger error conditions (500s, timeouts, malformed JSON) that you cannot reliably produce from a real external service on demand.
The real API has rate limits, costs money per call, or modifies production data — payment gateways, email services, SMS providers, and NZ government APIs all fall here.
The downstream service is not yet built or is unreliable in the shared dev environment, blocking parallel development.
You are writing unit or integration tests that must run in milliseconds in CI/CD — real API round-trips destroy pipeline speed.
You want deterministic, repeatable tests regardless of whether a third-party service (CloudBooks, RealMe, NZX feed) is up at the moment the pipeline runs.

✗ Skip it when

You are tempted to mock your own database, your own repositories, or your own business logic — that means testing the mock, not the code. Use a real test database or a container instead.
The service has a fully-featured sandbox or test environment with realistic data and failure modes — use the real thing and save yourself the maintenance burden of keeping a mock in sync.
You are writing end-to-end smoke tests against production before or after a release — use the real API so you prove the live integration actually works.
The behaviour you need to test is inside the third-party service itself (e.g. “does the Windcave gateway correctly decline expired cards?”) — a mock cannot test the service, only how your code reacts to responses.
The mock complexity is outpacing the feature complexity. If you are writing 300-line stateful mock setup code to test a simple 20-line function, something is wrong with the test boundary.

6 Best Practices

✓ What experienced testers do

✓ Mock at the HTTP boundary, not inside your code. Use a real HTTP mock server (WireMock, Mountebank) rather than patching language-level objects. This catches real serialisation bugs that in-process mocks miss.
✓ Pair every mock with a contract test. Use Pact or a schema-validation step so you know immediately when the real API drifts from what your mock returns. Without this, mocks become traps.
✓ Always stub error paths — at minimum 400, 401, 403, 500, and timeout. Happy-path-only mocks are the single most common reason mock-covered code still fails in production.
✓ Reset all stubs and scenario state in beforeEach, not afterEach. Resetting before (not after) guarantees a clean start even if a previous test crashes before its teardown runs.
✓ Use verify() as well as stub. Assert that your code called the dependency with the correct URL, method, headers, and body. A test that only checks output — and never verifies the request was made — can pass when the API call was skipped entirely.
✓ Keep mock responses as close to real API responses as possible. Copy real response payloads from API documentation or recorded traffic. Hand-crafting minimal fake responses hides real-world field-name typos and missing required fields.
✓ Run a periodic suite against the real service. Nightly or before every release, run a subset of tests against the actual external API to catch drift. Mocks that have not been validated against the live service in six months are unreliable.
✓ Document unusual mock setups with a comment. When a test uses a stateful scenario or a delay, leave a one-line comment explaining why. Future maintainers should not need to reverse-engineer the intent.
✓ Scope mock servers to the test suite, not individual tests. Starting and stopping WireMock for each test is slow. Start it once per test class or per CI job, then reset state between tests.
✓ When in doubt about what a real API does, record it first. Use WireMock’s record/playback mode or a proxy like mitmproxy to capture real traffic, then edit the recorded responses to add edge cases. Starting from real data is always faster than guessing.

7 Common Misconceptions

❌ Myth: If all my mock-based tests pass, my integration with the real API is working correctly.

Reality: A mock only proves your code handles the responses you defined. It tells you nothing about whether the real API still returns those responses. The real Windcave or CloudBooks API could have changed a field name, added a required header, or deprecated an endpoint — and every mock test would still be green. Mocks need to be validated against real API contracts (via Pact or periodic integration runs) or they create a false sense of security.
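A drift check can be as simple as comparing the fields a mock serves with the fields seen in a recent real response. The sketch below is a JDK-only illustration of that comparison, not Pact and not a substitute for it; the field names, the type labels, and the idea of representing a payload as a name-to-type map are all assumptions made to keep the example small.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

class MockDriftCheckDemo {
    // Compares field names and types served by a mock with those observed from the real API.
    // Returns a sorted set of human-readable problems; empty means no drift detected.
    static Set<String> drift(Map<String, String> mockFields, Map<String, String> realFields) {
        Set<String> problems = new TreeSet<>();
        for (Map.Entry<String, String> field : mockFields.entrySet()) {
            String realType = realFields.get(field.getKey());
            if (realType == null) {
                problems.add("missing in real API: " + field.getKey());
            } else if (!realType.equals(field.getValue())) {
                problems.add("type changed: " + field.getKey()
                        + " (" + field.getValue() + " -> " + realType + ")");
            }
        }
        for (String name : realFields.keySet()) {
            if (!mockFields.containsKey(name)) {
                problems.add("missing in mock: " + name);
            }
        }
        return problems;
    }

    public static void main(String[] args) {
        // Hypothetical shapes: the mock still serves "transactionId",
        // while the real API has renamed it to "txnId".
        Map<String, String> mock = new LinkedHashMap<>();
        mock.put("transactionId", "string");
        mock.put("status", "string");

        Map<String, String> real = new LinkedHashMap<>();
        real.put("txnId", "string");
        real.put("status", "string");

        for (String problem : drift(mock, real)) {
            System.out.println(problem);
        }
    }
}
```

Run against a response captured in a nightly call to the real service, a check like this fails the build when a field is renamed, even though every mock-based test is still green.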

❌ Myth: Mocking is only needed when the real API is unavailable.

Reality: Availability is the least important reason to mock. The main reasons are: (1) you cannot make a real API reliably return a 500, a timeout, or a malformed payload on cue; (2) real API calls are too slow for a fast CI pipeline; (3) real API calls can have side effects (charging a card, sending an SMS, writing to a shared database). Even when the real API is fully available and working, mocking is the correct approach for most test scenarios.

❌ Myth: Mocking your own database is fine — it’s just another dependency.

Reality: Mocking your own database is a classic anti-pattern. Your database layer is your code, not an external service. If you stub it, you are no longer testing how your queries behave against a real schema, how your ORM maps objects, or whether your transactions actually commit. You are testing a fiction. Use a real test database, an in-memory database (H2, SQLite), or a Testcontainers PostgreSQL instance instead.

8 Now You Try

Three graded exercises — spot, fix, then build. Write your answer, run it for AI feedback, then compare to the model answer.

🔍 Exercise 1 of 3 — Spot: mock it or not?

For each dependency in a KiwiSaver provider’s checkout flow, decide whether you should mock it or use the real thing in your test, and give one sentence of reasoning. (a) the third-party card-payment gateway; (b) your own balance-calculation function; (c) the RealMe identity-verification service; (d) your application’s own PostgreSQL database; (e) an external NZX share-price feed.

Model answer

(a) Card-payment gateway — MOCK. It is a third-party service you do not control; you cannot safely make it return failures or charge real cards in tests.
(b) Own balance calculation — REAL. It is your own code and the thing under test. Mocking it would mean testing the mock, not the logic.
(c) RealMe verification — MOCK. External government service you cannot drive into timeouts or 500s on demand; mock it to test slow/down/denied paths.
(d) Own database — REAL (use a real test database or an in-memory/container instance). Mocking your own DB is the classic anti-pattern — if you are stubbing your own data layer, the architecture or the test boundary is wrong.
(e) NZX price feed — MOCK. Third-party, rate-limited, and its data changes constantly, so a real call makes the test non-deterministic.

The rule of thumb: mock external services you do not control; do NOT mock your own code or your own database.

🔧 Exercise 2 of 3 — Fix: repair a brittle mock setup

A tester set up the mock below for a Harbour Bank payments integration. It is brittle and gives false confidence. Describe what is wrong and how you would fix it. Look for: state not reset, only the happy path mocked, an unrealistic mock, and a missing verification.

Flawed setup:
• One stub: POST /charge always returns 200 “approved”.
• No reset() between tests — stubs accumulate across the suite.
• No 400/401/500/timeout stubs at all.
• The test asserts the order status, but never verifies the charge endpoint was actually called.

Model answer (what is wrong and how to fix it)

Problem 1 (state): no reset() between tests means stubs and scenario state leak across tests, so results depend on test order — brittle and non-deterministic. Fix: reset the mock (and reset scenarios) in a beforeEach.
Problem 2 (only happy path): only 200 "approved" is mocked, so the error-handling code is never exercised. Fix: add dedicated stubs for 400 (bad input), 401 (auth failure), 500 (server error) and a timeout/delay, and assert the code handles each gracefully.
Problem 3 (unrealistic): "always 200" does not reflect a real gateway that occasionally fails. Keep mocks close to reality — at minimum cover the realistic failure modes the gateway actually produces.
Problem 4 (no verification): asserting the order status alone does not prove the charge endpoint was called with the right body. Fix: verify the request was made (correct URL, method, amount) and, for retry logic, verify the call count.
Corrected approach: reset in beforeEach; stub happy path + 400/401/500 + timeout; assert behaviour for each; verify the charge call and its body; consider contract testing so the mock stays in sync with the real API.
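The corrected approach might look something like the sketch below. It uses an in-process fake rather than a real mock server, and every name in it (`FakeGateway`, `place_order`, the outcome strings) is hypothetical; `setUp` plays the role of `beforeEach`.

```python
import unittest


class FakeGateway:
    """Hypothetical in-process stand-in for the payment gateway's HTTP client."""

    def __init__(self):
        self.reset()

    def reset(self):
        self.stubs = {}  # (method, path) -> (status, body) or an exception to raise
        self.calls = []  # every request made, kept for verification

    def stub(self, method, path, status=200, body=None, raises=None):
        outcome = raises if raises is not None else (status, body or {})
        self.stubs[(method, path)] = outcome

    def request(self, method, path, json_body):
        self.calls.append((method, path, json_body))
        outcome = self.stubs[(method, path)]  # unstubbed routes fail loudly
        if isinstance(outcome, Exception):
            raise outcome
        return outcome


GATEWAY = FakeGateway()  # shared across tests, like a suite-wide mock server


def place_order(gateway, amount_cents):
    """Hypothetical code under test: charge the card and map the outcome."""
    try:
        status, _ = gateway.request("POST", "/charge", {"amount_cents": amount_cents})
    except TimeoutError:
        return "retry_later"
    return {200: "paid", 400: "rejected", 401: "auth_failed"}.get(status, "retry_later")


class PlaceOrderTests(unittest.TestCase):
    def setUp(self):
        GATEWAY.reset()  # no stubs or recorded calls leak between tests

    def test_each_gateway_outcome(self):
        cases = [
            (dict(status=200, body={"result": "approved"}), "paid"),
            (dict(status=400), "rejected"),
            (dict(status=401), "auth_failed"),
            (dict(status=500), "retry_later"),
            (dict(raises=TimeoutError("gateway timed out")), "retry_later"),
        ]
        for stub_args, expected in cases:
            with self.subTest(expected=expected, stub=stub_args):
                GATEWAY.reset()
                GATEWAY.stub("POST", "/charge", **stub_args)
                self.assertEqual(place_order(GATEWAY, 1250), expected)
                # Verify the call itself, not just the resulting order status.
                self.assertEqual(
                    GATEWAY.calls, [("POST", "/charge", {"amount_cents": 1250})]
                )
```

Saved to a file, this runs with `python -m unittest`. Each of the four flaws has a visible counterpart: the reset, the error and timeout stubs, the realistic spread of outcomes, and the assertion on the recorded call.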

🏗️ Exercise 3 of 3 — Build: design a stateful mock

A TransitNZ licence-renewal app polls a background job: POST /renewals/{id}/status returns processing for the first two calls, then complete. Design a stateful (sequential) mock for this, and list the test assertions that prove your polling code behaves correctly — including what should happen if it never reaches complete.

Model answer

Mock design (sequential / scenario states):
- Call 1 returns {"status":"processing"} and sets state to "second".
- Call 2 returns {"status":"processing"} and sets state to "third".
- Call 3 returns {"status":"complete"}.
(In WireMock this is inScenario(...).whenScenarioStateIs(...).willSetStateTo(...); other tools call it sequential responses.)

Assertions on the polling code:
- It polls until it sees "complete" and then stops (verify the endpoint was called exactly 3 times — no extra calls after completion).
- It surfaces the final "complete" result to the caller.
- It waits/backs off between polls rather than hammering the endpoint.

Reset / cleanup: reset the mock and the scenario state in beforeEach so the next test starts at call 1.

Negative case (never completes): add a stub (or scenario) that keeps returning "processing", and assert the polling code gives up after a max number of attempts or a timeout, returns a clear error, and does not loop forever.

Why stateful: a single fixed response cannot model "processing then complete" — only a mock that remembers how many times it has been called can exercise real polling logic.
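The design above can be sketched in a few lines of Python. This is an in-process illustration rather than WireMock scenario configuration; `SequencedStatusStub`, `wait_for_renewal`, and `PollingTimeout` are hypothetical names, and the sleep function is injected so the test does not actually wait.

```python
import time


class SequencedStatusStub:
    """Hypothetical stateful stub: returns each queued status once, then repeats the last."""

    def __init__(self, statuses):
        self._statuses = list(statuses)
        self.call_count = 0

    def fetch_status(self, renewal_id):
        index = min(self.call_count, len(self._statuses) - 1)
        self.call_count += 1
        return {"status": self._statuses[index]}


class PollingTimeout(Exception):
    """Raised when the job never reaches 'complete' within the attempt budget."""


def wait_for_renewal(client, renewal_id, max_attempts=5, delay=1.0, sleep=time.sleep):
    """Hypothetical polling code under test."""
    for attempt in range(max_attempts):
        if client.fetch_status(renewal_id)["status"] == "complete":
            return "complete"
        if attempt < max_attempts - 1:
            sleep(delay)  # pause between polls instead of hammering the endpoint
    raise PollingTimeout(f"renewal {renewal_id} not complete after {max_attempts} polls")


# Happy path: processing, processing, complete.
stub = SequencedStatusStub(["processing", "processing", "complete"])
pauses = []
assert wait_for_renewal(stub, "r-1", sleep=pauses.append) == "complete"
assert stub.call_count == 3   # stopped polling once complete
assert pauses == [1.0, 1.0]   # paused between polls

# Negative case: the job never completes, so polling must give up.
stuck = SequencedStatusStub(["processing"])
try:
    wait_for_renewal(stuck, "r-2", max_attempts=4, sleep=lambda seconds: None)
except PollingTimeout:
    pass
assert stuck.call_count == 4
```

Creating a fresh stub per test has the same effect as resetting scenario state: every test starts again at call 1.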

Why teams fail here

Writing a single happy-path stub and calling it “covered” — no 400, no 401, no 500, no timeout stub means the error-handling code has never been tested once.
Forgetting to pair mocks with contract tests or periodic real-API runs — the mock passes for six months while the real provider quietly changes a field, and nobody finds out until a production go-live.
Not resetting state between tests: stubs and scenario state accumulate across the suite, tests pass only in the right order, and the bug surfaces only when someone runs a single test in isolation or the suite in a different order.
Mocking the wrong boundary — stubbing internal service classes or the database layer instead of the HTTP boundary, which hides real serialisation bugs and couples tests tightly to internal implementation.

Key takeaway

A mock that only returns success is not a test — it is a rubber stamp: stub the failures, verify the calls, and pair it with a contract test, or you are just proving your code handles the inputs you already assumed it would receive.

How this has changed

The field moved. Here is how API Mocking and Stubbing evolved from its origins to current practice.

Pre-2010

Test doubles are a manual effort — developers hand-write fake implementations, maintenance is painful, and they diverge from real APIs quickly. Most teams avoid mocking entirely and test only against live or staging environments.

2011

WireMock released. Teams can record real HTTP interactions and replay them as stubs. It is one of the first tools to make API mocking practical for large test suites without writing fake implementations from scratch.

2015

Pact consumer-driven contract testing emerges, shifting the conversation from "mock the API" to "verify the contract". Service virtualisation tools (Mountebank, Hoverfly) enable realistic simulation of entire downstream systems including latency and fault injection.

2019

OpenAPI/Swagger-driven mock generation matures — Prism, Stoplight, and Postman can generate mocks directly from the spec. Contract-first development means mocks are often ready before any backend code exists.

Now

AI tools can generate realistic synthetic responses from API schemas, including stub data that satisfies complex business rules. The challenge has shifted from "how to create a mock" to "how to keep mocks honest when the real API changes", which is the problem contract testing addresses.

Self-Check

Interview Questions

What NZ hiring managers ask about API Mocking and Stubbing — and what strong answers look like.

What is the difference between a mock and a stub, and when would you use each?

Strong answer: A stub provides canned responses to calls — it returns predetermined data without caring how many times it's called or in what order. A mock is a stub with expectations — it verifies that specific calls were made with specific arguments. Use a stub when you need to isolate the system under test from a dependency and the test only cares about the response. Use a mock when the test needs to verify that the system correctly called the dependency — for example, that a payment service was called exactly once with the right amount.

Junior/Mid

When should you NOT mock an API dependency, and what are the risks of over-mocking?

Strong answer: Do not mock when the interaction with the real API is what you are testing: real authentication handshakes, rate-limit enforcement, and actual network behaviour can be simulated by a mock but not verified by one. The risk of over-mocking is mock drift: the stub returns yesterday's response shape while the real API has changed its contract. Teams discover the drift in production, not in tests. Consumer-driven contract testing (Pact) addresses this by verifying that the mock stays consistent with the real provider.

Mid/Senior

How does WireMock differ from Pact, and when would you choose one over the other?

Strong answer: WireMock is a service virtualisation tool — you define request/response mappings and it acts as a standalone HTTP stub server. Good for simulating a dependency you don't control (third-party API, legacy system) or for reproducing specific error scenarios. Pact is a contract testing framework — consumers define what they need, providers verify they meet it. Good for microservices you own where you need to keep the contract honest as both sides evolve. Use WireMock to isolate a test from an external dependency; use Pact to verify internal service contracts across teams.

Mid/Senior

Q1: What is the single biggest reason to mock an external API rather than call the real one in a test?

A: Control. You cannot reliably make a real external service return a 500, time out, or send malformed JSON on demand. A mock lets you stage those failure modes repeatably so your error-handling code can actually be tested.

Q2: What is the difference between a stub and a mock?

A: A stub returns canned responses to isolate the code under test from a dependency. A mock does that and also records how it was called, so you can verify the calling code interacted with it correctly (right URL, body, call count). In everyday speech the terms are often used interchangeably.
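The distinction is easy to see with Python's `unittest.mock`, where the same `Mock` object can play either role depending on what the test asserts. The `checkout` function and the `charge` method are hypothetical.

```python
from unittest.mock import Mock


def checkout(payment_service, amount_cents):
    """Hypothetical code under test."""
    response = payment_service.charge(amount_cents=amount_cents)
    return "confirmed" if response["approved"] else "declined"


# Used as a stub: only the canned response matters to the test.
stub_service = Mock()
stub_service.charge.return_value = {"approved": False}
assert checkout(stub_service, 1250) == "declined"

# Used as a mock: the test also verifies how the dependency was called.
mock_service = Mock()
mock_service.charge.return_value = {"approved": True}
assert checkout(mock_service, 1250) == "confirmed"
mock_service.charge.assert_called_once_with(amount_cents=1250)
```

The final line is what makes the second double a mock in the strict sense: the test fails if the charge was skipped, repeated, or made with the wrong amount.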

Q3: Why is mocking your own database an anti-pattern?

A: You should mock external services you do not control, not your own code. If you are stubbing your own data layer, you are testing the stub instead of real behaviour, and it usually signals the test boundary or the architecture is wrong. Use a real test database or a container/in-memory instance instead.

Q4: A mock makes tests fast but can give false confidence. How do you guard against a stale mock?

A: Use contract testing (e.g. Pact) so the mock’s shape is verified against the real API’s schema, and periodically (nightly or before release) run a suite against the real service. A mock that has drifted from the real API will pass while production fails.

Q5: Why must a stateful mock be reset between tests?

A: If state and stubs carry over, the result of one test depends on the ones before it, making the suite order-dependent and brittle. Resetting the mock and its scenario state in a beforeEach guarantees every test starts from a known clean position.

Q6: Your team is building the Benefits NZ Benefits Online portal. A key flow calls the Revenue NZ API to verify a client’s income before calculating entitlement. The Revenue NZ sandbox exists but only ever returns a success response. Which approach do you take for your integration tests, and why?

A: Use a mock for the Revenue NZ API rather than the sandbox. The sandbox cannot return 401 (invalid token), 503 (scheduled maintenance), or a malformed payload on demand — and Revenue NZ APIs do go down for maintenance windows. Set up stubs for the success path, for token expiry (401 so you can verify the portal re-authenticates), for a 503 (so you can verify graceful degradation and a user-friendly message), and for a timeout. Run periodically against the real sandbox to confirm the happy-path response shape has not drifted. This gives you full coverage of sad paths that the sandbox can never deliver.
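Two of those sad paths, token expiry and a 503, might be staged like this with `unittest.mock`. It is a sketch only: `fetch_income`, `get_income`, the token helpers, and the response shape are assumptions for illustration, not the real Revenue NZ API.

```python
from unittest.mock import Mock


def fetch_income(api, auth, client_id):
    """Hypothetical code under test: one re-authentication attempt, then degrade."""
    status, body = api.get_income(client_id, token=auth.current_token())
    if status == 401:
        # Token expired: refresh once and retry.
        status, body = api.get_income(client_id, token=auth.refresh_token())
    if status == 200:
        return {"ok": True, "income_cents": body["income_cents"]}
    return {"ok": False, "message": "Income check is unavailable. Please try again later."}


def make_doubles(responses):
    """Build test doubles; each queued response is returned by one API call."""
    api, auth = Mock(), Mock()
    api.get_income.side_effect = responses
    auth.current_token.return_value = "stale-token"
    auth.refresh_token.return_value = "fresh-token"
    return api, auth


# Token expiry: 401 first, then success after re-authentication.
api, auth = make_doubles([(401, {}), (200, {"income_cents": 5_200_000})])
expired_result = fetch_income(api, auth, "client-42")
assert expired_result == {"ok": True, "income_cents": 5_200_000}
auth.refresh_token.assert_called_once()
api.get_income.assert_called_with("client-42", token="fresh-token")

# Scheduled maintenance: a 503 should degrade gracefully without a token refresh.
api_down, auth_down = make_doubles([(503, {})])
maintenance_result = fetch_income(api_down, auth_down, "client-42")
assert maintenance_result["ok"] is False
auth_down.refresh_token.assert_not_called()
```

A timeout case would follow the same pattern, with an exception queued in `side_effect` and matching handling added to the code under test.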

Q7: What is the key difference between API mocking/stubbing and contract testing, and when should you use both together?

A: A mock is a test double you control — it returns whatever responses you define, with no guarantee those responses still match what the real API returns. Contract testing (e.g. Pact) formally verifies that the response shape your mock returns is still consistent with what the real provider publishes. Use both together: mocks give you speed, isolation, and error-path coverage in your CI pipeline; contract tests alert you when the real API drifts from your mocks. Without contract testing, passing mocks can coexist with a broken production integration for weeks.

Q8: When should you NOT use API mocking, even if you technically could?

A: Skip mocking when: (1) the service has a fully-featured sandbox with realistic failure modes — use the real thing and avoid maintaining a duplicate; (2) you are writing end-to-end smoke tests before or after a production release, where the point is to confirm the live integration works; (3) the behaviour you need to verify is inside the third-party service itself (e.g. whether Windcave correctly declines an expired card — a mock cannot test the gateway, only your code’s reaction to it); and (4) the mock setup is becoming more complex than the code it is testing, which usually signals the wrong test boundary or an architectural seam issue.

Q9: A developer on your team says “We don’t need to set up error stubs — our mocks already pass, so the integration must be working.” What is wrong with this reasoning, and how do you respond?

A: Passing mocks only prove the code handles the responses you explicitly defined — they say nothing about whether the real API still returns those responses, or how the code behaves when the API fails. If no error stubs exist, the error-handling paths have never been exercised at all. In NZ contexts (Revenue NZ, RealMe, TransitNZ) this is especially risky because government APIs do go into maintenance windows and do return 503s under load. The response: add dedicated stubs for 400, 401, 500, and timeout, and assert the app handles each gracefully. Then pair with a contract test or a periodic integration run so you know the mock shapes stay in sync with the real service.

Related techniques: Idempotency Testing, API Testing, Contract Testing.
