DevOps QA · Lesson 1

Container & Kubernetes Testing

“It works on my machine” was supposed to die with containers. It did not — it moved. This lesson teaches you to test the container and the Kubernetes deployment that runs it, so the thing you tested is the thing that ships.

DevOps QA · Lesson 1 of 3 · ~30 min read · ~70 min with exercises

1 The Hook

A fictional NZ logistics firm, Kahu Freight, ran a parcel-tracking API. The team had moved to containers and was proud of it — every build produced a Docker image, the image passed its tests in the pipeline, and the same image was promoted from test to staging to production. No more “works on my machine.” Then one Tuesday the production rollout failed, and the new version never came up.

The image was fine. The tests had run inside the container and passed. What broke was everything around the container. The new version expected a configuration value — a database connection string — to arrive as an environment variable. In the test environment that variable was set. In production it was supplied by a Kubernetes Secret that had been renamed during a security tidy-up the week before. The container started, could not find its database, and crashed. Kubernetes did exactly what it was told: it restarted the crashing container, over and over, while the rollout settings let the old version be torn down before the new one ever became ready.

For eleven minutes parcel tracking returned errors across the motu. The fix took thirty seconds once someone read the right log line. The lesson is sharper than the outage itself: the team had tested the application thoroughly and tested the image thoroughly, and still shipped a broken release — because nobody had tested how the container met its environment. The Dockerfile, the readiness probe, the config, the secret, and the deployment manifest all belong under test. Treating the application code as the only thing worth testing is how a green pipeline ships a red production.

This lesson teaches you to test the whole unit of deployment — the image, its probes, its config and secrets, and the Kubernetes objects that schedule it — not just the code inside.

2 The Rule

In a containerised system the unit of deployment is the image plus its runtime contract — its config, secrets, probes, and Kubernetes manifest — not the application code alone. The code can be perfect and the release still fail, because the container is only as good as the environment it meets. Test the container as it will actually run: same image, real probes, real config wiring, on a cluster that behaves like production.

3 The Analogy

Analogy

A shipping container arriving at the Ports of Auckland.

The whole promise of a shipping container is that what you sealed in Shanghai is exactly what comes out in Auckland — the box does not change in transit. A Docker image makes the same promise about software. But the container still has to fit the crane, match the manifest, clear customs, and connect to the truck that collects it. A perfectly packed container that does not match its paperwork sits on the wharf going nowhere. The goods inside are fine; the problem is the interface to the port.

Testing a software container works the same way: checking the goods inside is necessary but not sufficient. You also test that it declares its contents honestly (the image), announces when it is ready to be unloaded (readiness probe), carries the right paperwork (config and secrets), and matches the berth it is assigned to (the Kubernetes deployment). Kahu Freight tested the goods and forgot the paperwork.

4 Testing the Docker Image

The image is the artefact that ships, so it is the first thing under test — before it ever reaches a cluster. Image testing asks: is this image correct, lean, and safe? It runs in the pipeline, fast, on every build.

What the image contains

The question: does the image hold exactly what it should, and nothing it should not? You verify the right application version is in the image, that expected files and binaries are present, and — just as important — that secrets, build tooling, and source code that have no business in a runtime image are absent. A surprising number of production images ship with a private key baked in because someone copied a whole directory in the Dockerfile.
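A minimal sketch of the "nothing it should not" check, assuming you already have a list of the paths inside the image, for example from `docker export $(docker create <image>) | tar -t`. The deny-list patterns and the `forbidden_paths` helper are illustrative assumptions, not a complete policy; extend them to whatever your project must never ship.

```python
from fnmatch import fnmatchcase
from pathlib import PurePosixPath

# Illustrative deny-list (an assumption): file names that belong in a developer
# checkout or a build stage, never in a runtime image.
FORBIDDEN_NAME_PATTERNS = [
    "*.env",     # .env, prod.env and similar dotenv files
    "id_rsa*",   # SSH private keys (and their .pub partners)
    "*.key",     # TLS and other private keys
]


def forbidden_paths(paths, patterns=FORBIDDEN_NAME_PATTERNS):
    """Return every image path whose file name matches a forbidden pattern,
    or that sits inside a .git directory."""
    hits = []
    for raw in paths:
        path = PurePosixPath(raw)
        if ".git" in path.parts or any(fnmatchcase(path.name, p) for p in patterns):
            hits.append(raw)
    return hits
```

A pipeline step would fail the build whenever `forbidden_paths(...)` returns anything. Matching on the file name (plus any `.git` directory) rather than the full path means a `.env` file is caught wherever a broad `COPY` happened to put it.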

That the image runs as built

The question: when you start this exact image, does the application come up? A smoke test that starts the container and hits its health endpoint catches a class of failure unit tests never will — a missing runtime dependency, a wrong base image, a broken entrypoint. This is the test that turns “the code compiles” into “the artefact starts.”
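The smoke test can be as small as polling the health endpoint of a freshly started container. This sketch assumes the container was already started from the exact image under test (for example `docker run -d -p 8080:8080 <image>@sha256:<digest>`) and that it exposes an HTTP health endpoint; the port, path, and timeouts are assumptions.

```python
import http.client
import time
import urllib.request


def wait_until_healthy(url: str, timeout_s: float = 30.0, interval_s: float = 0.5) -> bool:
    """Poll `url` until it answers HTTP 200 or `timeout_s` elapses.

    Connection refusals, timeouts, and HTTP errors all count as "not up yet";
    only the deadline turns them into a failed smoke test.
    """
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        try:
            with urllib.request.urlopen(url, timeout=interval_s) as response:
                if response.status == 200:
                    return True
        except (OSError, http.client.HTTPException):
            pass  # urllib's URLError and HTTPError are both OSError subclasses
        time.sleep(interval_s)
    return False
```

In a pipeline, `wait_until_healthy("http://localhost:8080/health")` returning `False` should fail the step. Early connection errors are tolerated because a container that has just started may not be listening yet.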

That the image is safe and lean

The question: is the image free of known vulnerabilities and unnecessary weight? A container image scan checks the base image and installed packages against known CVEs — relevant for any NZ government system aligning to the NZISM. Image size matters too: a smaller image pulls faster, which directly affects how quickly a Kubernetes rollout or rollback completes.
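A scan is only a test if its result can fail the build. The sketch below assumes the scanner writes a JSON report shaped like `{"Results": [{"Vulnerabilities": [{"VulnerabilityID": ..., "Severity": ...}]}]}`, which resembles Trivy's JSON output; confirm the schema of the scanner you actually use. Which severities block a release, and the size budget, are policy assumptions. The image size in bytes can be read with `docker image inspect --format '{{.Size}}' <image>`.

```python
from typing import Any

# Policy assumption: which severities block a release.
BLOCKING_SEVERITIES = frozenset({"CRITICAL", "HIGH"})


def blocking_findings(report: dict[str, Any], blocking=BLOCKING_SEVERITIES) -> list[str]:
    """Return the IDs of findings severe enough to fail the build.

    Assumes a Trivy-like JSON shape: {"Results": [{"Vulnerabilities": [...]}]}.
    """
    found = []
    for result in report.get("Results") or []:
        # Some targets report no vulnerabilities at all (missing or null).
        for vuln in result.get("Vulnerabilities") or []:
            if str(vuln.get("Severity", "")).upper() in blocking:
                found.append(vuln.get("VulnerabilityID", "<unknown id>"))
    return found


def within_size_budget(image_size_bytes: int, budget_mib: int) -> bool:
    """True if the image fits the (assumed) size budget, in mebibytes."""
    return image_size_bytes <= budget_mib * 1024 * 1024
```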

Pro tip: Test the exact image that will be promoted — same digest — through every stage. If the pipeline rebuilds the image between test and production, you tested a different artefact from the one that ships, and the whole containerisation guarantee is gone. Promote by image digest, not by rebuilding from the same tag.
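The promotion rule can itself be a pipeline check: every stage must reference the image by digest, and the digest must be the same everywhere. A sketch, assuming you can collect the image reference each stage deploys; the stage names and registry path used with it are hypothetical.

```python
import re

# A digest-pinned reference ends in "@sha256:" plus 64 lowercase hex characters.
_DIGEST_REF = re.compile(r"[^@\s]+@(sha256:[0-9a-f]{64})")


def digest_of(image_ref: str) -> str:
    """Return the digest of a digest-pinned reference; reject tag-only references."""
    match = _DIGEST_REF.fullmatch(image_ref)
    if match is None:
        raise ValueError(f"not pinned by digest: {image_ref!r}")
    return match.group(1)


def same_artefact_everywhere(stage_refs: dict[str, str]) -> bool:
    """True only if every stage deploys one identical, digest-pinned image."""
    digests = {digest_of(ref) for ref in stage_refs.values()}
    return len(digests) == 1
```

Comparing digests rather than whole references means the check still passes if stages pull through different registry mirrors, as long as the content is identical; a reference like `tracking-api:latest` is rejected outright.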

5 Health & Readiness Probes

Kubernetes does not watch your application the way a person would. It only knows what your probes tell it. Two probes matter most. They answer different questions, and confusing them is one of the most common container defects there is.

Liveness probe — “is this still alive?”

A liveness probe answers whether the container should be restarted. If it fails, Kubernetes kills and restarts the container. The test risk is a liveness probe that is too sensitive — it fails during a slow but recoverable moment (a long garbage-collection pause, a brief dependency blip), Kubernetes restarts a container that would have recovered, and you get a restart loop that turns a hiccup into an outage.

Readiness probe — “can this take traffic yet?”

A readiness probe answers whether the container should receive requests. If it fails, Kubernetes stops sending traffic to that pod but does not restart it. This is the Kahu Freight gap: a readiness probe must check the things the app actually needs to serve a request — can it reach its database, is its cache warm — not just “is the web server listening.” A readiness probe that returns 200 before the app can reach its database will send live traffic into a pod that immediately errors.

  • Liveness fails: Kubernetes restarts the container. Use for “stuck and unrecoverable”.
  • Readiness fails: Kubernetes removes the pod from the Service's endpoints, so the load balancer stops routing to it until it passes. Use for “not ready for traffic yet”.
  • Startup probe: holds off the other two until a slow-starting app has booted, so a long startup is not mistaken for a failure.
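In a manifest these three are the `livenessProbe`, `readinessProbe`, and `startupProbe` fields of a container. A minimal sketch of testing the probe configuration itself, with the container spec written as the Python dict a YAML loader would produce; the `/livez` and `/readyz` paths, port, and timings are assumptions. The budget arithmetic relies on Kubernetes tolerating about `initialDelaySeconds + periodSeconds × failureThreshold` seconds of failed checks (defaults 0, 10, and 3) before it acts.

```python
# Hypothetical container spec, shaped like the dict a YAML loader would return.
CONTAINER = {
    "name": "tracking-api",
    "livenessProbe": {"httpGet": {"path": "/livez", "port": 8080},
                      "periodSeconds": 10, "failureThreshold": 3},
    "readinessProbe": {"httpGet": {"path": "/readyz", "port": 8080},
                       "periodSeconds": 5, "failureThreshold": 2},
    "startupProbe": {"httpGet": {"path": "/livez", "port": 8080},
                     "periodSeconds": 5, "failureThreshold": 12},
}


def failure_budget_s(probe: dict) -> int:
    """Approximate seconds of failed checks a probe tolerates (Kubernetes defaults)."""
    return (probe.get("initialDelaySeconds", 0)
            + probe.get("periodSeconds", 10) * probe.get("failureThreshold", 3))


def probe_problems(container: dict, boot_time_s: int) -> list[str]:
    """List probe-configuration defects for a container that takes boot_time_s to start."""
    problems = []
    live = container.get("livenessProbe", {}).get("httpGet", {}).get("path")
    ready = container.get("readinessProbe", {}).get("httpGet", {}).get("path")
    if live is None or ready is None:
        problems.append("liveness and readiness probes must both be defined")
    elif live == ready:
        problems.append(f"liveness and readiness share one endpoint: {live}")
    # Whichever probe guards the boot must outlast it: the startup probe if
    # present, otherwise liveness itself.
    guard = container.get("startupProbe") or container.get("livenessProbe")
    if guard is not None and failure_budget_s(guard) < boot_time_s:
        problems.append(f"boot takes {boot_time_s}s but the probe gives up after "
                        f"~{failure_budget_s(guard)}s")
    return problems
```

A spec with one shared `/health` endpoint, no startup probe, and a 40-second boot reports both problems: the default liveness budget of 30 seconds expires before the app is up.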

As a tester you verify three behaviours: a healthy pod passes both probes; a pod that has lost its database fails readiness (drops out of traffic) but not necessarily liveness (does not pointlessly restart); and a genuinely stuck pod fails liveness and is restarted. If readiness and liveness share one endpoint, the two probes cannot diverge in the second case: a shallow endpoint keeps a pod with no database in traffic, and a deep one restarts it over a problem a restart cannot fix. That shared endpoint is the bug to look for first.
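The three behaviours translate directly into a table-driven test. In this sketch `probe_status` stands in for the pod's hypothetical `/livez` and `/readyz` handlers, and `PodState` is an assumed summary of what those handlers can observe. A real stuck process would time out rather than return a 500; the sketch models both as a failed probe.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PodState:
    """Hypothetical summary of what the probe handlers can observe."""
    responsive: bool          # can the process answer requests at all?
    database_reachable: bool  # can it serve a real request right now?


def probe_status(path: str, state: PodState) -> int:
    """Status code a well-designed pod returns for each (assumed) probe path."""
    if not state.responsive:
        return 500  # stands in for a timeout: a stuck process answers nothing
    if path == "/livez":
        return 200  # shallow: alive is enough, dependencies are ignored
    if path == "/readyz":
        return 200 if state.database_reachable else 503  # deep: needs the DB
    return 404


# The three behaviours: (state, liveness should pass, readiness should pass)
EXPECTED = [
    (PodState(responsive=True, database_reachable=True), True, True),     # healthy
    (PodState(responsive=True, database_reachable=False), True, False),   # lost DB
    (PodState(responsive=False, database_reachable=True), False, False),  # stuck
]


def mismatches(probe) -> list[str]:
    """List every expected behaviour that a probe implementation gets wrong."""
    wrong = []
    for state, want_live, want_ready in EXPECTED:
        got = (probe("/livez", state) == 200, probe("/readyz", state) == 200)
        if got != (want_live, want_ready):
            wrong.append(f"{state}: live={got[0]} ready={got[1]}")
    return wrong
```

A shallow implementation that answers both paths identically fails exactly one row, the lost-database case, because it keeps that pod in traffic.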

6 Config & Secret Testing

A container image is built once and run in many environments — test, staging, production. What changes between them is configuration. In Kubernetes that config arrives as ConfigMaps (non-sensitive values) and Secrets (sensitive values like passwords and connection strings), usually surfaced to the container as environment variables or mounted files. The Kahu Freight outage was, at heart, a config-wiring defect, and config wiring is badly under-tested on most teams.

  • Required config is present and correct per environment: for an IRD-facing service, does the production deployment actually receive the production database string and API endpoint, not a value copied from staging? Test the wiring in the target environment, not just that the app reads a variable.
  • Missing config fails loudly, not silently: if a required value is absent, the container should fail fast with a clear error — not start up and quietly fall back to a default that points at the wrong system.
  • Secrets are injected, never baked in: verify the Secret is mounted at runtime and the value is correct — and that the same secret does not appear in the image, in logs, or in the deployment manifest checked into git.
  • Renames and rotations do not break the contract: the exact Kahu Freight failure. When a Secret or ConfigMap key is renamed or rotated, a test should catch that the deployment still references the old name before it reaches production.
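Two of these checks are cheap to automate. The sketch below shows a fail-fast loader the application could call at startup, and a pre-deploy check that every `secretKeyRef` in a container's `env` points at a Secret and key that exist in the target namespace. The helper names, and passing the Secrets in as a name-to-keys mapping (for example built from `kubectl get secret <name> -o json`), are assumptions.

```python
import os
from collections.abc import Mapping


class MissingConfigError(RuntimeError):
    """Raised at startup when required configuration is absent."""


def load_required(names: list[str], env: Mapping[str, str] = os.environ) -> dict[str, str]:
    """Fail fast: return every required value, or raise naming all that are missing.

    Empty strings count as missing, so a blanked-out value cannot slip through.
    """
    missing = [name for name in names if not env.get(name)]
    if missing:
        raise MissingConfigError("missing required config: " + ", ".join(missing))
    return {name: env[name] for name in names}


def dangling_secret_refs(container: dict, secrets: dict[str, set[str]]) -> list[str]:
    """Env vars whose secretKeyRef names a Secret or key that does not exist.

    `container` mirrors a Deployment container spec; `secrets` maps each Secret
    name in the target namespace to its set of keys.
    """
    dangling = []
    for var in container.get("env", []):
        ref = var.get("valueFrom", {}).get("secretKeyRef")
        if ref is None:
            continue  # plain value or ConfigMap reference: not checked here
        if ref.get("key") not in secrets.get(ref.get("name"), set()):
            dangling.append(f"{var['name']} -> {ref.get('name')}/{ref.get('key')}")
    return dangling
```

Run the second check against production's Secrets before promotion and a renamed Secret shows up as a dangling reference in the pipeline rather than as a crash loop in production.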
Pro tip: The single highest-value config test is to deploy the real image into an environment that mirrors production’s config wiring and confirm the pod reaches ready — not just running. “Running” means the process started; “ready” means it found its config, reached its dependencies, and can serve. Most config defects live in the gap between the two.
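The Running/Ready distinction is visible in the pod's status: `status.phase` can be `Running` while the `Ready` entry in `status.conditions` is still `"False"`. A minimal sketch that works on the output of `kubectl get pod <name> -o json`, parsed with `json.loads`; the function names are hypothetical.

```python
def pod_phase(pod: dict) -> str:
    """The coarse lifecycle phase, e.g. "Pending" or "Running"."""
    return pod.get("status", {}).get("phase", "Unknown")


def pod_is_ready(pod: dict) -> bool:
    """True only when the pod's Ready condition is "True", whatever its phase."""
    for condition in pod.get("status", {}).get("conditions", []):
        if condition.get("type") == "Ready":
            return condition.get("status") == "True"
    return False  # no Ready condition reported yet
```

From the command line, `kubectl wait --for=condition=Ready pod/<name> --timeout=120s` applies the same test. What matters is that the gate waits on the Ready condition, not on the phase.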

7 Validating the Kubernetes Deployment

The deployment manifest is the YAML that tells Kubernetes how to run your container — how many replicas, how much CPU and memory, which probes, which config, and how to roll out a new version. It is code, it ships with the release, and it is under test like any other.

Here is a deployment-validation test case for the Kahu Freight tracking service:

Test ID: K8S-DEP-031
Risk category: Deployment — rollout safety
Test type: Kubernetes deployment validation
Description: Verify a new version of the tracking-api deployment rolls out with
                  zero dropped requests, and that a failed new pod does not take down the
                  running version.
Acceptance criteria: During a rolling update, readiness gating keeps at least the
                  configured minimum of ready replicas serving; a new pod that never
                  becomes ready halts the rollout, leaving the old version serving 100%.
Evidence required: Request-success log across the rollout window; rollout status
                  output showing the halt; the image digest under test.
Traceability: Risk R-03 (bad rollout causes tracking outage) in DevOps risk register.
Result: [Pass / Fail] — dropped requests and final replica state listed.
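The first piece of evidence, the request-success log, is easiest to judge mechanically. A sketch, assuming a hypothetical load generator that records one entry per request with a timestamp and an HTTP status (or `None` when the connection failed). Counting 5xx responses and connection failures as dropped, and ignoring 4xx, is a judgement call you may want to tighten.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Request:
    """One entry from a hypothetical load generator's request log."""
    at_s: float            # seconds since the test run started
    status: Optional[int]  # HTTP status, or None if no response came back


def dropped_in_window(log: list[Request], start_s: float, end_s: float) -> list[Request]:
    """Requests inside the rollout window that got a 5xx or no response at all."""
    return [r for r in log
            if start_s <= r.at_s <= end_s and (r.status is None or r.status >= 500)]


def rollout_verdict(log: list[Request], start_s: float, end_s: float) -> str:
    """Pass only if nothing was dropped while the rollout was in progress."""
    dropped = dropped_in_window(log, start_s, end_s)
    return "Pass" if not dropped else f"Fail: {len(dropped)} dropped request(s)"
```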

Notice the shape: the acceptance criterion is about behaviour during change, not just steady state; the evidence is reproducible and names the exact image digest; and it traces to a numbered risk. Other deployment properties worth a test: resource requests and limits are set (so one pod cannot starve the node), the rollout strategy is correct, and an explicit rollback returns the previous version to service cleanly.
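The other manifest properties can be checked statically before anything is deployed. This sketch takes a Deployment as the dict a YAML loader would produce and flags missing CPU and memory requests and limits, a `Recreate` strategy (which kills every old pod before creating new ones), and a `maxUnavailable` that would let ready replicas drop below the minimum a test case requires. It mirrors Kubernetes' defaults (`RollingUpdate`, `replicas: 1`, `maxUnavailable: 25%`, with percentages rounded down); the function names are hypothetical.

```python
def resolve_max_unavailable(value, replicas: int) -> int:
    """Resolve maxUnavailable, which may be an int or a percentage string."""
    if isinstance(value, str) and value.endswith("%"):
        return replicas * int(value[:-1]) // 100  # Kubernetes rounds this down
    return int(value)


def min_available(deployment: dict) -> int:
    """Fewest ready replicas a RollingUpdate may leave serving."""
    spec = deployment["spec"]
    replicas = spec.get("replicas", 1)
    rolling = spec.get("strategy", {}).get("rollingUpdate", {})
    return replicas - resolve_max_unavailable(rolling.get("maxUnavailable", "25%"), replicas)


def manifest_problems(deployment: dict, required_min_available: int) -> list[str]:
    """List rollout-safety and resource defects in a Deployment manifest."""
    problems = []
    spec = deployment["spec"]
    strategy = spec.get("strategy", {}).get("type", "RollingUpdate")
    if strategy != "RollingUpdate":
        problems.append(f"strategy {strategy} kills all old pods before new ones exist")
    elif min_available(deployment) < required_min_available:
        problems.append(f"rollout may drop to {min_available(deployment)} ready replicas")
    for container in spec["template"]["spec"]["containers"]:
        resources = container.get("resources", {})
        for kind in ("requests", "limits"):
            for resource in ("cpu", "memory"):
                if resource not in resources.get(kind, {}):
                    problems.append(f"{container['name']}: no {kind}.{resource}")
    return problems
```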

8 Ephemeral Test Environments

The old model was a handful of shared, long-lived environments — one “test” box everybody fought over, slowly drifting away from production. Containers and Kubernetes enable a better pattern: spin up a fresh, production-like environment on demand for a single branch or pull request, run the tests against it, then tear it down. Each one is identical, isolated, and disposable.
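The lifecycle is worth encoding so teardown cannot be skipped. A minimal sketch, assuming each environment lives in its own namespace named after the branch, with `create` and `destroy` as hypothetical hooks (for example wrappers around `kubectl create namespace` and `kubectl delete namespace`). Namespace names must be valid DNS-1123 labels: lowercase letters, digits, and hyphens, at most 63 characters, starting and ending with an alphanumeric.

```python
import re
from contextlib import contextmanager
from typing import Callable, Iterator


def namespace_for(branch: str) -> str:
    """Map a branch name to a valid namespace name (a DNS-1123 label)."""
    slug = re.sub(r"[^a-z0-9-]+", "-", branch.lower()).strip("-")
    return ("pr-" + slug)[:63].rstrip("-")


@contextmanager
def ephemeral_environment(branch: str,
                          create: Callable[[str], None],
                          destroy: Callable[[str], None]) -> Iterator[str]:
    """Create an isolated environment, yield its namespace, always tear it down."""
    namespace = namespace_for(branch)
    try:
        create(namespace)   # deploy image, manifests, config, seed data
        yield namespace     # tests run here
    finally:
        destroy(namespace)  # runs even if create or the tests fail
```

Because `destroy` runs in a `finally` block, a failing test or a half-finished `create` still triggers teardown, so `destroy` should tolerate a namespace that only partly exists.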

  • Parity with production: an ephemeral environment is only useful if it runs the same image, the same manifests, and the same kind of config wiring as production. The closer the parity, the more real the test — and the more likely it would have caught the Kahu Freight secret rename.
  • Isolation: because each environment is fresh and separate, two branches cannot corrupt each other’s data, and a destructive test cannot poison a shared box for everyone else.
  • Clean teardown: the environment, and any data it created, must be fully removed afterwards — a half-deleted environment leaks cost and, for a Privacy Act 2020 system, can leave test data containing personal information lying around.
  • Seeded, representative data: the environment needs realistic test data loaded automatically on creation, so the test is meaningful and repeatable rather than running against an empty database.
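Parity can be made explicit by describing each environment's runtime contract as data and diffing it. A sketch, assuming you can export a summary shaped like the hypothetical `PRODUCTION_CONTRACT` below from both production and the ephemeral environment; the keys chosen are illustrative.

```python
# Hypothetical runtime-contract summary; export the same shape from each environment.
PRODUCTION_CONTRACT = {
    "image": "registry.example.com/tracking-api@sha256:<digest>",
    "secrets": {"tracking-db": {"url"}},
    "config_keys": {"API_ENDPOINT", "LOG_LEVEL"},
    "probes": {"liveness": "/livez", "readiness": "/readyz"},
}


def parity_gaps(prod: dict, ephemeral: dict, path: str = "") -> list[str]:
    """Recursively list every place the ephemeral contract differs from production."""
    gaps = []
    for key in sorted(set(prod) | set(ephemeral)):
        where = f"{path}.{key}" if path else key
        if key not in ephemeral:
            gaps.append(f"missing in ephemeral: {where}")
        elif key not in prod:
            gaps.append(f"extra in ephemeral: {where}")
        elif isinstance(prod[key], dict) and isinstance(ephemeral[key], dict):
            gaps.extend(parity_gaps(prod[key], ephemeral[key], where))
        elif prod[key] != ephemeral[key]:
            gaps.append(f"differs: {where} (prod={prod[key]!r}, ephemeral={ephemeral[key]!r})")
    return gaps
```

An empty list is the parity claim; any entry marks a place where the ephemeral environment can no longer catch a production defect, such as a renamed Secret.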

For a tester this changes the job: instead of guarding one fragile shared environment, you define what a correct environment is — image, config, data, teardown — and that definition becomes a tested, repeatable artefact. This is also the foundation the next two lessons build on: you cannot safely test feature flags or canary releases without trustworthy, production-like environments to do it in.

9 Common Mistakes

🚫 Testing the application but never the container as deployed

Why it happens: Unit and integration tests run against the code, pass, and feel like enough.
The fix: The code passing tells you nothing about the image, its probes, its config wiring, or the manifest — the Kahu Freight failure lived entirely there. Smoke-test the exact image in a production-like environment and confirm the pod reaches ready, not just running.

🚫 Wiring liveness and readiness to the same shallow endpoint

Why it happens: One “/health” endpoint that returns 200 is the quickest thing to build.
The fix: They answer different questions. Readiness must check real dependencies (can it reach the database?) so traffic only flows to pods that can serve; liveness should stay shallow so a recoverable blip does not trigger a pointless restart loop. Test all three behaviours separately.

🚫 Rebuilding the image between test and production

Why it happens: The pipeline rebuilds from the same Dockerfile and tag at each stage, which looks equivalent.
The fix: A rebuild can pull a newer base layer or dependency, so the production image differs from the one you tested — the containerisation guarantee is broken. Build once, promote the same digest through every stage.

🚫 Treating an ephemeral environment as “close enough” when it drifts from production

Why it happens: A simpler, cheaper test environment is faster to stand up and seems good enough.
The fix: Every difference from production — a missing secret, a different config path, a lighter database — is a defect the environment cannot catch. Define parity explicitly and test against it; an environment that does not mirror the production runtime contract gives false confidence.

10 Now You Try

Three graded exercises: spot the risks, fix a broken probe setup, then build a deployment test plan. Write your answer, then compare it to the model answer.

🔍 Exercise 1 of 3 — Spot the Container Risks

Read the description of a containerised Waka Kotahi vehicle-licensing service below. Identify 3 container/Kubernetes risks that could cause a release to fail or behave unsafely in production, and name what you would test for each.

Service: vehicle-licensing API
The CI pipeline builds a Docker image, runs unit tests inside it, and on success deploys to test. To save time the production stage rebuilds the image from the same Dockerfile and latest tag. The deployment has a single “/health” endpoint wired to both the liveness and readiness probes; it returns 200 as soon as the web server starts. The database connection string is read from an environment variable; if it is missing the app starts anyway and uses a built-in default pointing at a developer database. The image is built by copying the whole project directory in, including the .env file used in development.

Model answer
There are at least four real risks; any three, well explained, earn full marks.

1. Image not promoted by digest — production rebuilds from the same Dockerfile and the "latest" tag, so the production image can differ from the one tested (newer base layer or dependency). Test: promote one built image by digest through every stage and verify the digest is identical in production.

2. Liveness and readiness share one shallow endpoint — "/health" returns 200 once the web server starts, before the database is reachable. Readiness therefore passes too early, so live traffic reaches a pod that cannot serve; and because both probes share the endpoint, making it check the database would instead make liveness restart pods on every database blip. Test: readiness fails when the database is unreachable (pod drops out of traffic); liveness stays shallow so a brief blip does not restart the pod.

3. Missing config fails silently — if the DB connection string is absent the app falls back to a developer database instead of failing fast. Test: with the variable unset, the pod must fail to become ready with a clear error, not start against a default.

Bonus risk: Secrets baked into the image — copying the whole directory including .env ships dev secrets in the image. Test: scan the image and assert no .env / secrets / source-only files are present, and that real secrets are injected at runtime.

The trap: every one of these passes the in-container unit tests. None of them is about the application code — they are about how the container meets its environment, which is exactly what a code-only tester never checks.

🔧 Exercise 2 of 3 — Fix the Probe Setup

The probe configuration below is broken in ways that cause restart loops, kill slow-booting pods before they start, and pull healthy pods out of traffic. Describe a corrected probe design for a fictional Te Whatu Ora appointment-booking service that depends on a database and a downstream notification API. Specify, for liveness, readiness, and startup: what each should check, and what Kubernetes does when each fails.

Broken setup:
“Liveness and readiness both call /health, which checks the database AND the notification API and returns 500 if either is down. There is no startup probe; the app takes ~40 seconds to boot and liveness starts immediately.”

Model answer
Liveness — checks: only that the process itself is alive and not deadlocked (a shallow internal check, e.g. /livez that returns 200 if the event loop / request handler is responsive). On failure Kubernetes restarts the container. It must NOT depend on the database or notification API.

Readiness — checks: that this pod can serve a request right now — the database is reachable. On failure Kubernetes removes the pod from the load balancer (stops sending traffic) but does not restart it. The pod rejoins automatically when readiness passes again.

Startup — checks: that the ~40-second boot has completed; it gates liveness and readiness until the app has started. On failure (still booting) Kubernetes keeps waiting up to the configured budget (roughly periodSeconds × failureThreshold); only if startup never succeeds within that budget is the container restarted. This stops a slow boot being mistaken for a crash.

A subtle point: the notification API is a downstream, non-critical dependency. Failing readiness when it is down would pull the whole service out of traffic over a secondary feature — usually wrong. Treat it as degraded-mode, not unready. Readiness should gate on what the pod truly needs to serve its core request (the database), not on every dependency.

Why the original was broken: (1) liveness depended on the database and notification API, so a brief DB blip or a notification outage restarted healthy pods — a restart loop and an amplified outage. (2) No startup probe plus immediate liveness meant the 40-second boot looked like a failing pod, so Kubernetes killed it before it ever started. (3) Coupling readiness to the notification API pulled the service out of traffic for a non-critical dependency.

🏗️ Exercise 3 of 3 — Build a Deployment Test Plan

Design a Kubernetes deployment test plan of 3 test cases for a fictional ANZ statement-generation service being rolled out with a new version. Each test case should have at least: an ID, what it verifies, an acceptance criterion, and the evidence required. Cover a clean rolling update, a failed-pod rollout, and config/secret wiring.

Model answer
K8S-01 | Verifies: a new version rolls out with zero dropped requests | Acceptance criteria: during the rolling update, success rate of in-flight requests stays at 100% and the configured minimum ready replicas keep serving throughout | Evidence required: request-success log across the rollout window; rollout status output; image digest under test

K8S-02 | Verifies: a new pod that never becomes ready does not take down the running version | Acceptance criteria: a new pod failing its readiness probe halts the rollout; the previous version keeps serving 100% of traffic; no new pod receives traffic until ready | Evidence required: rollout-status output showing the halt; traffic/error logs proving old version served throughout; the deliberately-broken image digest

K8S-03 | Verifies: production config and secrets are wired correctly so the pod reaches ready | Acceptance criteria: deployed with the production ConfigMap and Secret, the pod reaches Ready (not just Running); a deliberately missing required secret causes a fast, clear failure rather than a silent default | Evidence required: pod status showing Ready; logs for the missing-secret negative case; confirmation the secret is injected at runtime and absent from the image

Strong plans: each case targets behaviour during change (not just steady state), has a measurable criterion, names concrete reproducible evidence including the image digest, and together they cover a clean rollout, a failed rollout, and config/secret wiring. Weak plans test only "the new version is up" — which would have passed at Kahu Freight right before the outage.

11 Self-Check

Q1: Why is testing the application code not enough for a containerised release?

Because the unit of deployment is the image plus its runtime contract — config, secrets, probes, and the Kubernetes manifest. The code can pass every test and the release still fail in the wiring around it, which is exactly what happened at Kahu Freight. You must test the container as it will actually run, confirming the pod reaches ready, not just running.

Q2: What is the difference between a liveness probe and a readiness probe?

Liveness answers “should this container be restarted?”: on failure Kubernetes restarts the container. Readiness answers “can this pod take traffic yet?”: on failure Kubernetes removes it from the load balancer but does not restart it. Readiness should check real dependencies; liveness should stay shallow so a recoverable blip does not cause a restart loop.

Q3: Why must you promote the same image digest through every stage rather than rebuilding?

A rebuild can pull a newer base layer or dependency, so the image in production differs from the one you tested — which breaks the entire containerisation guarantee that “what you tested is what ships.” Build once and promote by digest, so test, staging, and production all run the identical artefact.

Q4: What should happen when a required configuration value is missing, and why test for it?

The container should fail fast with a clear error rather than start up and silently fall back to a default pointing at the wrong system. Silent fallback is how a service in production quietly talks to a developer database. Testing the missing-config case is a negative test that catches exactly this class of defect.

Q5: What makes an ephemeral test environment trustworthy?

Parity with production — the same image, manifests, and config-wiring pattern — plus isolation, clean teardown (including any personal data under the Privacy Act 2020), and seeded representative data. Every difference from production is a defect the environment cannot catch, so the value of the environment is defined by how closely it mirrors the real runtime contract.

12 Interview Prep

Real questions asked in NZ QA interviews for DevOps-adjacent roles. Read the model answers, then practise your own version.

“Our pipeline tests pass on every build, but we still get failed production rollouts. Where would you look?”

At the gap between the code and the container as deployed, because that is where a green pipeline ships a red release. I’d check whether we test the exact image that goes to production — same digest, not a rebuild — and whether we ever deploy it into a production-like environment and confirm the pod reaches ready, not just running. Then the runtime contract: are liveness and readiness wired to different, meaningful checks; does missing config fail fast or silently default; are secrets injected at runtime and not baked in. Most “tests pass but rollout fails” stories live in the probes, the config wiring, or the manifest — none of which the in-container unit tests touch.

“How would you test that a readiness probe is doing its job?”

I’d test three behaviours separately. A healthy pod with its database reachable passes readiness and serves traffic. A pod whose database I’ve made unreachable must fail readiness — so Kubernetes pulls it out of the load balancer and no live traffic hits a pod that would error — and crucially it should not also fail liveness and restart, because the problem may be external and recoverable. And when the dependency comes back, the pod should rejoin traffic automatically once readiness passes again. If readiness is wired to a shallow “web server is up” check, the second of those fails and live traffic flows into a pod that cannot serve — which is the bug I’m really hunting.

“What is the value of ephemeral environments to a tester, and what is the catch?”

The value is a fresh, isolated, production-like environment per branch or pull request — no fighting over one shared box that has drifted from production, no two changes corrupting each other’s data, and a clean teardown after. It moves the tester’s job towards defining what a correct environment is — image, config, seeded data, teardown — as a repeatable artefact. The catch is parity: an ephemeral environment is only as good as how closely it mirrors production’s runtime contract. Every shortcut — a missing secret, a lighter database, a different config path — is a defect it can no longer catch, so it can hand you false confidence. And for a Privacy Act 2020 system I’d make sure teardown actually removes any test data containing personal information.