Network & Resilience · Lesson 3

State & Caching Bugs

“It works on my machine” — and then a colleague refreshes and it works for them too, so it gets closed as no-repro. That bug is real, and it is hiding in state and caches. This lesson teaches you to find it on purpose.

Lesson 3 of 3 · ~30 min read · ~70 min with exercises

1 The Hook

A fictional NZ council, Tasman Ranges District Council, ran a rates and consents portal as a progressive web app. A resident reported that after their session had been open a while, clicking “Submit consent application” did nothing visible — the button greyed, then the form came back blank with no error and no saved draft. The test team could not reproduce it. They opened the portal, filled the form, submitted, and it worked every time. Closed: no repro.

The resident was right and the testers were unlucky. The portal used a signed token (a JWT) to authenticate each request, and that token expired after thirty minutes. A fresh tester, opening the portal and submitting within a few minutes, never hit the expiry. The resident, who filled in a long application over the better part of an hour, did. When they finally hit submit, the token was expired; the request failed authentication; and because the app had no handling for “token expired mid-session,” it discarded the unsaved form and silently returned a blank page. The bug only appeared as a function of time and state — never on a quick fresh attempt.

It got worse. After the council shipped a fix, some users still saw the old broken behaviour for days. Their browsers had cached the old version of the app in a service worker, and nothing forced an update, so they kept running the buggy build long after it was patched. Now there were two stale-state bugs stacked on top of each other: an expired token discarding work, and a stale cached app refusing to take the fix.

Here is the lesson hidden in that story. The team tested the form on a fresh session with a fresh build, which is the one state in which both bugs are invisible. State and caching bugs do not live in the steady fresh state — they live in the stale, aged, in-between states that real users reach and testers rarely set up. Finding them means deliberately ageing the state, not refreshing it away.

2 The Rule

A refresh hides this whole class of bug, which is exactly why you must stop refreshing. State and caching defects live in aged sessions, stale caches, and corrupted stored data — the states a real user drifts into and a tester resets away. To find them, deliberately age and corrupt the state instead of starting clean.

3 The Analogy

A parking ticket that quietly expires while you are still shopping.

You pay for an hour of parking, put the ticket on the dash, and head into the shops. The ticket was valid when you walked away — but it has a time limit, and if you are still inside when the hour runs out, it is now worthless even though nothing about it looks different. The bad outcome is not at the start; it is later, as a function of time, when you come back to a windscreen you assumed was fine.

An expired authentication token is that parking ticket. It is valid when the session starts, so a quick test never sees a problem, but it silently runs out while the user takes their time over a long form. And a stale cached app is like coming back to use a ticket machine that was replaced last week while you were not looking — you are still standing at the old one. State and caching testing is checking what happens when the ticket has expired and the machine has changed, not just what happens the moment you arrive.

4 Where State and Caches Live

To test this class of bug you have to know where state and cached data hide in a modern web app. There are several layers, and a bug in any of them can outlive the steady state a tester sets up.

In-session state and auth tokens

The data a session holds while it is open — the form being filled, the user’s logged-in identity, the auth token authorising each request. The token in particular has a lifetime, and that lifetime is the source of the Tasman Ranges bug. Aged in-session state is the most common hiding place for “works on a fresh attempt” defects.
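
To make the token lifetime concrete, here is a minimal sketch of an expiry check. It assumes the JWT payload has already been decoded into a claims object; `exp` is the standard JWT expiry claim (seconds since the Unix epoch), while `TokenClaims`, `secondsUntilExpiry`, `isExpired`, and the 30-second skew allowance are illustrative choices, not the portal's actual code.

```typescript
// Decoded JWT payload. `exp` is the standard expiry claim, in seconds since the epoch.
interface TokenClaims {
  exp: number;
}

// Seconds left before the token expires; negative once it has expired.
function secondsUntilExpiry(claims: TokenClaims, nowMs: number): number {
  return claims.exp - Math.floor(nowMs / 1000);
}

// Treat the token as expired slightly early to allow for clock skew (assumed 30 s).
function isExpired(claims: TokenClaims, nowMs: number, skewSeconds: number = 30): boolean {
  return secondsUntilExpiry(claims, nowMs) <= skewSeconds;
}

// A 30-minute token issued at t = 0 (hypothetical values).
const claims: TokenClaims = { exp: 30 * 60 };
const afterFiveMinutes = isExpired(claims, 5 * 60 * 1000);   // the quick tester
const afterFiftyMinutes = isExpired(claims, 50 * 60 * 1000); // the slow resident
```

The same token gives two different answers depending only on the time passed in, which is why a quick fresh test and a long real session can disagree.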

localStorage and stored client data

Data the app writes to the browser to persist across sessions — preferences, a saved draft, a cached identity. It survives a reload, which is its value and its danger: if it is corrupted, partially written, or left over from an older version of the app, every future session inherits the bad data until it is cleared.

The service worker and app-shell cache

A progressive web app uses a service worker to cache its own code so it can load instantly and work offline. The benefit is speed; the danger is that the cached app can be a stale, old version that keeps running long after a fix ships, which is the second Tasman Ranges bug. A service worker that does not update correctly serves yesterday’s broken build to today’s user.
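
One common way to keep the app-shell cache from going stale is to put the build version in the cache name and delete every other app-shell cache when a new service worker activates. The sketch below shows only the decision of which caches to delete. In a real service worker the names would typically come from `caches.keys()` and be removed with `caches.delete()` inside the `activate` handler; the `app-shell-` prefix, the version strings, and the function names are assumptions for illustration.

```typescript
const CACHE_PREFIX = "app-shell-"; // hypothetical naming convention

function cacheNameFor(buildVersion: string): string {
  return CACHE_PREFIX + buildVersion;
}

// Of all cache names present in the browser, return the app-shell caches
// that do not belong to the current build and can therefore be deleted.
function staleCacheNames(existing: string[], currentBuild: string): string[] {
  const keep = cacheNameFor(currentBuild);
  return existing.filter((name) => name.startsWith(CACHE_PREFIX) && name !== keep);
}

const toDelete = staleCacheNames(
  ["app-shell-1.4.0", "app-shell-1.4.1", "user-avatars"],
  "1.4.1",
);
// Only the old app-shell cache is selected; the unrelated "user-avatars" cache is left alone.
```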

HTTP and data caches

Responses cached by the browser or an intermediate layer so they need not be fetched again. The risk is the classic one: cached data that is now wrong — an old rates balance, a stale price, a status that has since changed — shown to the user as if it were current because nobody told the cache it was out of date.
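
As a rough illustration of how an HTTP cache decides whether a stored response may still be shown, the sketch below applies a deliberately simplified freshness rule to a `Cache-Control` header value: `no-store` and `no-cache` are never served without going back to the server, and otherwise the response counts as fresh while its age is below `max-age`. Real caches apply more rules than this (for example `s-maxage`, `Expires`, and heuristic freshness), and the function names are hypothetical.

```typescript
// Extract max-age (in seconds) from a Cache-Control header value, or null if absent.
function maxAgeSeconds(cacheControl: string): number | null {
  const match = /(?:^|,)\s*max-age\s*=\s*(\d+)/i.exec(cacheControl);
  return match ? Number(match[1]) : null;
}

// Simplified rule: fresh only while age < max-age, and never for no-store / no-cache.
function canServeFromCache(cacheControl: string, ageSeconds: number): boolean {
  const directives = cacheControl.toLowerCase().split(",").map((d) => d.trim());
  if (directives.indexOf("no-store") !== -1 || directives.indexOf("no-cache") !== -1) {
    return false;
  }
  const maxAge = maxAgeSeconds(cacheControl);
  return maxAge !== null && ageSeconds < maxAge;
}

// A balance response cached for an hour is still served after ten minutes,
// even if the real balance changed in the meantime.
const servedAfterTenMinutes = canServeFromCache("public, max-age=3600", 600);
```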

Pro tip: The first move when you cannot reproduce a reported bug is to ask “what state was the user in that I am not?” Long session, old build, leftover stored data, cached response. Reproduce the state, and the bug usually reproduces with it. A clean test environment is precisely the one place these bugs cannot live.

5 The Four Bug Classes

This kind of defect clusters into four recognisable classes. Knowing them turns a vague “no-repro” into a targeted test.

The expired-token race. An auth token expires part-way through a session, and the app handles the resulting failure badly — discarding unsaved work, looping on a failed refresh, or showing a blank screen instead of cleanly re-authenticating and preserving the user’s input. The Tasman Ranges submit-after-expiry is this class. The fix and the test centre on what happens when a request fails because the token has aged out: the user’s work must survive, and they must be re-authenticated, not dumped.
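
A minimal sketch of the handling this class calls for: save the user's input first, retry once after a silent refresh, and fall back to a re-login prompt with the draft intact. The calls are synchronous purely for brevity (a real client would be asynchronous), and `SubmitDeps`, `send`, `refreshToken`, and `saveDraft` are hypothetical names standing in for whatever the app actually uses.

```typescript
type SubmitResult = "ok" | "unauthorized";
type Outcome = "submitted" | "needs-login";

interface SubmitDeps {
  send(formData: string, token: string): SubmitResult; // hypothetical API call
  refreshToken(): string | null;                       // null when silent refresh fails
  saveDraft(formData: string): void;                   // persists the unsaved input
}

function submitPreservingWork(formData: string, token: string, deps: SubmitDeps): Outcome {
  // Save first, so nothing that follows can lose the user's input.
  deps.saveDraft(formData);

  if (deps.send(formData, token) === "ok") return "submitted";

  // The token was rejected: try one silent refresh, then retry exactly once.
  const freshToken = deps.refreshToken();
  if (freshToken !== null && deps.send(formData, freshToken) === "ok") {
    return "submitted";
  }

  // Refresh failed: the caller should prompt for login; the draft is still saved.
  return "needs-login";
}
```

The design choice worth testing is the order: the draft is saved before any request is made, so neither a failed refresh nor a re-login can discard the form.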

The stale service worker. A cached old build of the app keeps running after a new one ships, so users see bugs that are already fixed or miss features that are already live. The test is whether a deployed update actually reaches a user who already has the old version cached — and how quickly, and whether it can be forced.

localStorage corruption. Stored client data is partially written, malformed, or left over from an older schema, and the app trusts it blindly — reading a half-saved draft, parsing data in an old format, crashing on a value it no longer expects. The test is whether the app validates and recovers from bad stored data rather than assuming it is always well-formed.
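
A sketch of what "validates and recovers" can look like in code: the stored value is parsed defensively and rejected if it is missing, truncated, the wrong shape, or from an older schema. The `Draft` shape, the `schemaVersion` field, and `CURRENT_SCHEMA` are assumptions for illustration; `KeyValueStore` mirrors only the `getItem` method of the browser's storage interface so the example runs outside a browser.

```typescript
// Minimal subset of the Web Storage interface, so the sketch runs outside a browser.
interface KeyValueStore {
  getItem(key: string): string | null;
}

interface Draft {
  schemaVersion: number;
  applicantName: string;
  description: string;
}

const CURRENT_SCHEMA = 2; // hypothetical current schema version

// Returns a usable draft, or null if the stored value is missing, malformed,
// partially written, or left over from an older schema.
function loadDraft(store: KeyValueStore, key: string): Draft | null {
  const raw = store.getItem(key);
  if (raw === null) return null;

  let parsed: unknown;
  try {
    parsed = JSON.parse(raw);
  } catch {
    return null; // truncated or corrupted JSON
  }
  if (typeof parsed !== "object" || parsed === null) return null;

  const fields = parsed as Record<string, unknown>;
  const name = fields["applicantName"];
  const description = fields["description"];
  if (fields["schemaVersion"] !== CURRENT_SCHEMA) return null;
  if (typeof name !== "string" || typeof description !== "string") return null;

  return { schemaVersion: CURRENT_SCHEMA, applicantName: name, description };
}
```

Returning null rather than throwing lets the caller start from an empty form, which is one reasonable recovery; an app might instead migrate old-schema data.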

Stale cache / failed invalidation. Cached data that is now out of date is shown as current — the old balance, the changed status, the superseded price. The test is whether, when the underlying data changes, the cache is correctly invalidated so the user sees the new value and not the comfortable old one.

6 Cache Invalidation & Versioning

Most of this class comes down to one famously hard problem: knowing when cached data is no longer valid and replacing it. A tester does not have to solve it, but must know the controls that manage it and check they work.

Expiry (time-based). Cached data carries a lifetime, after which it is treated as stale and refetched. Simple, but blunt: too long and users see old data, too short and the cache barely helps. The test is that data does expire and refresh on schedule, and that an expiry chosen for performance has not been set so long that users see dangerously old values — an old rates balance, say.
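
Time-based expiry is easiest to test when the clock is injectable, so a test can age the cache instantly instead of waiting. The sketch below is an assumed in-memory cache, not any particular library; `TtlCache`, the five-minute lifetime, and the balance value are illustrative.

```typescript
class TtlCache<V> {
  private readonly entries = new Map<string, { value: V; storedAtMs: number }>();
  private readonly ttlMs: number;
  private readonly now: () => number;

  // `now` is injected so a test can move time forward without waiting.
  constructor(ttlMs: number, now: () => number) {
    this.ttlMs = ttlMs;
    this.now = now;
  }

  set(key: string, value: V): void {
    this.entries.set(key, { value, storedAtMs: this.now() });
  }

  // Returns the value while it is within its lifetime, otherwise undefined.
  get(key: string): V | undefined {
    const entry = this.entries.get(key);
    if (entry === undefined) return undefined;
    if (this.now() - entry.storedAtMs >= this.ttlMs) {
      this.entries.delete(key); // expired: force a refetch
      return undefined;
    }
    return entry.value;
  }
}

let clockMs = 0;
const balances = new TtlCache<number>(5 * 60 * 1000, () => clockMs);
balances.set("rates-balance", 1240.5);

clockMs = 4 * 60 * 1000;
const stillCached = balances.get("rates-balance"); // within the lifetime

clockMs = 6 * 60 * 1000;
const afterExpiry = balances.get("rates-balance"); // past the lifetime
```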

Versioned assets (cache busting). Static files are given a version in their name or query string, so a new deploy produces new file names the browser has never cached and must fetch fresh. This is the standard defence against the stale-app problem. The test is that a new build genuinely changes the versions and that an old cached asset is not silently served in its place — the exact failure that left Tasman Ranges users on the old build.
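
To illustrate cache busting, the sketch below derives a file name from the file's contents, so a changed build necessarily produces a name the browser has never cached. Build tools normally do this for you; the small FNV-1a style hash here is only a stand-in for a bundler's content hash, and `versionedName` and the sample file contents are hypothetical.

```typescript
// FNV-1a style 32-bit hash over UTF-16 code units, returned as 8 hex digits.
// A stand-in for a bundler's content hash, not a cryptographic hash.
function contentHash(text: string): string {
  let hash = 0x811c9dc5;
  for (let i = 0; i < text.length; i++) {
    hash ^= text.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193);
  }
  return ("0000000" + (hash >>> 0).toString(16)).slice(-8);
}

// Insert the content hash before the extension: "app.js" becomes "app.<hash>.js".
function versionedName(fileName: string, contents: string): string {
  const dot = fileName.lastIndexOf(".");
  const hash = contentHash(contents);
  return dot === -1
    ? `${fileName}.${hash}`
    : `${fileName.slice(0, dot)}.${hash}${fileName.slice(dot)}`;
}

const oldBuild = versionedName("app.js", "submit(); // buggy");
const newBuild = versionedName("app.js", "submit(); // fixed");
const rebuilt = versionedName("app.js", "submit(); // fixed");
```

A tester can apply the same idea from the outside: compare the asset names served before and after a deploy, and treat an unchanged name with changed behaviour as a red flag.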

Service worker update flow. A new service worker should detect the new build, install in the background, and activate — ideally prompting the user or updating on the next visit, never leaving them stranded on the old version indefinitely. The test is the full update path: deploy a change, load as a user who has the old version cached, and confirm the update is picked up within a defined and acceptable window.

Event-based invalidation. When the underlying data changes, the cache holding it is explicitly cleared or updated, so the next read is fresh. This is the precise control for the stale-balance class. The test is that changing a value at the source causes the cached copy to be invalidated, so the user sees the new value rather than the old one.
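
Event-based invalidation in its smallest form: the code path that changes the value also clears the cached copy. `BalanceService` below is a hypothetical in-memory stand-in for a real data source and cache, with a read counter so a test can confirm the second read went back to the source.

```typescript
class BalanceService {
  private sourceBalance: number;        // stands in for the server-side value
  private cached: number | null = null; // the client-side cached copy
  sourceReads = 0;                      // how many times the source was consulted

  constructor(initialBalance: number) {
    this.sourceBalance = initialBalance;
  }

  getBalance(): number {
    if (this.cached === null) {
      this.sourceReads++;
      this.cached = this.sourceBalance;
    }
    return this.cached;
  }

  recordPayment(amount: number): void {
    this.sourceBalance -= amount;
    this.cached = null; // the change itself invalidates the cached copy
  }
}

const service = new BalanceService(500);
const before = service.getBalance();
service.recordPayment(200);
const after = service.getBalance();
```

Removing the `this.cached = null` line reproduces the stale-balance bug: `after` would still report the pre-payment figure.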

Pro tip: The single most revealing state-and-caching test is the “old build, new deploy” check: load the app as a user who already has it cached, ship a visibly different version, and time how long until that user gets the new one — or whether they ever do. It exercises versioning, the service worker update flow, and cache invalidation together, and it directly catches the bug that keeps users on a build you have already fixed.
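
The timing part of that check can be reduced to a small helper, sketched here under the assumption that the test harness periodically records which build version the already-cached client reports. `Observation`, `timeToUpdateMs`, and the sample timestamps and versions are hypothetical.

```typescript
interface Observation {
  atMs: number;         // when the client was sampled
  buildVersion: string; // the build the client reported at that time
}

// Milliseconds from the deploy until the client first reports the expected
// version, or null if it never does within the observed period.
function timeToUpdateMs(
  deployedAtMs: number,
  expectedVersion: string,
  observations: Observation[],
): number | null {
  let earliest: number | null = null;
  for (const obs of observations) {
    const matches = obs.atMs >= deployedAtMs && obs.buildVersion === expectedVersion;
    if (matches && (earliest === null || obs.atMs < earliest)) {
      earliest = obs.atMs;
    }
  }
  return earliest === null ? null : earliest - deployedAtMs;
}

const samples: Observation[] = [
  { atMs: 0, buildVersion: "1.4.0" },
  { atMs: 60000, buildVersion: "1.4.0" },   // still on the old build after the deploy
  { atMs: 3600000, buildVersion: "1.4.1" }, // new build finally picked up
];
const delayMs = timeToUpdateMs(30000, "1.4.1", samples);
```

A null result is the finding that matters most: the user was never moved off the old build during the observed window.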

7 What to Test for State & Caching

The practical checklist for any app that holds state or caches data:

Token expiry mid-session: let the auth token expire during a long task, then act — the user’s unsaved work survives and they are cleanly re-authenticated, never dumped to a blank screen.
The long-session path: deliberately age the session well past any token or cache lifetime before the key action, rather than always testing fresh.
Stale service worker / old build: a deployed update reaches a user who already has the app cached, within a defined window, and can be forced if needed.
Corrupted localStorage: seed malformed, partial, or old-schema stored data and confirm the app validates and recovers rather than trusting it blindly or crashing.
Cache invalidation on change: when source data changes, the cached copy is invalidated and the user sees the new value, not the old one.
Expiry tuned for safety: cache lifetimes are not so long that users see dangerously stale data — balances, statuses, prices.
Cross-tab and multi-session consistency: a change in one tab or device does not leave another showing contradictory stale state.
Clean logout: logging out actually clears stored tokens and sensitive cached data, so the next user of the device does not inherit the last one’s session.
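
The clean-logout item lends itself to a direct assertion on what is left in storage. In the sketch below a `Map` stands in for the browser's stored data, and the key prefixes, key names, and function names are assumptions; the point is the shape of the check, not the specific keys.

```typescript
const SENSITIVE_PREFIXES = ["auth.", "member."]; // hypothetical key prefixes

function isSensitive(key: string): boolean {
  return SENSITIVE_PREFIXES.some((prefix) => key.startsWith(prefix));
}

// App side: remove every sensitive entry and report which keys were removed.
function clearOnLogout(store: Map<string, string>): string[] {
  const removed: string[] = [];
  for (const key of Array.from(store.keys())) {
    if (isSensitive(key)) {
      store.delete(key);
      removed.push(key);
    }
  }
  return removed;
}

// Test side: sensitive keys still present after logout (should be empty).
function leftoverSensitiveKeys(store: Map<string, string>): string[] {
  return Array.from(store.keys()).filter(isSensitive);
}

const stored = new Map<string, string>([
  ["auth.token", "abc123"],
  ["member.balance", "1240.50"],
  ["ui.theme", "dark"],
]);
clearOnLogout(stored);
const leftovers = leftoverSensitiveKeys(stored);
```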

8 Building State & Caching Test Cases

A strong state-and-caching test case sets up the aged or stale state explicitly — it does not start clean — and asserts on what survives and what the user sees. Here is a worked case written to catch the Tasman Ranges expired-token bug:

Test ID:            STC-JWT-009
State condition:    Auth token expired mid-session during a long form fill
Risk category:      Unsaved work discarded silently on token expiry
Pre-conditions:     Logged in; token lifetime known (e.g. 30 min); a long consent form
                    partly completed but not submitted.
Action:             1) Fill the form, then let the session sit until the token has expired.
                       (Or force expiry by ageing/clearing the token.)
                    2) Click Submit.
Expected result:    1) The app detects the expired token and does NOT discard the form.
                    2) The user is cleanly re-authenticated (silent refresh or a re-login prompt).
                    3) The entered data is preserved and the submit then completes.
                    4) At no point is a blank page or silent failure shown.
Server assertion:   The consent application is recorded exactly once with the entered data.
Evidence required:  Token expiry time vs submit time; screen states through the flow;
                    the preserved form data; server record of the submission.
Traceability:       Risk R-08 (session expiry discards unsaved work without warning).
Result:             [Pass / Fail]

Notice what makes this catch the Hook bug: the first action step ages the session past the token lifetime rather than testing fresh; the submit happens after expiry, which is the one state the original testers never set up; the expected result asserts the form is preserved and the user re-authenticated, not dumped; and there is a server assertion that the application is recorded exactly once. The state condition is named at the top so a reviewer knows the session was deliberately aged.

9 Common Mistakes

🚫 Reproducing on a fresh session and closing the bug as no-repro

Why it happens: A clean test environment always starts fresh, which is the one state where these bugs hide.
The fix: That is the Tasman Ranges trap. When a user reports a bug you cannot reproduce, ask what aged or stale state they were in — long session, expired token, old build, leftover stored data — recreate it, and the bug usually appears. Do not refresh the bug away.

🚫 Never testing what happens when the auth token expires mid-task

Why it happens: Tests are quick, so the token never ages out within them.
The fix: Real users take their time over long forms and hit expiry. Force the token to expire (age or clear it) before the key action and assert the app preserves the user’s work and re-authenticates cleanly — never discards the form into a blank page.

🚫 Assuming a deploy instantly reaches every user

Why it happens: The fix works for the tester, who got the new build, so it is assumed everyone did.
The fix: A cached service worker can keep serving the old build for days — the second Tasman Ranges bug. Test the update path itself: load as a user with the old version cached, deploy a change, and confirm the new build arrives within a defined window. Verify versioning and the service worker update flow actually work.

🚫 Trusting stored client data without validating it

Why it happens: The app wrote the data, so it is assumed to be well-formed and current.
The fix: Stored data can be partial, corrupted, or left over from an old version, and an app that reads it blindly crashes or shows garbage. Seed malformed and old-schema localStorage on purpose and confirm the app validates, recovers, and does not trust it as always correct.

Senior engineer insight

The most painful lesson I learned about caching bugs is that they compound: a stale service worker keeps users on an old build that has a broken token-refresh handler, so the token expires, the refresh fails silently, and the user loses their work — but the bug report says "submit does nothing." Once I understood that caching layers stack, I started treating cache invalidation as a first-class release criterion, not an afterthought. In NZ SaaS products deployed behind Cloudflare, the CDN edge cache adds a third layer on top of the service worker and the browser HTTP cache, and a poorly configured Cache-Control header can serve stale HTML to every NZ PoP for hours after a fix ships.

The most common mistake teams make: they test the happy path on a freshly cleared browser immediately after deploying a fix, declare it resolved, and close the ticket — never checking whether users who already had the old build cached actually received the update.

From the field

A Wellington-based government digital team shipped a long-awaited fix to a rates calculation bug on a council PWA. The team tested, confirmed the new figure, and marked the release done. Three days later, a councillor called to say the old wrong figure was still showing on his work laptop, whose browser data he rarely cleared. The team first blamed the Cloudflare CDN cache: they purged it, checked again on a clean browser, and it looked fine. But the real culprit was the service worker: it had no version signal in its registered asset manifest, so browsers that had already installed it silently kept serving the old app shell from cache. The fix only reached a user when their browser next checked the service worker registration for an update, which could take days on an infrequently visited internal tool. The lesson that stuck: a CDN purge clears the edge cache, but it cannot reach a service worker already installed inside a user's browser. You have to test the update path end-to-end, not just the origin.

10 Now You Try

Three graded exercises across the state and caching bug classes. Write your answer, then compare it to the model answer.

🔍 Exercise 1 of 3 — Spot the Stale-State Bugs

Read the description of a fictional KiwiSaver-style member portal below. Identify 3 state or caching bugs that would only show up in an aged or stale state, and name the bug class for each (expired-token race, stale service worker, localStorage corruption, or stale cache / failed invalidation).

Portal: member balance and contributions PWA
Members log in with a token that lasts 20 minutes; the app makes calls with it but has no handling for an expired token — a failed call just shows a blank panel. The balance shown on the dashboard is cached on first load and never refreshed for the rest of the session, even after the member makes a contribution. The app is a PWA that caches itself in a service worker, with no versioning on its files, so a new deploy may not reach members who already have it. The member’s last-viewed fund is saved to localStorage and read on startup, with no check that the saved value is still a valid fund.

List 3 stale-state bugs and the bug class for each:

Model answer

There are at least four real bugs here; any three, well explained, earn full marks.

1. Expired-token race — The 20-minute token expires mid-session and a failed call just shows a blank panel, so a member who lingers loses the view with no error or re-auth. Bug class: expired-token race. When it shows up: only after the session has been open past 20 minutes — never on a quick fresh login.

2. Stale cache / failed invalidation — The balance is cached on first load and never refreshed, so after a contribution the member still sees the old balance as if it were current. Bug class: stale cache / failed invalidation. When it shows up: after the underlying data changes within the same session.

3. Stale service worker (no versioning) — The PWA caches itself with no versioning, so a new deploy may never reach members who already have the old build. Bug class: stale service worker. When it shows up: after a deploy, for users who already have the app cached.

Bonus — localStorage corruption: the last-viewed fund is read on startup with no validity check, so a removed or renamed fund (or a corrupted value) breaks startup. Bug class: localStorage corruption.

The trap: every one of these is invisible on a fresh login with a fresh build — exactly the state a tester starts in.

🔧 Exercise 2 of 3 — Fix the Test Case

The test case below starts fresh and so cannot catch a stale-state bug. Rewrite it to deliberately age the state, with these fields: Test ID, State condition, Risk category, Pre-conditions, Action, Expected result, Server assertion, Evidence required, Traceability. As the context, use a fictional HealthNZ patient-booking PWA in which a clinician fills in a long referral.

Original (cannot catch the bug):
“Log in, fill the referral, click submit straight away. Check it submits. Pass if it shows confirmed.”

Rewrite as an aged-state test case:

Model answer

Test ID: STC-JWT-014

State condition: Auth token expired while the clinician filled a long referral, before submit

Risk category: Referral discarded silently on token expiry mid-task

Pre-conditions: Clinician logged in; token lifetime known; a long referral partly completed but not submitted; the token forced to expire (aged or cleared) before the submit.

Action: 1) Fill the referral. 2) Let the session sit past the token lifetime, or force the token to expire. 3) Click submit.

Expected result: 1) The app detects the expired token and does NOT discard the referral. 2) The clinician is cleanly re-authenticated (silent refresh or re-login prompt). 3) The entered referral data is preserved and the submit then completes. 4) No blank page and no silent failure at any point.

Server assertion: The referral is recorded exactly once with the entered data, after re-authentication.

Evidence required: Token expiry time vs submit time; screen states through the flow; the preserved referral data; the server record of the referral.

Traceability: Risk register R-08 (session expiry discards unsaved clinical work without warning).

What makes it strong: the precondition AGES the session past the token lifetime instead of submitting straight away, the action submits AFTER expiry (the state the original never set up), and it asserts the work survives and the user is re-authenticated, ending on a server assertion of exactly one referral. The original could only ever pass.

🏗️ Exercise 3 of 3 — Design the State & Caching Test Cases

Design a state-and-caching test plan of 5 test cases for a fictional local-council rates portal PWA that has just shipped a fix. Each case needs at least: an ID, the stale/aged state it sets up, an acceptance criterion, and the evidence required. Cover expired-token mid-session, stale service worker after a deploy, corrupted localStorage, a stale cached balance after payment, and clean logout.

Model answer

STC-01 | State set up: token expired mid-session before a key action | Acceptance criteria: the app preserves the user's unsaved work and re-authenticates cleanly; 0 blank pages or silent failures | Evidence required: token expiry vs action time; preserved data; screen states

STC-02 | State set up: app cached in a service worker on an OLD build, then a new build deployed | Acceptance criteria: the user on the old cached build receives the new build within a defined window (or on next visit); the old build is not served indefinitely | Evidence required: build version before/after; time-to-update; service worker update log

STC-03 | State set up: localStorage seeded with malformed / old-schema / partial data | Acceptance criteria: the app validates the stored data, recovers gracefully (ignores or resets it), and does not crash or show garbage | Evidence required: the seeded bad value; app behaviour on startup; recovery state

STC-04 | State set up: balance cached, then a payment made that changes it | Acceptance criteria: the cache is invalidated on the change so the user sees the new balance, not the old cached one | Evidence required: balance before/after payment; cache state; the value shown to the user

STC-05 | State set up: a logged-in session with stored token and cached personal data | Acceptance criteria: logging out clears the stored token and sensitive cached data; the next user of the device cannot resume the prior session | Evidence required: storage contents before/after logout; attempt to resume after logout

Strong plans: each case sets up a specific aged/stale state, has a measurable criterion, names concrete evidence, and together they cover expired-token (STC-01), stale service worker (STC-02), localStorage corruption (STC-03), stale cached balance (STC-04), and clean logout (STC-05). Weak plans say "test caching works" five times — that is the difference being marked.

Why teams fail here

Testing only on a freshly cleared browser: the one state where every caching and stale-session bug is invisible, so the whole class goes undetected until a real user hits it.
Treating CDN cache invalidation and the service worker cache as the same problem: purging Cloudflare refreshes edge-served assets but has no effect on a service worker already installed in the user's browser; that can only be updated through the service worker registration lifecycle.
Never testing the auth token expiry mid-task: because QA tests are fast, the 20-or-30-minute token never ages out during a test run, so the expired-token race condition ships untested every release.
Trusting localStorage as if the app wrote it: stored data persists across deploys and schema changes, so an app update can leave every returning user with a localStorage value the new code does not understand — and without a validation layer, the app crashes on first load for any user who hasn't cleared site data.

Key takeaway

A fresh browser is the perfect hiding place for every state and caching bug — if you only ever test clean, you are testing the one state your users left behind an hour ago.

11 Self-Check

Q1: Why does a fresh-session test hide this whole class of bug?

Because state and caching defects live in aged, stale, in-between states — an expired token, an old cached build, leftover stored data — and a clean test environment always starts fresh, which is the one state where none of those exist. That is why the Tasman Ranges bug was closed as no-repro. To find these bugs you must deliberately age and corrupt the state, not refresh it away.

Q2: Name the four state and caching bug classes.

The expired-token race (a token expires mid-session and the app mishandles the failure), the stale service worker (an old cached build keeps running after a fix ships), localStorage corruption (the app trusts partial or old-schema stored data), and stale cache / failed invalidation (out-of-date cached data is shown as current).

Q3: What is the right first move when a reported bug will not reproduce?

Ask “what state was the user in that I am not?” — a long session, an expired token, an old cached build, leftover stored data — then recreate that state. The bug usually reproduces with it. A clean environment is exactly where these defects cannot live, so reproducing the user’s aged state is the key.

Q4: How do you test that a deployed fix actually reaches users?

With the “old build, new deploy” test: load the app as a user who already has the old version cached, ship a visibly different build, and confirm the new build arrives within a defined, acceptable window — or whether it ever does. It exercises asset versioning, the service worker update flow, and cache invalidation together, and catches users being stranded on a build you have already fixed.

Q5: Why must an app validate data it reads from localStorage?

Because stored data can be partially written, corrupted, or left over from an older version of the app, and an app that reads it blindly will crash, show garbage, or carry bad data forward. Seed malformed and old-schema stored data on purpose and confirm the app validates it and recovers gracefully rather than trusting it as always well-formed.

12 Interview Prep

Real questions asked in NZ QA interviews for web and PWA roles. Read the model answers, then practise your own version.

“A user reports a bug you cannot reproduce. How do you approach it?”

My first question is what state the user was in that I am not. A clean test environment starts fresh, and a whole class of bugs only lives in aged or stale state — an expired token, an old cached build, leftover localStorage, a stale cached value. So I try to recreate their state rather than my own: age the session past the token lifetime, load an old cached build, seed the stored data they would have had. Reproduce the state and the bug usually reproduces with it. Closing it as no-repro because it worked on a fresh attempt is exactly how the real bug survives — the fresh attempt is the one state it hides in.

“We shipped a fix but some users still see the old behaviour. What is your hypothesis?”

A stale service worker serving an old cached build. The fix is live, and it works for anyone who fetched the new version, but users who already had the PWA cached are still running yesterday’s code because nothing forced an update. I’d reproduce it by caching the old build, deploying the change, and timing whether and when the new build arrives. The fix is usually asset versioning so new files have new names the browser must fetch, plus a working service worker update flow that activates the new build within a sensible window. The test is the update path itself, not just whether the new code is correct.

“How would you test an app that keeps a user logged in across a long task?”

I’d focus on the moment the auth token expires mid-task, because that is where work gets silently lost. I deliberately age or clear the token before the key action — a submit, a save — rather than acting straight away while it is fresh. Then I assert the app does the right thing: it detects the expiry, preserves the user’s unsaved input, and re-authenticates cleanly with a silent refresh or a re-login prompt, never dumping them to a blank page. And I confirm on the server that the action is ultimately recorded exactly once. The fresh-and-fast path always passes; the aged path is the test that matters.
