Network & Resilience · Lesson 1

Throttling & Offline Testing

Your app works on the office network because the office network always works. Real users are on 3G in a paddock, in a tunnel, in a basement — and the app has to survive all of it. This lesson teaches you to test the network, not just the feature.

Lesson 1 of 3 · ~30 min read · ~70 min with exercises

1 The Hook

A fictional NZ agritech firm, Paddock Ledger, built a tablet app for farm staff to record stock movements and effluent readings out in the field. The app looked excellent in the demo at head office: tap a paddock, enter a reading, hit save, green tick, done. The product owner signed it off and a few hundred farms rolled it out.

Within a week the support line was running hot. Farm staff were entering a reading three valleys out from the nearest tower, getting the green tick, and driving on — only for the reading to never appear in the system. Some staff re-entered the same reading four or five times across a morning, creating duplicate records once they finally hit coverage again. Others lost a full day’s entries when the tablet was closed before the data ever left it.

Nothing was wrong with the save logic. The problem was that the app had been built and tested entirely on the head-office Wi-Fi, where every save reached the server in a few milliseconds. The green tick was shown the instant the user tapped save — before the data had actually been accepted by the server. On a fast network the gap between “tapped” and “stored on the server” is invisible. On a paddock at the edge of coverage, that gap is where the data lived and died.

Here is the lesson hidden in that story. The team tested that a reading could be saved. They never tested what happens when the network is slow, when it drops mid-save, or when it is simply not there. The green tick was a lie told by a fast connection. Throttling and offline testing is the practice of telling the truth about what your app does on the network your users actually have.

2 The Rule

The network is part of the system under test. A feature that only works on a fast, stable connection is untested, because many real users, especially in rural NZ, are not on one. Test every important flow under throttling, packet loss, and full offline, and never let the UI claim success until the data has reached and been accepted by the server.

3 The Analogy

Posting a letter versus getting a signature on delivery.

When you drop a letter in a NZ Post box, it leaves your hand and you feel like the job is done. But you have no proof it arrived. If the address was wrong or the van broke down, the letter is simply gone and you never find out. That is the optimistic green tick — the app marks the job done the moment the user lets go, with no confirmation it ever landed.

Now picture a courier parcel that needs a signature on delivery. It is not done when it leaves you; it is done when someone at the other end signs for it, and until then it is tracked and can be re-attempted. That is what a resilient app does on a bad network: it holds the data, keeps trying, and only marks the job done once the server has signed for it. Throttling and offline testing is checking that your app behaves like the tracked courier parcel, not the letter you posted and hoped for.

4 The Network Conditions to Test

“Bad network” is not one condition. It is several, and they break software in different ways. A serious test plan names which one it is testing.

Slow but stable (throttled)

The connection works, but it is slow — high latency, low bandwidth, the classic 3G or congested rural cell. Requests still complete, just much later than on fibre. This is where timeouts that are too tight fire on a perfectly good request, where spinners run forever, and where a user double-taps because nothing seems to be happening. The fix is not faster code; it is a UI honest about waiting and timeouts tuned for the slow case.

Lossy and flickering (dropped packets)

The connection is there, then gone, then back — the State Highway with patchy towers, the lift between floors, the shed with a thick steel roof. Requests start and never finish, or finish on the second attempt. This is the hardest condition because the app cannot tell a slow response from a lost one. It is where retries and partial sends do their damage.

Fully offline

No connection at all, knowingly — aeroplane mode, the basement loading dock, the back-country hut. A good app detects this and behaves deliberately: queue the work, tell the user it is saved locally and will sync, and never pretend it reached the server. A bad app shows a generic error, or worse, a false success.

Transition (the reconnect)

The most defect-rich moment of all: the network coming back. Queued work has to flush, in the right order, exactly once. This is where duplicates are created, where stale data overwrites fresh data, and where the Paddock Ledger duplicates were born. Always test the transition itself, not just the offline and online states either side of it.

Pro tip: Browser dev tools let you throttle to preset profiles and toggle offline, which is the fastest way to start. But the cruel condition — the flickering connection — is the one you must engineer deliberately: drop the connection mid-request, then restore it, and watch what the app does with the half-finished work. That transition is where the real bugs live.
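One way to engineer that mid-request drop in an automated test is a small test double that sits between the app and the server. The sketch below is illustrative only: `FlakyLink`, `ConnectionDropped`, and the `server` function are hypothetical names, not part of any real tool. It shows the awkward case where the request reaches the server but the response is lost on the way back.

```python
class ConnectionDropped(Exception):
    """Raised when the simulated link drops before a response arrives."""


class FlakyLink:
    """Hypothetical test double: passes requests to a server callable,
    but loses the response on chosen attempt numbers (1-based)."""

    def __init__(self, server, drop_response_on=()):
        self.server = server
        self.drop_response_on = set(drop_response_on)
        self.attempts = 0

    def send(self, payload):
        self.attempts += 1
        result = self.server(payload)  # the request DID reach the server
        if self.attempts in self.drop_response_on:
            raise ConnectionDropped("response lost in transit")
        return result


stored = []  # what the fake server has accepted


def server(payload):
    stored.append(payload)
    return "accepted"


link = FlakyLink(server, drop_response_on={1})

try:
    link.send({"reading": 42})
    outcome = "confirmed"
except ConnectionDropped:
    # From the app's side this is indistinguishable from a request
    # that never arrived, yet the server already holds the reading.
    outcome = "unknown"
```

After this runs, `outcome` is "unknown" while `stored` already contains the reading. That mismatch is the half-finished work the app has to handle without losing or duplicating it.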

5 Offline & Optimistic UI Patterns

Apps built for bad networks use a small set of patterns. You cannot test them well if you cannot name them.

Optimistic UI updates the screen immediately, assuming the action will succeed, then quietly confirms or rolls back once the server responds. Done right it feels fast and is honest — it shows a “pending” state and reverts visibly if the save fails. Done wrong it is the Paddock Ledger green tick: it shows success and never reconciles, so a failed save looks identical to a successful one.
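The difference between honest and dishonest optimistic UI can be sketched as a small state model. This is a minimal illustration, assuming a hypothetical `ReadingView` class and three made-up state labels; a real app would wire the same idea into its own UI framework.

```python
from enum import Enum


class SaveState(Enum):
    PENDING = "saved on device, will sync"
    CONFIRMED = "saved on server"
    FAILED = "not saved, tap to retry"


class ReadingView:
    """Hypothetical view model for one reading shown on screen."""

    def __init__(self, value):
        self.value = value
        # Optimistic: the reading appears at once, but labelled as pending.
        self.state = SaveState.PENDING

    def on_server_response(self, accepted):
        # Reconcile: only the server's answer moves the view out of PENDING.
        self.state = SaveState.CONFIRMED if accepted else SaveState.FAILED

    def shows_green_tick(self):
        return self.state is SaveState.CONFIRMED


view = ReadingView(7.2)
tick_before_response = view.shows_green_tick()  # False: nothing confirmed yet
view.on_server_response(accepted=True)
tick_after_response = view.shows_green_tick()   # True: the server accepted it
```

The thing to test is that no code path reaches the green tick without passing through `on_server_response`, and that a rejected save ends in a visibly different state from an accepted one.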

An offline queue (outbox) stores actions the user takes while disconnected — or while a send is in flight — and replays them when the connection returns. The user keeps working; the data waits safely. The tester’s job is to prove the queue survives the app being closed, flushes in order on reconnect, and never sends the same item twice.
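A durable outbox can be sketched with Python's built-in `sqlite3` module. The `Outbox` class, its table layout, and the key names are assumptions made for illustration, not a prescribed design. The point is that the queue lives in a file, so it outlasts the process that wrote it.

```python
import json
import os
import sqlite3
import tempfile


class Outbox:
    """Hypothetical on-device queue backed by a SQLite file."""

    def __init__(self, path):
        self.db = sqlite3.connect(path)
        self.db.execute(
            "CREATE TABLE IF NOT EXISTS outbox ("
            "seq INTEGER PRIMARY KEY AUTOINCREMENT, "
            "item_key TEXT UNIQUE NOT NULL, "
            "body TEXT NOT NULL)"
        )
        self.db.commit()

    def enqueue(self, item_key, body):
        # INSERT OR IGNORE: queuing the same key twice stores it once.
        self.db.execute(
            "INSERT OR IGNORE INTO outbox (item_key, body) VALUES (?, ?)",
            (item_key, json.dumps(body)),
        )
        self.db.commit()

    def pending(self):
        # Oldest first, so a flush replays actions in creation order.
        rows = self.db.execute("SELECT item_key, body FROM outbox ORDER BY seq")
        return [(item_key, json.loads(body)) for item_key, body in rows]

    def mark_sent(self, item_key):
        self.db.execute("DELETE FROM outbox WHERE item_key = ?", (item_key,))
        self.db.commit()

    def close(self):
        self.db.close()


path = os.path.join(tempfile.mkdtemp(), "outbox.db")

box = Outbox(path)
box.enqueue("r-1", {"paddock": "A", "reading": 4.1})
box.enqueue("r-2", {"paddock": "B", "reading": 3.7})
box.close()  # stands in for the app being closed

reopened = Outbox(path)  # stands in for the app starting again
survivors = reopened.pending()
```

Here `survivors` holds both readings in the order they were queued, which is the durability and ordering evidence a tester would ask for after a force-close.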

Local-first storage writes data to the device first and treats the server as something to sync to, not the only place data lives. This is what lets a stock count entered in a dead zone survive until coverage returns. The risk it introduces is conflict: two devices, or one device and the server, both changing the same record. Conflict handling is a thing to test, not assume.
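To make "conflict handling" concrete, here is one possible rule, sketched under assumptions: each record carries a version number, and an edit made against an out-of-date version is flagged instead of silently applied. The `apply_edit` function and the field names are hypothetical, and a real product may choose a different rule entirely. What matters for testing is that some rule is defined and observable.

```python
def apply_edit(server_record, edit):
    """Apply an edit only if it was made against the current version.

    server_record: {"value": ..., "version": int}
    edit:          {"value": ..., "base_version": int}
    Returns (record_after, conflict_flag).
    """
    if edit["base_version"] != server_record["version"]:
        # Stale edit: keep the server's data and flag it for review.
        return server_record, True
    updated = {"value": edit["value"], "version": server_record["version"] + 1}
    return updated, False


record = {"value": 120, "version": 1}

# Two tablets both edited the stock count while offline, starting from version 1.
tablet_a = {"value": 118, "base_version": 1}
tablet_b = {"value": 125, "base_version": 1}

record, conflict_a = apply_edit(record, tablet_a)  # first to sync is applied
record, conflict_b = apply_edit(record, tablet_b)  # second is flagged as stale
```

After both syncs the record holds tablet A's value at version 2, and `conflict_b` is true, so tablet B's edit is surfaced instead of quietly overwriting A's.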

Graceful degradation means the app does less, clearly, when the network is poor — deferring an image upload, disabling a feature that genuinely needs the server, showing cached data with a “last updated” time. The opposite is an app that simply freezes or throws a blank error the moment the connection dips.

6 Retry, Backoff & Sync on Reconnect

When a request fails on a bad network, the app should usually try again — but how it retries is a whole field of defects.

Naive retry hammers the server immediately and repeatedly. On a flickering connection this floods the link the moment it recovers, and if the original request actually did land, the retries create duplicates. A reading saved once becomes saved five times: the Paddock Ledger duplicate again, this time created by the app itself rather than by a user re-entering the reading.

Exponential backoff spaces retries out: wait 1 second, then 2, then 4, then 8, with a cap. It gives a struggling network room to breathe instead of drowning it. Adding a small random jitter stops every device on a farm retrying in lockstep the instant a tower comes back and knocking it over again. A tester checks that retries back off rather than spin, and that they eventually give up and surface a clear error rather than retrying forever.
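The schedule described above can be sketched as a function that returns the wait before each retry. The function name, the default values, and the choice to add jitter as a fraction of each delay are assumptions for illustration; real clients vary in how they apply jitter.

```python
import random


def backoff_delays(max_attempts=5, base=1.0, cap=8.0, jitter=0.25, rng=None):
    """Return the wait in seconds before each retry.

    The delay doubles from `base` up to `cap`. Each delay then gets a random
    extra of up to `jitter` times its own length, so devices drift apart.
    The list has a fixed length: after the last entry the caller gives up.
    """
    if rng is None:
        rng = random.Random()
    delays = []
    for attempt in range(max_attempts):
        delay = min(cap, base * 2 ** attempt)
        delays.append(delay + rng.uniform(0, jitter * delay))
    return delays


# With jitter switched off the schedule is the plain doubling sequence, capped.
plain = backoff_delays(jitter=0)

# With jitter on, two devices seeded differently get different schedules.
device_a = backoff_delays(rng=random.Random(1))
device_b = backoff_delays(rng=random.Random(2))
```

A test can assert three things against a schedule like this: the delays grow instead of staying flat, they never exceed the cap plus jitter, and the list ends, which is the "give up and surface a clear error" behaviour.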

Idempotency is what makes retries safe. If each save carries a unique key the server recognises, a retry of a request that already succeeded is recognised and ignored rather than processed again. Without it, retry and duplicate are the same thing. This is the single most important control to test on any flow that retries — send the same keyed request twice and prove the server stores it once.
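The "send the same keyed request twice" check can be sketched against a toy server. `ReadingServer` and its return strings are hypothetical stand-ins. A real system would enforce the key in its database or API layer, but the assertion a tester makes is the same.

```python
import uuid


class ReadingServer:
    """Hypothetical server that stores at most one record per idempotency key."""

    def __init__(self):
        self.records = {}

    def save(self, idempotency_key, reading):
        if idempotency_key in self.records:
            # A retry of a request that already succeeded: acknowledge, do not re-store.
            return "duplicate-ignored"
        self.records[idempotency_key] = reading
        return "stored"


server = ReadingServer()

# The key is generated once on the device and reused on every retry of this save.
key = str(uuid.uuid4())
reading = {"paddock": "North 3", "value": 4.1}

first = server.save(key, reading)
retry = server.save(key, reading)
record_count = len(server.records)
```

The first call returns "stored", the retry returns "duplicate-ignored", and `record_count` is 1. If the key were regenerated on each attempt, the same test would show two records, which is the defect this control exists to prevent.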

Sync on reconnect is the queued-work flush described above, and it deserves its own tests: items sync in the order they were created, each exactly once, the queue survives an app restart mid-sync, and a sync interrupted halfway (the network drops again during the flush) resumes cleanly without losing or duplicating items.
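An interrupted flush can be sketched with an in-memory queue and a server double that loses one acknowledgement. All names here (`LinkDown`, `Server`, `flush`) are hypothetical, and the queue is a plain list only to keep the example short. The rule being illustrated is: send oldest first, remove an item only after the server acknowledges it, and stop at the first failure.

```python
class LinkDown(Exception):
    """Raised when the simulated connection drops during a send."""


class Server:
    """Hypothetical idempotent server that can lose one acknowledgement."""

    def __init__(self, drop_ack_on_call=None):
        self.received = []  # keys in the order they were first stored
        self.calls = 0
        self.drop_ack_on_call = drop_ack_on_call

    def save(self, key, body):
        self.calls += 1
        if key not in self.received:  # a repeated key is not stored again
            self.received.append(key)
        if self.calls == self.drop_ack_on_call:
            raise LinkDown()  # stored, but the device never hears back
        return "ok"


def flush(queue, server):
    """Send queued (key, body) items oldest first. Returns True when empty."""
    while queue:
        key, body = queue[0]
        try:
            server.save(key, body)
        except LinkDown:
            return False  # leave the item at the head for the next flush
        queue.pop(0)  # remove only after the server acknowledged it
    return True


queue = [("v-1", {"n": 1}), ("v-2", {"n": 2}), ("v-3", {"n": 3})]
server = Server(drop_ack_on_call=2)

first_pass = flush(queue, server)   # the link drops while sending v-2
second_pass = flush(queue, server)  # the link is back; v-2 is re-sent, then v-3
```

After the second pass the queue is empty and `server.received` lists the three keys once each, in creation order, even though `v-2` was sent twice.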

Pro tip: The killer test is “save while offline, close the app, reopen on a flickering connection.” It exercises local storage, the queue surviving a restart, retry under loss, idempotency, and ordered sync all at once. If a field app passes that single scenario, it has cleared most of what hurts real NZ users.

7 What to Test on a Throttled Network

The practical checklist for any flow that talks to a server:

Honest success state: the UI confirms success only after the server has accepted the data, not the moment the user taps — the core Paddock Ledger fix.
Slow-network behaviour: on a throttled connection the app shows a clear waiting state, does not let the user double-submit, and has timeouts loose enough not to fail a slow-but-good request.
Offline detection and queueing: the app notices it is offline, tells the user, lets them keep working, and queues their actions safely on the device.
Queue durability: queued work survives the app being closed and the device being restarted — it is not held only in memory.
Reconnect sync: on reconnect the queue flushes in order, each item exactly once, with no duplicates and no items silently dropped.
Retry discipline: failed requests retry with backoff and jitter, give up with a clear error eventually, and never duplicate a request that already succeeded (idempotency).
Mid-flight drop: a request interrupted halfway resolves to a known state — never a phantom success and never silent data loss.
Conflict handling: where local-first storage allows it, a record changed in two places resolves by a defined rule, not last-write-wins by accident.

8 Building Throttled-Network Test Cases

A strong network test case names the network condition, drives the flow under that exact condition, and asserts on what the server actually stored — not just on what the screen showed. Here is a worked case written to catch the Paddock Ledger bug:

Test ID:            NET-OFF-007
Network condition:  Offline at save, then reconnect on a flickering 3G link
Risk category:      False success / lost field data on a bad network
Pre-conditions:     App logged in; device set to offline; one stock reading ready to enter.
Action:             1) Offline: enter the reading and tap save.
                    2) Force-close the app.
                    3) Reopen on a connection that drops once during sync, then restore it.
Expected result:    1) Offline save shows a “saved on device, will sync” state — NOT a server-confirmed tick.
                    2) The queued reading survives the force-close.
                    3) On reconnect the reading syncs and the state becomes server-confirmed.
                    4) The drop during sync does not create a second copy.
Server assertion:   Exactly ONE reading record exists on the server for this entry.
Evidence required:  Screen states at each step; device queue contents after force-close;
                    server query showing a single record; sync log.
Traceability:       Risk R-01 (field data lost or duplicated on poor connectivity).
Result:             [Pass / Fail]

Notice what makes this catch the Hook bug: the offline save is asserted to show a local “will sync” state and explicitly not a server-confirmed tick; the queue is checked after a force-close, not just in memory; the sync is run across a deliberate drop; and the final assertion is on the server — exactly one record — not on what the screen said. The condition is named at the top so a reviewer knows precisely what was simulated.
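For teams that automate this case, the same steps can be sketched against test doubles. Everything below is hypothetical: `FakeServer` and `App` stand in for the real server and app, and a JSON file stands in for on-device storage. It is a shape for the assertions, not a real harness.

```python
import json
import os
import tempfile


class FakeServer:
    """Hypothetical server double that stores one record per key."""

    def __init__(self):
        self.records = {}
        self.drop_next_ack = False

    def save(self, key, body):
        self.records.setdefault(key, body)  # a repeated key changes nothing
        if self.drop_next_ack:
            self.drop_next_ack = False
            raise ConnectionError("link dropped before the ack arrived")


class App:
    """Hypothetical app: the queue lives in a file, screen states in memory."""

    def __init__(self, queue_path, server):
        self.queue_path = queue_path
        self.server = server
        self.status = {}

    def queued(self):
        if not os.path.exists(self.queue_path):
            return []
        with open(self.queue_path) as f:
            return json.load(f)

    def _write(self, items):
        with open(self.queue_path, "w") as f:
            json.dump(items, f)

    def save_offline(self, key, body):
        self._write(self.queued() + [[key, body]])
        self.status[key] = "saved on device, will sync"

    def sync(self):
        items = self.queued()
        while items:
            key, body = items[0]
            try:
                self.server.save(key, body)
            except ConnectionError:
                return  # keep the item queued for the next attempt
            items.pop(0)
            self._write(items)
            self.status[key] = "server-confirmed"


path = os.path.join(tempfile.mkdtemp(), "queue.json")
server = FakeServer()

app = App(path, server)
app.save_offline("reading-1", {"value": 4.1})  # action 1: save while offline
state_offline = app.status["reading-1"]

app = App(path, server)  # action 2: force-close, then reopen with empty memory
queued_after_restart = [key for key, _ in app.queued()]

server.drop_next_ack = True  # action 3: the connection drops once during sync
app.sync()
app.sync()  # connection restored
state_final = app.status["reading-1"]
```

The four expected results map to four checks: `state_offline` is the local "will sync" message, `queued_after_restart` still lists the reading, `state_final` is "server-confirmed", and `server.records` holds exactly one record.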

9 Common Mistakes

🚫 Showing success the moment the user taps, before the server confirms

Why it happens: On the dev network the server responds instantly, so the gap is invisible and an optimistic tick feels safe.
The fix: That is the Paddock Ledger trap. Show a pending or “saved on device” state and only confirm true success once the server has accepted the data. Test the success state on a slow and offline connection, not just a fast one.

🚫 Only testing “online” and “offline”, never the transition

Why it happens: The two steady states are easy to set up; the moment between them is fiddly.
The fix: Most network bugs live in the reconnect — the queue flush, the retries, the duplicates. Deliberately drop the connection mid-request and restore it, and test sync interrupted halfway. The transition is the test, not the bookends.

🚫 Retrying without idempotency

Why it happens: Adding a retry is one line of code and obviously a good idea.
The fix: A retry of a request that already succeeded creates a duplicate — the duplicate readings on a farm. Each request needs a unique key the server recognises so a re-send is ignored, not re-processed. Test by sending the same keyed request twice and proving one record results.

🚫 Timeouts tuned for fibre, firing on a good 3G request

Why it happens: The timeout was set against dev-network response times and never revisited.
The fix: A 2-second timeout that is generous on fibre kills a perfectly valid request on rural 3G, turning a slow success into a false failure and a needless retry. Test timeouts under throttling and set them for the slow case, with a clear waiting state while the user waits.
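A rough back-of-envelope estimate shows why a timeout that is comfortable on fibre can fail a healthy request on a slow link. The payload size and both link profiles below are invented for illustration, and the formula ignores protocol overhead and server time, so treat the result as a lower bound and measure real conditions before choosing a value.

```python
def estimated_request_seconds(payload_bytes, uplink_bits_per_sec, round_trip_sec):
    """Rough lower bound: one round trip plus the time to push the payload up."""
    return round_trip_sec + (payload_bytes * 8) / uplink_bits_per_sec


TIMEOUT_SEC = 2.0
PAYLOAD_BYTES = 150_000  # hypothetical: a reading with a photo attached

# Hypothetical link profiles, not measurements of any real network.
fibre = estimated_request_seconds(PAYLOAD_BYTES, 100_000_000, 0.010)
rural_3g = estimated_request_seconds(PAYLOAD_BYTES, 400_000, 0.400)

fibre_within_timeout = fibre < TIMEOUT_SEC        # about 0.02 s, far inside 2 s
rural_within_timeout = rural_3g < TIMEOUT_SEC     # about 3.4 s, times out
```

Under these assumed numbers the same request finishes in roughly 0.02 seconds on fibre and needs about 3.4 seconds on the slow link, so a 2-second timeout fails it even though nothing is wrong.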

10 Now You Try

Three graded exercises across throttling, offline, and reconnect. Write your answer, then compare it to the model answer.

🔍 Exercise 1 of 3 — Test This Under 3G and Offline

Read the description of a fictional rural courier delivery app below. Identify 3 network risks that could lose data, create duplicates, or mislead the user, and name the condition each shows up under (slow/throttled, dropped/flickering, fully offline, or reconnect transition).

App: rural delivery scanner
Drivers scan a parcel as “delivered” at each stop on a rural delivery run. On scan, the app immediately shows a green “Delivered” tick and sends the update to the server. If the send fails, the app retries the request every second until it succeeds. There is no on-device queue — if the send has not gone through when the driver closes the app at the end of the run, the pending update is gone. When coverage returns after a long dead zone, the app fires all its pending and retrying sends at once.

List 3 network risks and the condition each shows up under:

Model answer

There are at least four real risks here; any three, well explained, earn full marks.

1. False success on scan — The green "Delivered" tick shows the instant the driver scans, before the server confirms. On a flickering or offline link the update may never land, but the driver believes the stop is done. Condition: fully offline / dropped. Impact: a parcel marked delivered that the system never recorded. Fix: show a "pending sync" state and confirm only after the server accepts it.

2. No on-device queue — Pending sends live only in memory, so closing the app at the end of a run loses any update that has not gone through. Condition: fully offline (the whole dead-zone run). Impact: a full run of deliveries lost. Fix: a durable on-device queue that survives the app closing.

3. Naive 1-second retry with no backoff or idempotency — Retrying every second hammers the link, and a retry of a send that actually landed creates a duplicate "delivered" record. Condition: dropped/flickering and reconnect. Impact: server overload plus duplicate delivery records. Fix: exponential backoff with jitter, and an idempotency key so a retry is ignored if the original succeeded.

Bonus — thundering herd on reconnect: firing all pending sends at once when coverage returns can knock the link over again. Condition: reconnect transition. Fix: stagger the flush with jitter.

The trap: every one of these passes a test run on the office Wi-Fi, where sends never fail.

🔧 Exercise 2 of 3 — Fix the Test Case

The test case below only proves the happy path on a good connection. Rewrite it to test a bad network end to end, with these fields: Test ID, Network condition, Risk category, Pre-conditions, Action, Expected result, Server assertion, Evidence required, Traceability. Use a fictional Waka Kotahi roadside inspection app on a State Highway with patchy coverage as the context.

Original (too shallow):
“Fill in the inspection form and tap submit. Check it shows submitted. Pass if it says submitted.”

Rewrite as a bad-network test case:

Model answer

Test ID: NET-OFF-012

Network condition: Connection drops during submit on a flickering 3G link, then restores

Risk category: False success / lost or duplicated inspection on poor coverage

Pre-conditions: App logged in; one completed roadside inspection ready to submit; network set to drop the connection mid-submit then restore it.

Action: 1) Tap submit while the connection is up. 2) Force the connection to drop before the server responds. 3) Restore the connection and let the app resolve the in-flight submit.

Expected result: 1) During the drop the app shows a "submitting / will retry" state, not a confirmed "submitted" tick. 2) The inspection is held on the device, not lost. 3) On restore the submit completes and the state becomes server-confirmed. 4) The interrupted-then-retried submit does not create a second inspection.

Server assertion: Exactly ONE inspection record exists on the server for this submission.

Evidence required: Screen state during the drop and after restore; device queue/holding state during the drop; server query showing a single inspection; retry/sync log.

Traceability: Risk register R-02 (inspection lost or duplicated on patchy State Highway coverage).

What makes it strong: it names the exact network condition, drives the submit across a deliberate drop, asserts a pending (not confirmed) state during the outage, and ends on a SERVER assertion of exactly one record — not on what the screen said. The original tested none of this.

🏗️ Exercise 3 of 3 — Build a Reconnect-Sync Test Plan

Design a reconnect-and-sync test plan of 5 test cases for a fictional field-services app used by Te Whatu Ora district nurses who record visits in homes with no reliable signal. Each case needs at least: an ID, what it verifies, an acceptance criterion, and the evidence required. Cover offline capture, queue durability across a restart, ordered sync, idempotent retry, and sync interrupted mid-flush.

Model answer

SYN-01 | Verifies: a visit captured fully offline is stored on the device | Acceptance criteria: with the device offline, a saved visit shows a "saved on device, will sync" state and is readable back after navigating away, with no reliance on the server | Evidence required: offline screen state; on-device record after save

SYN-02 | Verifies: the queue survives the app being force-closed and the device restarted | Acceptance criteria: a queued visit is still present and unsent after a force-close and device reboot, before any reconnect | Evidence required: queue contents before and after reboot; no server record yet

SYN-03 | Verifies: queued visits sync in the order they were created | Acceptance criteria: on reconnect, visits arrive on the server in creation order; 0 out-of-order or missing items | Evidence required: device creation timestamps; server receipt order; reconciliation of counts

SYN-04 | Verifies: a retried send does not duplicate a visit | Acceptance criteria: a visit whose first send actually landed, then is retried, results in exactly one server record (idempotency key honoured) | Evidence required: idempotency key; two send attempts in the log; single server record

SYN-05 | Verifies: sync interrupted mid-flush resumes cleanly | Acceptance criteria: dropping the connection partway through flushing a multi-item queue loses no item and duplicates no item once it resumes; the server total matches the device total | Evidence required: queue state at interruption; server vs device counts after resume; sync log

Strong plans: each case is specific, has a measurable criterion, names concrete evidence, and together they cover offline capture (SYN-01), queue durability across restart (SYN-02), ordered sync (SYN-03), idempotent retry (SYN-04), and interrupted flush (SYN-05). Weak plans say "check sync works" five times — that is the difference being marked.

11 Self-Check

Q1: Why is a green “saved” tick on a fast network a problem on a slow one?

Because the tick is often shown the instant the user taps, before the server has actually accepted the data. On fibre the gap between tap and stored is invisible, so the optimistic tick looks honest. On a slow or offline connection the data may never reach the server, yet the user is told it succeeded — the Paddock Ledger trap. Confirm true success only after the server accepts the data.

Q2: Name the four network conditions to test and why the reconnect is the worst.

Slow but stable (throttled), lossy and flickering (dropped packets), fully offline, and the reconnect transition. The reconnect is the worst because that is when queued work flushes and retries fire — the moment duplicates are created, stale data overwrites fresh data, and a half-finished sync goes wrong. Test the transition itself, not just the steady states either side.

Q3: What makes a retry safe, and how do you test it?

Idempotency. Each request carries a unique key the server recognises, so a retry of a request that already succeeded is ignored rather than processed again. Without it, retry and duplicate are the same thing. Test it by sending the same keyed request twice and proving the server stores exactly one record.

Q4: What is exponential backoff with jitter, and what problem does jitter solve?

Backoff spaces retries out — 1s, 2s, 4s, 8s, capped — so a struggling network gets room instead of being hammered. Jitter adds a small random delay so that many devices do not all retry in lockstep the instant a tower returns and immediately overwhelm it again (the thundering herd). Test that retries back off rather than spin, and eventually give up with a clear error.

Q5: What single scenario exercises most of what hurts real NZ field users at once?

“Save while offline, close the app, reopen on a flickering connection.” It tests local storage, the queue surviving a restart, retry under packet loss, idempotency, and ordered sync in one go — and it ends on a server assertion of exactly one record, not on what the screen showed.

12 Interview Prep

Real questions asked in NZ QA interviews for mobile and field-app roles. Read the model answers, then practise your own version.

“How would you test a field app that has to work where there is no signal?”

I’d treat the network as part of the system under test, not a given. I’d test four conditions separately: throttled (slow 3G), flickering (dropped packets), fully offline, and the reconnect transition. For each I check the app shows an honest state — a “saved on device, will sync” message offline rather than a false success. Then I focus on the reconnect, because that is where the bugs are: I save offline, force-close the app, reopen on a connection that drops mid-sync, and assert on the server that exactly one record exists. The whole point is following the data to the server, not stopping at the green tick.

“A user on rural broadband says their entries sometimes appear twice. What is your first hypothesis?”

My first hypothesis is retries without idempotency. The connection probably flickered, the original save actually landed, but the app could not tell, so it retried and created a second record. I’d reproduce it by dropping the connection just after a save reaches the server and watching the retry fire, then confirm two records result. The fix is an idempotency key so the server recognises the retry and stores one record, plus exponential backoff with jitter so the retries do not flood the link when coverage returns.

“Why is testing the reconnect more important than testing offline mode?”

Offline mode is a steady state — the app either queues work or it does not, and that is fairly easy to verify. The reconnect is a transition, and transitions are where state goes wrong. On reconnect the queue flushes, retries fire, and the app reconciles local data with the server all at once — that is where duplicates appear, where stale data overwrites fresh, and where a sync interrupted halfway loses items. So I deliberately engineer the reconnect: drop the connection mid-flush, restore it, and check the device and server totals match exactly. The bookends are easy; the join between them is the test.
