Blockchain Testing
Verifying systems where the data store is a shared, append-only ledger and the business logic runs as smart contracts. The hard part for a tester is that you cannot edit a record, you cannot always trust a "confirmed" result, and the same transaction can behave differently depending on where it lands in a block.
1 The Hook
A New Zealand exporter builds a provenance ledger so an overseas buyer can confirm that a shipment of mānuka honey really did come from the listed apiary. Each step — harvest, lab test, pack, export — writes a record to the chain. A smart contract releases payment to the producer once the "exported" record is written. The team tests it on a clean run: harvest, test, pack, export, payment fires. Looks great. Ship it.
Two weeks later a producer is paid twice for one shipment. What happened? A network hiccup made the buyer's app resubmit the "exported" transaction. On a normal database a duplicate insert would have hit a unique constraint and bounced. On the ledger, both transactions carried valid signatures, they landed in different blocks, and the contract had no guard against being called twice for the same shipment. The money moved, and because the ledger is immutable, there was no "undo".
This is the shift a tester has to make. The defect was not a wrong button or a bad screen. It was a missing assumption: that "confirmed once" means "confirmed forever, exactly once". On a blockchain, neither half of that is automatic. You have to test for replay, for reordering, and for the fact that nothing can be deleted after the fact.
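A replay test can be sketched against a toy in-memory model of the payment logic. This is a minimal Python sketch, not the exporter's actual contract: `PaymentContract`, `record_exported` and the `guard_enabled` switch are hypothetical names invented for illustration, and a real test would submit transactions to a testnet instead.

```python
class PaymentContract:
    """Toy in-memory stand-in for the payment contract (hypothetical names)."""

    def __init__(self, guard_enabled: bool):
        self.guard_enabled = guard_enabled
        self.paid_shipments: set[str] = set()
        self.balances: dict[str, int] = {}

    def record_exported(self, shipment_id: str, producer: str, amount: int) -> bool:
        """Handle an 'exported' record; return True if a payment was released."""
        if self.guard_enabled and shipment_id in self.paid_shipments:
            return False  # replay: this shipment has already been paid
        self.paid_shipments.add(shipment_id)
        self.balances[producer] = self.balances.get(producer, 0) + amount
        return True


def pays_exactly_once(contract: PaymentContract) -> bool:
    """Replay test: write the same 'exported' record twice, expect one payment."""
    contract.record_exported("SHIP-001", "producer-a", 100)
    contract.record_exported("SHIP-001", "producer-a", 100)  # simulated network retry
    return contract.balances["producer-a"] == 100
```

With the guard switched off the model reproduces the double payment from the story; with it on, the second call is refused. The point of the sketch is the shape of the test (same input twice, assert one effect), not the guard's implementation.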
2 The Rule
On a blockchain you cannot undo a mistake and you cannot assume a transaction runs once, in the order you sent it, on the chain you think you are on — so test the smart-contract logic for replay, reordering, reentrancy and edge values, and verify that on-chain truth matches what your off-chain app shows the user.
3 The Analogy
A pen-and-ink ledger chained to a desk, copied by a room full of clerks.
Imagine an old shipping office where every clerk keeps their own copy of the same ledger in permanent ink. There is no eraser. If you write a wrong figure, you cannot scrub it out — the most you can do is write a correcting line below it, and everyone can still see the original mistake. Before a line is treated as "official", a majority of the clerks have to agree their copies match. Until they do, your figure might still be dropped if a different version of the page wins.
Testing a blockchain is testing that office. You check what happens when the same instruction is shouted twice, when two clerks briefly disagree about the latest page, when a figure is written that the ink can never take back, and when an outside informant (an oracle) feeds the clerks a price that turns out to be wrong. The permanence is the whole point — and the whole risk.
What it is
Blockchain testing is verifying a system whose data lives on a distributed ledger — a record store that is shared across many nodes, append-only, and changed only through agreed transactions. Most of the business logic runs as smart contracts: small programs that live on the chain and execute deterministically when called.
As a tester you are usually not writing the contract. Your job is to verify it: does it do the right thing for the right inputs, does it refuse the wrong ones, and does the wider app tell the user the truth about what is actually on the chain? The unusual properties — immutability, finality, consensus, public ordering — create failure modes that ordinary application testing never has to think about.
Four layers to keep separate in your head:
- Smart contract — the on-chain logic. Deterministic, immutable once deployed, costs "gas" to run.
- Chain / consensus — how nodes agree on the next block. Affects ordering, finality and forks.
- Off-chain app — the website, wallet integration and backend that read from and write to the chain.
- Oracles — bridges that feed real-world data (prices, weather, a lab result) onto the chain.
Smart-contract testing
A smart contract is a deterministic program, so a lot of standard technique applies: equivalence partitioning on inputs, boundary value analysis on amounts and limits, decision tables on the branches. But three failure modes are specific to contracts and worth knowing by name.
- Reentrancy — a contract calls out to another address before it finishes updating its own balances. The called code can call back in and drain funds while the first call still thinks the balance is unchanged. Test it by simulating a malicious recipient that re-enters the function.
- Gas and limits — every operation costs gas, and there is a ceiling per transaction. A loop over an unbounded list (for example, "pay out every holder") can pass with 10 holders and fail at 10,000 because it runs out of gas. Test with large data, not just a handful of records.
- Edge values and arithmetic — integer overflow/underflow, zero amounts, rounding on token decimals, and the value at exactly a threshold (a withdrawal cap, a minimum stake). Boundary value analysis is your friend here, and the cost of a wrong comparison is real money.
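The "malicious recipient" idea from the reentrancy bullet can be sketched in plain Python before you touch a real chain. Everything here is an assumption made for illustration: `Vault` is a toy model rather than contract code, and the `update_before_call` flag exists only so the same test can be run against a safe and a vulnerable ordering.

```python
from typing import Callable


class Vault:
    """Toy model of a contract that holds balances (hypothetical)."""

    def __init__(self, update_before_call: bool):
        self.update_before_call = update_before_call
        self.balances: dict[str, int] = {}
        self.total_held = 0

    def deposit(self, who: str, amount: int) -> None:
        self.balances[who] = self.balances.get(who, 0) + amount
        self.total_held += amount

    def withdraw(self, who: str, on_receive: Callable[[int], None]) -> None:
        amount = self.balances.get(who, 0)
        if amount == 0 or self.total_held < amount:
            return
        if self.update_before_call:
            self.balances[who] = 0   # state updated first (safe ordering)
        self.total_held -= amount
        on_receive(amount)           # external call: the recipient's code runs here
        if not self.update_before_call:
            self.balances[who] = 0   # state updated too late (vulnerable ordering)


def attack(vault: Vault, attacker: str = "mallory", max_reentries: int = 5) -> int:
    """Simulate a recipient that re-enters withdraw; return the total it received."""
    received = 0
    depth = 0

    def on_receive(amount: int) -> None:
        nonlocal received, depth
        received += amount
        if depth < max_reentries:
            depth += 1
            vault.withdraw(attacker, on_receive)  # call back in mid-withdrawal

    vault.withdraw(attacker, on_receive)
    return received


def funded_vault(update_before_call: bool) -> Vault:
    """Test fixture: a victim holds 90, the attacker holds 10."""
    vault = Vault(update_before_call)
    vault.deposit("victim", 90)
    vault.deposit("mallory", 10)
    return vault
```

The assertion to make is that the attacker never receives more than its own deposit. Against the vulnerable ordering the same attack pulls out several times that amount, which is exactly the defect the test is designed to expose.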
Immutability and finality
Two related properties trip up testers who come from a database background.
Immutability means a confirmed transaction cannot be edited or deleted. There is no UPDATE and no rollback. If a contract writes a wrong value, the fix is a new compensating transaction, and the wrong value stays visible in history forever. So your test charter is not just "does the happy path write the right record" — it is "what is the blast radius if a bad record gets written, and can the system correct it without deleting anything?"
Finality is the question of when a transaction is safe to rely on. A transaction can be "included" in a block and then dropped if that block loses out to a competing one (a reorg). On many chains you wait for several confirmations before treating a result as final. A classic bug: the off-chain app shows "Payment complete" the moment the transaction is included, then the block is reorged out, and the payment silently never happened. Test the gap between "included" and "final".
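The gap between "included" and "final" can be made testable as a small status function. This is a sketch under stated assumptions: the threshold of 6 confirmations, the `TxView` shape and the status strings are all hypothetical, and the right threshold depends on the chain you are testing.

```python
from dataclasses import dataclass
from typing import Optional

REQUIRED_CONFIRMATIONS = 6  # assumed threshold, illustrative only


@dataclass
class TxView:
    """What the app currently knows about a transaction (hypothetical shape)."""
    included_in_block: Optional[int]  # None if the tx is not in the canonical chain


def display_status(tx: TxView, chain_head: int,
                   required: int = REQUIRED_CONFIRMATIONS) -> str:
    """Decide what the UI may show for a transaction at the current chain head."""
    if tx.included_in_block is None:
        return "Pending"  # never included, or dropped by a reorg
    confirmations = chain_head - tx.included_in_block + 1
    if confirmations >= required:
        return "Payment complete"
    return "Awaiting confirmations"
```

A reorg test then has three steps: check the status right after inclusion, set `included_in_block` back to `None` to simulate the block being dropped, and confirm the app never showed "Payment complete" in between.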
Consensus, forks and ordering
Nodes have to agree on the order of transactions. While they are agreeing, two valid versions of the chain can briefly co-exist; this is a fork. Most forks resolve within a block or two, but during that window a transaction's fate is uncertain. You test fork handling by asking: if the chain reorgs, does the app re-check the on-chain state, or does it keep showing a stale "success"?
Transaction ordering is the other surprise. Within a block, the order of transactions is not guaranteed to match the order in which you submitted them. Two users (or one user, twice) can have their transactions reordered, and a contract that assumes "first come, first served" can behave wrongly. The whole class of "someone sees my pending transaction and jumps ahead of it" (front-running) lives here. As a tester you do not need to attack the chain; you need to verify the contract does not depend on an ordering it cannot guarantee.
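One way to probe ordering assumptions is to submit the same set of records in every possible order and check which orders trigger the payout. The sketch below uses the four provenance steps from the honey example; `Shipment`, `record` and the step names are hypothetical, and the model simply rejects any step that is not the next one expected.

```python
from itertools import permutations

STEPS = ("harvest", "lab_test", "pack", "export")


class Shipment:
    """Toy model of the per-shipment state held by the contract (hypothetical)."""

    def __init__(self):
        self.completed: list[str] = []
        self.paid = False

    def record(self, step: str) -> bool:
        """Accept a step only if it is the next one in sequence; pay on export."""
        if len(self.completed) == len(STEPS):
            return False  # shipment already finished
        if step != STEPS[len(self.completed)]:
            return False  # out-of-order or duplicate step is rejected
        self.completed.append(step)
        if step == "export":
            self.paid = True
        return True


def orderings_that_pay() -> list[tuple[str, ...]]:
    """Replay every arrival order and collect the ones that release payment."""
    paying = []
    for order in permutations(STEPS):
        shipment = Shipment()
        for step in order:
            shipment.record(step)
        if shipment.paid:
            paying.append(order)
    return paying
```

Of the 24 possible arrival orders, only the intended sequence should ever pay. If any other order appears in the result, the contract is relying on an ordering the chain does not guarantee.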
On-chain vs off-chain state, and wallets
Most real products keep some data on-chain (the parts that need to be trustless and permanent) and some off-chain (everything cheap, private, or large). The single most common source of user-facing bugs is the two drifting apart: the off-chain database says one thing, the chain says another. The verification rule of thumb: the chain is the source of truth; the off-chain copy is a cache, and caches go stale. Always confirm the displayed state against the actual on-chain state, not against the app's own database.
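A reconciliation check can be as simple as comparing two snapshots key by key. The sketch below assumes you have already read both sides into dictionaries; how you read on-chain state depends on your chain and tooling, so `find_drift` and the sample keys are illustrative only.

```python
from typing import Optional


def find_drift(on_chain: dict[str, str],
               off_chain: dict[str, str]) -> dict[str, tuple[Optional[str], Optional[str]]]:
    """Return {key: (chain_value, cached_value)} for every disagreement.

    A value of None means the key is missing on that side.
    """
    drift = {}
    for key in on_chain.keys() | off_chain.keys():
        chain_value = on_chain.get(key)
        cached_value = off_chain.get(key)
        if chain_value != cached_value:
            drift[key] = (chain_value, cached_value)
    return drift
```

An empty result means the cache agrees with the chain at the moment of the snapshot. Any entry is a defect to investigate, with the chain's value treated as the correct one.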
Wallets and keys are the user's identity. There is no "forgot password". If a key is lost, the assets are gone; if a key is exposed, anyone can sign as that user. Test the integration around signing: that the app asks for a signature for exactly the action shown, that a rejected signature leaves no half-finished state, and that the app never logs or transmits a private key. Treat key handling as security-sensitive every time.
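The "never logs a private key" check can be automated with a throwaway key: run the signing flow with a known test value, capture everything logged, and assert the value never appears. In this sketch `sign_and_submit` is a hypothetical stand-in for the app code under test, and `TEST_PRIVATE_KEY` is a dummy string, never a real key.

```python
import io
import logging

TEST_PRIVATE_KEY = "ab" * 32  # throwaway 64-hex-character value, never a real key


def sign_and_submit(action: str, private_key: str, log: logging.Logger) -> None:
    """Hypothetical app code under test; a real test calls the actual integration."""
    log.info("signing action=%s", action)
    # A buggy implementation might also do: log.debug("key=%s", private_key)


def captured_logs(action: str, private_key: str) -> str:
    """Run the signing flow and return everything it logged as one string."""
    stream = io.StringIO()
    handler = logging.StreamHandler(stream)
    log = logging.getLogger("wallet-test")
    log.setLevel(logging.DEBUG)
    log.addHandler(handler)
    try:
        sign_and_submit(action, private_key, log)
    finally:
        log.removeHandler(handler)
    return stream.getvalue()
```

Searching for a known test key is more reliable than pattern-matching for "anything that looks like a key", because transaction hashes are often the same length and would trigger false alarms.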
Oracles and testnets
A contract that needs real-world data — an exchange rate, a shipping status, a lab certificate — gets it from an oracle. The oracle is a trust boundary and a single point of failure: if it reports a wrong or stale value, the contract acts on garbage, permanently. Test the oracle path with bad data: a stale timestamp, an out-of-range value, a missing update. The contract should refuse to act on data it cannot trust.
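The three bad-data cases (stale, out of range, missing) translate directly into a validation function you can write test cases against. This is a sketch with assumed limits: the one-hour freshness window, the value bounds and the `OracleReport` shape are hypothetical and would come from the contract's specification in practice.

```python
from dataclasses import dataclass
from typing import Optional

MAX_AGE_SECONDS = 3600            # assumed freshness window
VALUE_RANGE = (0.01, 10_000.0)    # assumed sane bounds for the reported value


@dataclass
class OracleReport:
    value: float
    reported_at: float  # Unix timestamp of the oracle update


def accept_report(report: Optional[OracleReport], now: float) -> tuple[bool, str]:
    """Return (accepted, reason); the contract should act only when accepted."""
    if report is None:
        return False, "missing update"
    age = now - report.reported_at
    if age < 0:
        return False, "timestamp in the future"
    if age > MAX_AGE_SECONDS:
        return False, "stale"
    low, high = VALUE_RANGE
    if not (low <= report.value <= high):
        return False, "out of range"  # also catches NaN, which fails every comparison
    return True, "ok"
```

Each rejection reason is one test case. Add the boundary cases too: a report exactly at the freshness limit and values exactly at each end of the range.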
Testnets are the safe place to do all of this. A testnet is a parallel chain that behaves like the real one but uses valueless tokens, so you can deploy, break and redeploy contracts without spending real money. Run your destructive and edge-case testing on a testnet or a local fork; never make a "let's see what happens" transaction on the live chain, because there is no undo.
Real-world NZ example — tokenised carbon-credit registry
Picture an NZ Emissions Trading Scheme registry that issues each carbon credit as a token on a ledger. A contract lets a holder retire a credit (use it against emissions, removing it from circulation) and lets others transfer credits. Test charter highlights:
- Double-spend / double-retire: can the same credit be retired twice, or transferred and then retired by the old owner? Replay and reorder the transactions.
- Finality before display: does the registry show "credit retired" only after enough confirmations, or the instant the transaction is included?
- Off-chain ledger drift: does the registry's reporting database match the on-chain token supply exactly? A mismatch means the public total is wrong.
- Oracle for the emissions figure: if the audited emissions value feeding the contract is stale or out of range, does the contract refuse it?
- Immutability blast radius: if a credit is wrongly issued, what is the corrective transaction, and does the history still show the error (it must)?
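The double-retire and transfer-then-retire checks from the charter can be rehearsed against a toy model before running them on a testnet. `CreditRegistry`, `Rejected` and `is_rejected` below are hypothetical names for illustration; the model only captures ownership and retirement, not the real registry's rules.

```python
class Rejected(Exception):
    """Raised when the registry refuses a transaction."""


class CreditRegistry:
    """Toy model of the token contract (hypothetical, for test design only)."""

    def __init__(self):
        self.owner: dict[str, str] = {}
        self.retired: set[str] = set()

    def issue(self, credit_id: str, holder: str) -> None:
        if credit_id in self.owner:
            raise Rejected("credit already issued")
        self.owner[credit_id] = holder

    def _require_active_owner(self, credit_id: str, sender: str) -> None:
        if credit_id in self.retired:
            raise Rejected("credit already retired")
        if self.owner.get(credit_id) != sender:
            raise Rejected("sender does not own this credit")

    def transfer(self, credit_id: str, sender: str, recipient: str) -> None:
        self._require_active_owner(credit_id, sender)
        self.owner[credit_id] = recipient

    def retire(self, credit_id: str, sender: str) -> None:
        self._require_active_owner(credit_id, sender)
        self.retired.add(credit_id)  # history is kept; nothing is deleted

    def circulating_supply(self) -> int:
        return len(self.owner) - len(self.retired)


def is_rejected(action, *args) -> bool:
    """Test helper: True if the registry refuses the call."""
    try:
        action(*args)
    except Rejected:
        return True
    return False
```

For example, after `retire("C1", "alice")` a second `retire("C1", "alice")` and a `transfer("C1", "alice", "bob")` should both be rejected, and `circulating_supply()` should drop by exactly one.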
Common mistakes
⚠ Testing only the happy path on a clean chain
A single successful run proves almost nothing. The defects live in replay, reorder, reentrancy and out-of-gas. Design transactions that fire twice, land out of order, and run at scale.
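"Run at scale" can be illustrated with a simulated gas meter. The numbers below are assumptions chosen to make the arithmetic easy, not real costs on any chain, and `pay_all_holders` is a hypothetical model of a payout loop.

```python
GAS_PER_PAYOUT = 30_000    # assumed cost of one payout, illustrative only
GAS_LIMIT = 30_000_000     # assumed per-transaction ceiling, illustrative only


class OutOfGas(Exception):
    """Raised when the simulated transaction exceeds its gas ceiling."""


def pay_all_holders(holders: list[str], gas_limit: int = GAS_LIMIT) -> int:
    """Simulate a payout loop; return the gas used or raise OutOfGas."""
    gas_used = 0
    for _holder in holders:
        gas_used += GAS_PER_PAYOUT
        if gas_used > gas_limit:
            raise OutOfGas(f"ran out of gas after {gas_used} units")
    return gas_used


def survives(holder_count: int) -> bool:
    """Scale test helper: does the payout complete for this many holders?"""
    holders = [f"holder-{i}" for i in range(holder_count)]
    try:
        pay_all_holders(holders)
        return True
    except OutOfGas:
        return False
```

Under these assumed numbers the loop completes for 10 holders and for exactly 1,000, then fails from 1,001 upward. A test suite that only ever seeds a handful of holders would never find that ceiling.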
⚠ Trusting "confirmed" as final
Inclusion in a block is not finality. Test the window where a transaction can still be reorged out, and check the app does not show success too early.
⚠ Believing the off-chain database over the chain
The chain is the source of truth; the app's database is a cache. Always verify displayed state against on-chain state, or you will sign off on a screen that lies.
⚠ Assuming transaction order matches submission order
Ordering inside a block is not guaranteed. A contract that depends on "first one in wins" needs explicit testing for reordered and competing transactions.
⚠ Doing destructive testing on the live chain
There is no undo and gas costs real money. Run edge-case and break-it testing on a testnet or local fork, and keep keys out of logs and test reports.
4 Now You Try
Three graded exercises: spot, fix, then build. Write your answer, then compare it to the model answer.
A smart contract on the mānuka-honey provenance ledger releases payment to a producer the first time an "exported" record is written for a shipment. List the blockchain-specific failure modes a tester should probe (think replay, ordering, finality, immutability, off-chain drift) and, for each, the test you would run.
Model answer
Blockchain-specific risks for "pay on first exported record":
- Replay / double-spend: the same "exported" transaction is resubmitted (a network retry) and pays the producer twice. Test: fire the identical transaction twice and confirm the contract pays exactly once (it needs a per-shipment guard).
- Transaction ordering: "exported" lands in a block before "packed" because order is not guaranteed. Test: submit the records out of order and confirm the contract refuses to pay without the required prior states.
- Finality vs inclusion: payment is shown as complete the moment the transaction is included, then the block is reorged out and the payment never happened. Test: simulate a reorg and confirm the app only reports success after enough confirmations.
- Immutability blast radius: a wrong payment cannot be deleted. Test: confirm the only correction is a new compensating transaction and that the erroneous record stays visible in history.
- Off-chain drift: the producer dashboard reads from the app database, not the chain. Test: confirm displayed "paid" status matches the actual on-chain balance.
The core insight: a normal duplicate insert would bounce on a unique constraint; on a ledger both signatures are valid, so the guard has to be in the contract logic, and the tester has to attack it.
A tester wrote the charter below for a contract that lets a holder retire a tokenised carbon credit. It is weak: it only checks a happy path, trusts the app's screen, and never touches the ledger's special properties. Rewrite it into a stronger charter.
1. Log in and select a credit.
2. Click Retire.
3. Confirm the screen says "Retired".
4. Done — the credit is retired.
Model answer
Stronger charter for retiring a carbon credit:
1. Retire the credit, then verify the on-chain token state directly (not the screen): the credit is marked retired and removed from circulating supply.
2. Try to retire the same credit a second time, and try to transfer it after retiring. Both must be rejected. (Double-retire / spend-after-retire.)
3. Confirm the app shows "Retired" only after enough confirmations for finality; simulate a reorg and confirm it does not report success on a transaction that gets dropped.
4. Reconcile the off-chain reporting database against the on-chain supply. They must match exactly.
5. Confirm a wrongly retired credit cannot be deleted; the only fix is a documented compensating transaction, and the error stays in history.
What was wrong with the original:
- Happy path only: no replay, no double-retire, no out-of-order, no scale.
- Trusted the screen: it confirmed the app's own message instead of the on-chain truth. The screen is a cache and can be stale or wrong after a reorg.
- Ignored finality: "Retired" the instant the click returns is not the same as finalised on-chain.
- Ignored immutability: never considered the blast radius of a wrong retirement, which cannot be undone.
A contract releases an export payment only when an oracle reports a lab pass and the shipment weight is within 1 kg to 500 kg (inclusive). Design test cases covering (a) the weight boundaries with 2-value BVA and (b) the oracle trust boundary — what bad oracle data must the contract refuse?
Model answer
Weight boundary tests (range 1–500 kg inclusive, 2-value BVA):
- 0 kg: Reject (just below lower boundary; also a zero/empty-shipment edge)
- 1 kg: Accept (lower boundary, inclusive)
- 500 kg: Accept (upper boundary, inclusive)
- 501 kg: Reject (just above upper boundary)
(Bonus: also probe a negative weight and an absurd value to confirm input validation.)
Oracle trust-boundary tests. The contract must REFUSE to release payment on:
- Stale data: the lab-pass timestamp is older than the allowed freshness window. Reject (do not act on stale truth).
- Out-of-range / impossible value: the oracle reports a weight or result outside any sane bound, or a malformed value. Reject.
- Missing / no update: the oracle has not reported for this shipment at all. The contract must wait, not assume a pass.
Senior note: the oracle is a single point of trust. Even with perfect weight boundaries, a bad oracle feed makes the contract act on garbage permanently, so refusing untrusted data is as important as the BVA on the weight.
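The four weight cases can be written as a small table-driven check. This is a sketch that assumes whole-kilogram weights and a boolean lab result; `release_payment` is a hypothetical model of the contract's rule, not its real code.

```python
MIN_KG, MAX_KG = 1, 500  # inclusive range from the exercise


def release_payment(weight_kg: int, lab_pass: bool) -> bool:
    """Model of the rule: pay only on a lab pass and an in-range weight."""
    return lab_pass and MIN_KG <= weight_kg <= MAX_KG


# 2-value BVA: each boundary plus its nearest invalid neighbour.
BVA_CASES = [
    (0, False),    # just below the lower boundary
    (1, True),     # lower boundary, inclusive
    (500, True),   # upper boundary, inclusive
    (501, False),  # just above the upper boundary
]


def failing_cases() -> list[tuple[int, bool, bool]]:
    """Return (weight, expected, actual) for every case that disagrees."""
    failures = []
    for weight, expected in BVA_CASES:
        actual = release_payment(weight, lab_pass=True)
        if actual != expected:
            failures.append((weight, expected, actual))
    return failures
```

An empty list from `failing_cases()` means all four boundaries behave as specified. Changing `<=` to `<` in the model makes the 1 kg or 500 kg case show up as a failure, which is the kind of off-by-one comparison this technique exists to catch.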
Self-Check
Q1: Why is a single successful run a weak test for a smart contract?
Because the defects that matter on a ledger are not on the happy path. They appear when the same transaction runs twice (replay), when transactions land out of order, when a contract calls out and is re-entered, or when a loop runs out of gas at scale. A clean single run never exercises any of those, so it tells you almost nothing about the real risk.
Q2: What is the difference between a transaction being "included" and being "final", and why does it matter to a tester?
Included means it is in a block; final means that block can no longer be replaced. Between the two, a reorg can drop the transaction so the action never really happened. It matters because an app that shows "success" on inclusion can be showing a result that quietly disappears — so the tester must check the app waits for enough confirmations before reporting success.
Q3: A user dashboard says a payment was made, but the on-chain balance has not changed. Which one do you trust, and what does that tell you?
Trust the chain. The chain is the source of truth; the off-chain database is a cache that can drift. The mismatch is itself the bug — the app is showing state that is not real — and it means your verification must always read on-chain state, never just the app's own screen or database.
Q4: What is reentrancy, and how would you test for it without writing the contract yourself?
Reentrancy is when a contract calls out to another address before it finishes updating its own state, and the called code calls back in and acts on the stale state — classically to drain funds. As a tester you simulate a malicious recipient that re-enters the function during the call, and confirm the contract has updated its balances before making the external call (or otherwise guards against re-entry).
Q5: Why do destructive and edge-case tests belong on a testnet, and what does immutability mean for a wrong record?
Because the live chain has no undo and every transaction costs real money — a "let's see what happens" call could move real value permanently. A testnet behaves like the real chain with valueless tokens, so you can break and redeploy freely. Immutability means a wrong record cannot be edited or deleted; the only fix is a new compensating transaction, and the original error stays visible in history forever.
Interview Prep
"How would you approach testing a smart contract if you are not a Solidity developer?"
I treat the contract as a deterministic spec to verify, not code to write. I read what states and transitions it claims to support, then design transactions that probe the edges: boundary values on amounts and limits, equivalence partitions on inputs, and the blockchain-specific moves — calling the same function twice (replay), calling it out of order, calling it with zero or maximum values, and running it at scale to catch out-of-gas. I run all of it on a testnet, and I verify results against on-chain state, not the app's screen.
"A team says 'it is on the blockchain so it must be correct and final'. How do you respond?"
I would gently separate two things. The ledger guarantees that what is written cannot be tampered with — but it does not guarantee that what was written is correct, that it ran exactly once, or that it has reached finality yet. Immutability actually raises the stakes: a wrong record is permanent. So I would still test the contract logic, the replay and ordering behaviour, the confirmation/finality handling, and the oracle inputs — because "on the chain" only secures the data after the logic has already decided what to write.
"What is the biggest source of user-facing bugs in a blockchain product, in your experience?"
Drift between off-chain and on-chain state. The app keeps a fast local copy of what is on the chain so the UI feels responsive, and that copy goes stale — after a reorg, a missed event, or a slow sync. The user sees "paid" or "retired" when the chain says otherwise. My rule is that the chain is the source of truth and the local database is a cache, so I always reconcile the displayed state against the actual on-chain state.
Related techniques
Smart-contract input testing leans heavily on Boundary Value Analysis and Equivalence Partitioning for amounts and limits, and on Decision Table Testing for branching contract logic.
The states a credit or shipment moves through map cleanly onto State Transition Testing, and reorg/fork resilience is a natural fit for Chaos Engineering.
Key handling and reentrancy are squarely Security Testing concerns, and the oracle path is best probed with API Mocking & Stubbing to feed deliberately bad data.