20 min read · 9 self-checks · Updated June 2026

Domain · Distributed Ledger

Blockchain Testing

Verifying systems where the data store is a shared, append-only ledger and the business logic runs as smart contracts. The hard part for a tester is that you cannot edit a record, you cannot always trust a "confirmed" result, and the same transaction can behave differently depending on where it lands in a block.

Senior Specialised domain

Senior engineer insight

The single mental shift that changed how I test blockchain systems: stop thinking of the contract as the system and start thinking of the event pipeline as the system. The contract is the easy part — it is deterministic, auditable, and well-understood by the team. The thing that burns you in production is the background indexer that listens for on-chain events and writes them into your relational database. That service silently drops events under load, skips records after a reorg, and nobody ever load-tests it. I have seen contracts tested to exhaustion while the event listener quietly misreported state to every user for months.

The most common mistake: treating a passed security audit as a substitute for functional QA. Auditors check for reentrancy and overflow; they do not test your business logic, your oracle trust boundary, or whether your UI waits for enough confirmations before showing "complete". Teams discover this gap at go-live, on an immutable ledger, with real money.

From the field

An NZ fintech team piloting tokenised trade-finance documents on Ethereum hired me to review their test coverage two weeks before go-live. The contract had passed a Big 4 security audit and the team was confident. What they had not tested was the off-chain reconciliation: their backend maintained a Postgres cache of every token's ownership so the dashboard could load fast without hitting the chain on every page view. Under a realistic transaction volume on the Sepolia testnet, we discovered the cache sync was falling roughly 90 seconds behind, and during a simulated network blip that caused two blocks to be reorged, the cache never recovered. The dashboard showed "owned" for tokens that had already been transferred. The contract was flawless. The database was lying.

The lesson that generalises: in any hybrid blockchain system, the off-chain database is a cache, and caches go stale under pressure and after reorgs. Reconciliation between displayed state and on-chain truth is not a nice-to-have — it is a first-class test requirement, and it must be run under load, not just on a quiet testnet.

1 The Hook

A New Zealand exporter builds a provenance ledger so an overseas buyer can confirm that a shipment of mānuka honey really did come from the listed apiary. Each step — harvest, lab test, pack, export — writes a record to the chain. A smart contract releases payment to the producer once the "exported" record is written. The team tests it on a clean run: harvest, test, pack, export, payment fires. Looks great. Ship it.

Two weeks later a producer is paid twice for one shipment. What happened? A network hiccup made the buyer's app resubmit the "exported" transaction. On a normal database a duplicate insert would have hit a unique constraint and bounced. On the ledger, both transactions carried valid signatures, each landed in a different block, and the contract had no guard against being called twice for the same shipment. The money moved, and because the ledger is immutable, there was no "undo".

This is the shift a tester has to make. The defect was not a wrong button or a bad screen. It was a missing assumption: that "confirmed once" means "confirmed forever, exactly once". On a blockchain, neither half of that is automatic. You have to test for replay, for reordering, and for the fact that nothing can be deleted after the fact.
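A minimal sketch of that missing guard, written as a plain Python model rather than real contract code. The class, method names, shipment ID and amounts are all illustrative assumptions; the only point is that the same "exported" instruction sent twice pays once when a per-shipment guard exists and twice when it does not.

```python
class PaymentContract:
    """Toy in-memory stand-in for the payment contract (not real chain code)."""

    def __init__(self, guard_replay: bool) -> None:
        self.guard_replay = guard_replay
        self.paid_shipments = set()   # shipment IDs that have already been paid
        self.balances = {}            # producer -> total paid

    def record_exported(self, shipment_id: str, producer: str, amount: int) -> bool:
        """Handle an 'exported' record; return True if money moved."""
        if self.guard_replay and shipment_id in self.paid_shipments:
            return False  # a second call for the same shipment is refused
        self.paid_shipments.add(shipment_id)
        self.balances[producer] = self.balances.get(producer, 0) + amount
        return True


def pay_twice(contract: PaymentContract) -> int:
    """Send the same 'exported' instruction twice, as a retrying app would."""
    for _ in range(2):
        contract.record_exported("SHIP-001", "producer-a", 500)
    return contract.balances["producer-a"]


unguarded_total = pay_twice(PaymentContract(guard_replay=False))
guarded_total = pay_twice(PaymentContract(guard_replay=True))
print(unguarded_total, guarded_total)
```

The test case worth writing down is the pair: the action, then the same action again, with an assertion on the producer's balance after both.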

💬

Senior Engineer Insight

Every team I've seen prepare for a blockchain release obsesses over the smart contract. The contract gets audited, fuzz-tested, reviewed line by line. Then it ships, and six months later the production incident is not in the contract at all — it is in the event listener. That background service watches for on-chain events and writes them into your relational database. It falls behind under load, it silently skips events after a reorg, and nobody ever load-tested it. On one NZ financial-services project we found the indexer had been dropping roughly one event in every eight hundred under realistic transaction volume. The contract was perfect. The database was quietly lying to every user. Test your off-chain integration at least as hard as you test the contract itself.

2 The Rule

On a blockchain you cannot undo a mistake and you cannot assume a transaction runs once, in the order you sent it, on the chain you think you are on — so test the smart-contract logic for replay, reordering, reentrancy and edge values, and verify that on-chain truth matches what your off-chain app shows the user.

3 The Analogy

A pen-and-ink ledger chained to a desk, copied by a room full of clerks.

Imagine an old shipping office where every clerk keeps their own copy of the same ledger in permanent ink. There is no eraser. If you write a wrong figure, you cannot scrub it out — the most you can do is write a correcting line below it, and everyone can still see the original mistake. Before a line is treated as "official", a majority of the clerks have to agree their copies match. Until they do, your figure might still be dropped if a different version of the page wins.

Testing a blockchain is testing that office. You check what happens when the same instruction is shouted twice, when two clerks briefly disagree about the latest page, when a figure is written that the ink can never take back, and when an outside informant (an oracle) feeds the clerks a price that turns out to be wrong. The permanence is the whole point — and the whole risk.

What it is

Blockchain testing is verifying a system whose data lives on a distributed ledger — a record store that is shared across many nodes, append-only, and changed only through agreed transactions. Most of the business logic runs as smart contracts: small programs that live on the chain and execute deterministically when called.

As a tester you are usually not writing the contract. Your job is to verify it: does it do the right thing for the right inputs, does it refuse the wrong ones, and does the wider app tell the user the truth about what is actually on the chain? The unusual properties — immutability, finality, consensus, public ordering — create failure modes that ordinary application testing never has to think about.

Four layers to keep separate in your head:

Smart contract — the on-chain logic. Deterministic, immutable once deployed, costs "gas" to run.
Chain / consensus — how nodes agree on the next block. Affects ordering, finality and forks.
Off-chain app — the website, wallet integration and backend that read from and write to the chain.
Oracles — bridges that feed real-world data (prices, weather, a lab result) onto the chain.

Smart-contract testing

A smart contract is a deterministic program, so a lot of standard technique applies: equivalence partitioning on inputs, boundary value analysis on amounts and limits, decision tables on the branches. But three failure modes are specific to contracts and worth knowing by name.

Reentrancy — a contract calls out to another address before it finishes updating its own balances. The called code can call back in and drain funds while the first call still thinks the balance is unchanged. Test it by simulating a malicious recipient that re-enters the function.
Gas and limits — every operation costs gas, and there is a ceiling per transaction. A loop over an unbounded list (for example, "pay out every holder") can pass with 10 holders and fail at 10,000 because it runs out of gas. Test with large data, not just a handful of records.
Edge values and arithmetic — integer overflow/underflow, zero amounts, rounding on token decimals, and the value at exactly a threshold (a withdrawal cap, a minimum stake). Boundary value analysis is your friend here, and the cost of a wrong comparison is real money.
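One way to picture the reentrancy test is a toy Python model (not EVM code; `Vault`, `attack` and the amounts are assumptions made for illustration). The malicious recipient calls back into `withdraw` from its receive hook. With the balance update placed after the external call, the attacker drains the whole pool; with the update placed before it, they get only their own deposit.

```python
from typing import Callable, Dict


class Vault:
    """Toy model of a contract that holds balances for several users."""

    def __init__(self, update_before_call: bool) -> None:
        self.update_before_call = update_before_call
        self.balances: Dict[str, int] = {}
        self.pool = 0  # total funds held by the vault

    def deposit(self, who: str, amount: int) -> None:
        self.balances[who] = self.balances.get(who, 0) + amount
        self.pool += amount

    def withdraw(self, who: str, on_receive: Callable[[int], None]) -> None:
        amount = self.balances.get(who, 0)
        if amount == 0 or self.pool < amount:
            return
        if self.update_before_call:
            self.balances[who] = 0   # safe ordering: state is updated first
        self.pool -= amount
        on_receive(amount)           # external call: the recipient's code runs here
        if not self.update_before_call:
            self.balances[who] = 0   # vulnerable ordering: state is updated too late


def attack(vault: Vault) -> int:
    """Malicious recipient that re-enters withdraw() from its receive hook."""
    stolen = 0

    def on_receive(amount: int) -> None:
        nonlocal stolen
        stolen += amount
        vault.withdraw("attacker", on_receive)  # call back in before the first call ends

    vault.withdraw("attacker", on_receive)
    return stolen


def run(update_before_call: bool) -> int:
    vault = Vault(update_before_call)
    vault.deposit("victim", 90)
    vault.deposit("attacker", 10)
    return attack(vault)


print(run(update_before_call=False), run(update_before_call=True))
```

On a real project the same idea is usually expressed as a hostile test contract deployed to a local chain; the model above only shows what the assertion should be.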

Tester focus: you do not need to write Solidity to test a contract. You need to read the spec, enumerate the states and transitions it claims to support, and then design transactions that probe the edges — especially calling the same function twice, calling it out of the expected order, and calling it with a zero or maximum value.
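As a hedged illustration of "zero or maximum value", the helper below lists the amounts I would try around a cap, and converts a human-readable amount into integer base units without going through floating point. The function names are hypothetical, and the uint256 maximum assumes the contract stores amounts in that type.

```python
from fractions import Fraction
from typing import List

UINT256_MAX = 2**256 - 1  # assumes amounts are stored as uint256


def boundary_amounts(cap: int) -> List[int]:
    """Amounts at and around a cap, plus zero, one and the type maximum."""
    candidates = [0, 1, cap - 1, cap, cap + 1, UINT256_MAX]
    ordered: List[int] = []
    for value in candidates:
        # Skip values outside the type's range and duplicates, keeping order.
        if 0 <= value <= UINT256_MAX and value not in ordered:
            ordered.append(value)
    return ordered


def to_base_units(amount: str, decimals: int) -> int:
    """Convert a decimal string to integer base units using exact arithmetic."""
    scaled = Fraction(amount) * 10**decimals
    if scaled.denominator != 1:
        raise ValueError("amount has more precision than the token supports")
    return int(scaled)


print(boundary_amounts(1000)[:5])
print(to_base_units("1.5", 18))
```

Each value from `boundary_amounts` becomes one transaction in the test design, with the expected result (accept or refuse) taken from the spec rather than from the code.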

Immutability and finality

Two related properties trip up testers who come from a database background.

Immutability means a confirmed transaction cannot be edited or deleted. There is no UPDATE and no rollback. If a contract writes a wrong value, the fix is a new compensating transaction, and the wrong value stays visible in history forever. So your test charter is not just "does the happy path write the right record" — it is "what is the blast radius if a bad record gets written, and can the system correct it without deleting anything?"

Finality is the question of when a transaction is safe to rely on. A transaction can be "included" in a block and then dropped if that block loses out to a competing one (a reorg). On many chains you wait for several confirmations before treating a result as final. A classic bug: the off-chain app shows "Payment complete" the moment the transaction is included, then the block is reorged out, and the payment silently never happened. Test the gap between "included" and "final".
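The gap between "included" and "final" can be made concrete with a small status function. This is a sketch under stated assumptions: `TxReceipt`, `tx_status` and the `canonical` mapping are hypothetical names, the including block is counted as the first confirmation, and the required number of confirmations is whatever the team has agreed for its chain.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass(frozen=True)
class TxReceipt:
    block_number: int
    block_hash: str


def tx_status(
    receipt: Optional[TxReceipt],
    canonical: Dict[int, str],
    required_confirmations: int,
) -> str:
    """Classify a transaction against the current best chain.

    `canonical` maps block number -> block hash for the chain the node
    currently considers best.
    """
    if receipt is None:
        return "pending"
    if canonical.get(receipt.block_number) != receipt.block_hash:
        return "reorged"  # the block that held the transaction is no longer canonical
    head = max(canonical)
    confirmations = head - receipt.block_number + 1
    return "final" if confirmations >= required_confirmations else "included"


chain = {100: "a", 101: "b", 102: "c"}
receipt = TxReceipt(block_number=101, block_hash="b")
print(tx_status(receipt, chain, required_confirmations=3))
```

The UI test then follows directly: the app may show "complete" only when this function would return "final", and it must withdraw the message if the status ever becomes "reorged".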

Consensus, forks and ordering

Nodes have to agree on the order of transactions. While they are agreeing, two valid versions of the chain can briefly co-exist — a fork. Most forks resolve in seconds, but during that window a transaction's fate is uncertain. You test fork handling by asking: if the chain reorgs, does the app re-check the on-chain state, or does it keep showing a stale "success"?

Transaction ordering is the other surprise. Within a block, the order of transactions is not guaranteed to match the order you submitted them. Two users (or one user, twice) can have their transactions reordered, and a contract that assumes "first come, first served" can behave wrongly. The whole class of "someone sees my pending transaction and jumps ahead of it" lives here. As a tester you do not need to attack the chain — you need to verify the contract does not depend on an ordering it cannot guarantee.
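One way to check for a hidden ordering assumption is to replay the same set of transactions in every possible order and compare the results. The sketch below uses a toy payout pool; the transaction shape and function names are assumptions for illustration only.

```python
from itertools import permutations
from typing import Dict, List, Sequence, Tuple

Tx = Tuple[str, int]  # (sender, amount requested)


def apply_in_order(pool: int, txs: Sequence[Tx]) -> Tuple[int, List[str]]:
    """Apply withdrawals in the given order; return (remaining pool, who was paid)."""
    paid: List[str] = []
    for sender, amount in txs:
        if amount <= pool:      # a request larger than the remaining pool is refused
            pool -= amount
            paid.append(sender)
    return pool, paid


def outcomes_for_all_orderings(pool: int, txs: Sequence[Tx]) -> Dict[Tuple[Tx, ...], Tuple[int, List[str]]]:
    """Run every ordering of the same transactions against a fresh pool."""
    return {order: apply_in_order(pool, order) for order in permutations(txs)}


results = outcomes_for_all_orderings(100, [("alice", 70), ("bob", 50)])
for order, (remaining, paid) in results.items():
    print([sender for sender, _ in order], remaining, paid)
```

Here the invariant (the pool never goes negative) holds in both orders, but the winner changes. Whether that is acceptable is a question for the spec; the test makes the dependency visible so someone has to answer it.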

On-chain vs off-chain state, and wallets

Most real products keep some data on-chain (the parts that need to be trustless and permanent) and some off-chain (everything cheap, private, or large). The single most common source of user-facing bugs is the two drifting apart: the off-chain database says one thing, the chain says another. Your verification rule of thumb — the chain is the source of truth; the off-chain copy is a cache, and caches go stale. Always confirm the displayed state against the actual on-chain state, not against the app's own database.
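A reconciliation check can be as simple as a diff between two mappings. In this sketch both inputs are plain dictionaries of token ID to owner; how you fetch them (a contract call on one side, a database query on the other) is left out, and the function name is an assumption.

```python
from typing import Dict, Optional, Tuple


def reconcile(
    onchain: Dict[str, str],
    offchain: Dict[str, str],
) -> Dict[str, Tuple[Optional[str], Optional[str]]]:
    """Return {token_id: (chain_owner, cached_owner)} for every disagreement."""
    mismatches: Dict[str, Tuple[Optional[str], Optional[str]]] = {}
    for token_id in onchain.keys() | offchain.keys():
        chain_owner = onchain.get(token_id)    # None means the token is missing on-chain
        cached_owner = offchain.get(token_id)  # None means the cache never saw it
        if chain_owner != cached_owner:
            mismatches[token_id] = (chain_owner, cached_owner)
    return mismatches


chain_state = {"T1": "alice", "T2": "bob", "T3": "carol"}
cache_state = {"T1": "alice", "T2": "alice", "T4": "dave"}
print(sorted(reconcile(chain_state, cache_state).items()))
```

An empty result is the pass condition. Anything else is a defect in the off-chain copy, because the chain is the side that wins.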

Wallets and keys are the user's identity. There is no "forgot password". If a key is lost, the assets are gone; if a key is exposed, anyone can sign as that user. Test the integration around signing: that the app asks for a signature for exactly the action shown, that a rejected signature leaves no half-finished state, and that the app never logs or transmits a private key. Treat key handling as security-sensitive every time.
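For the "never logs a private key" check, a rough scan of logs and test artefacts for key-shaped strings is a reasonable first pass. This is a heuristic sketch, not a guarantee: it assumes keys appear as 64 hexadecimal characters, and because transaction and block hashes have the same shape, it takes a set of known hashes (lowercase) to ignore. The function name is hypothetical.

```python
import re
from typing import AbstractSet, Iterable, List, Tuple

# 32 bytes written as 64 hex characters, with or without a 0x prefix.
KEY_SHAPED = re.compile(r"\b(?:0x)?[0-9a-fA-F]{64}\b")


def find_key_shaped_strings(
    lines: Iterable[str],
    known_hashes: AbstractSet[str] = frozenset(),
) -> List[Tuple[int, str]]:
    """Return (line number, text) for every key-shaped string not in known_hashes."""
    hits: List[Tuple[int, str]] = []
    for line_no, line in enumerate(lines, start=1):
        for match in KEY_SHAPED.finditer(line):
            text = match.group(0)
            if text.lower() not in known_hashes:
                hits.append((line_no, text))
    return hits


address = "0x" + "12" * 20   # 20-byte address: too short to match
tx_hash = "0x" + "cd" * 32   # same shape as a key, so it must be allow-listed
fake_key = "0x" + "ab" * 32  # made-up value standing in for a leaked key
log = [
    "transfer sent from " + address,
    "tx hash " + tx_hash,
    "DEBUG signer key=" + fake_key,
]
print(find_key_shaped_strings(log, known_hashes={tx_hash}))
```

Every hit needs a human decision. A confirmed key in an artefact should be treated as exposed and rotated.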

Oracles and testnets

A contract that needs real-world data — an exchange rate, a shipping status, a lab certificate — gets it from an oracle. The oracle is a trust boundary and a single point of failure: if it reports a wrong or stale value, the contract acts on garbage, permanently. Test the oracle path with bad data: a stale timestamp, an out-of-range value, a missing update. The contract should refuse to act on data it cannot trust.
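The bad-data cases above map onto a small acceptance rule. The sketch below is one possible shape for it, with hypothetical names (`OracleReading`, `rejection_reason`) and with the maximum age and valid range supplied by the caller, since those come from the business rule rather than from the code.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class OracleReading:
    value: Optional[float]
    timestamp: int  # unix seconds when the oracle produced the value


def rejection_reason(
    reading: Optional[OracleReading],
    now: int,
    max_age_s: int,
    lo: float,
    hi: float,
) -> Optional[str]:
    """Return why a reading must be refused, or None if it is acceptable."""
    if reading is None or reading.value is None:
        return "missing"
    if reading.timestamp > now:
        return "timestamp in the future"
    if now - reading.timestamp > max_age_s:
        return "stale"
    if not (lo <= reading.value <= hi):
        return "out of range"  # also catches NaN, which fails every comparison
    return None


now = 1_700_000_000
print(rejection_reason(OracleReading(0.61, now - 30), now, 300, 0.1, 2.0))
print(rejection_reason(OracleReading(0.61, now - 3600), now, 300, 0.1, 2.0))
```

Each non-None return value is one negative test case: feed the contract that reading and assert that it refuses to act.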

Testnets are the safe place to do all of this. A testnet is a parallel chain that behaves like the real one but uses valueless tokens, so you can deploy, break and redeploy contracts without spending real money. Run your destructive and edge-case testing on a testnet or a local fork; never make a "let's see what happens" transaction on the live chain, because there is no undo.

Real-world NZ example — tokenised carbon-credit registry

Picture an NZ Emissions Trading Scheme registry that issues each carbon credit as a token on a ledger. A contract lets a holder retire a credit (use it against emissions, removing it from circulation) and lets others transfer credits. Test charter highlights:

Double-spend / double-retire: can the same credit be retired twice, or transferred and then retired by the old owner? Replay and reorder the transactions.
Finality before display: does the registry show "credit retired" only after enough confirmations, or the instant the transaction is included?
Off-chain ledger drift: does the registry's reporting database match the on-chain token supply exactly? A mismatch means the public total is wrong.
Oracle for the emissions figure: if the audited emissions value feeding the contract is stale or out of range, does the contract refuse it?
Immutability blast radius: if a credit is wrongly issued, what is the corrective transaction, and does the history still show the error (it must)?
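The first charter item can be written down as a tiny state model before any real contract exists. This is a sketch with invented names (`CreditRegistry`, credit "C1", owners "alice" and "bob"); it shows the expected answers for a double retire and for a transfer followed by a retire attempt from the old owner.

```python
from typing import Dict, List, Set


class CreditRegistry:
    """Toy in-memory model of the credit contract; not real chain code."""

    def __init__(self) -> None:
        self.owner: Dict[str, str] = {}
        self.retired: Set[str] = set()

    def issue(self, credit_id: str, owner: str) -> None:
        if credit_id in self.owner:
            raise ValueError("credit already issued")
        self.owner[credit_id] = owner

    def _can_act(self, credit_id: str, sender: str) -> bool:
        # Only the current owner may act, and only while the credit is still live.
        return self.owner.get(credit_id) == sender and credit_id not in self.retired

    def transfer(self, credit_id: str, sender: str, recipient: str) -> bool:
        if not self._can_act(credit_id, sender):
            return False
        self.owner[credit_id] = recipient
        return True

    def retire(self, credit_id: str, sender: str) -> bool:
        if not self._can_act(credit_id, sender):
            return False
        self.retired.add(credit_id)
        return True

    def circulating(self) -> int:
        return len(self.owner) - len(self.retired)


def double_retire() -> List[bool]:
    registry = CreditRegistry()
    registry.issue("C1", "alice")
    return [registry.retire("C1", "alice"), registry.retire("C1", "alice")]


def transfer_then_old_owner_retires() -> List[bool]:
    registry = CreditRegistry()
    registry.issue("C1", "alice")
    return [
        registry.transfer("C1", "alice", "bob"),
        registry.retire("C1", "alice"),  # old owner: must be refused
        registry.retire("C1", "bob"),    # new owner: allowed
    ]


print(double_retire(), transfer_then_old_owner_retires())
```

The `circulating` count is the figure the reporting database has to match for the off-chain drift item in the same charter.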

Common mistakes

⚠ Testing only the happy path on a clean chain

A single successful run proves almost nothing. The defects live in replay, reorder, reentrancy and out-of-gas. Design transactions that fire twice, land out of order, and run at scale.

⚠ Trusting "confirmed" as final

Inclusion in a block is not finality. Test the window where a transaction can still be reorged out, and check the app does not show success too early.

⚠ Believing the off-chain database over the chain

The chain is the source of truth; the app's database is a cache. Always verify displayed state against on-chain state, or you will sign off on a screen that lies.

⚠ Assuming transaction order matches submission order

Ordering inside a block is not guaranteed. A contract that depends on "first one in wins" needs explicit testing for reordered and competing transactions.

⚠ Doing destructive testing on the live chain

There is no undo and gas costs real money. Run edge-case and break-it testing on a testnet or local fork, and keep keys out of logs and test reports.

4 Industry Reality

🏭 What you actually encounter on the job

Most blockchain projects are hybrid systems. The "blockchain" part is often one smart contract surrounded by a conventional web app, a relational database, and a REST API. You spend more time testing the off-chain glue than the contract itself — verifying that the database cache matches on-chain state and that events emitted by the contract actually reach the backend.
Testnets behave differently from mainnet. Finality times, gas prices, and node behaviour differ between testnets and production chains. Senior testers build separate test environments for testnet and run a smoke suite on mainnet (read-only queries only) to catch environment-specific surprises before go-live.
Smart contract audits happen, but they are not a substitute for functional testing. Security auditors look for reentrancy, integer overflow, and access-control bugs. They do not cover business logic, UI drift, or oracle trust boundaries. You still need to test those regardless of whether an audit was done — and in NZ fintech and carbon-credit projects, the audit is often the compliance checkbox, not the full QA sign-off.
Immutability creates pressure to get it right on first deploy. Unlike a normal release where a bad migration can be rolled back, a deployed contract is permanent. Teams often do parallel "shadow" deployments — deploy to testnet, run the full regression suite, then deploy to mainnet with no changes. Your test sign-off carries more weight than in conventional projects.
Wallets and key management are nightmare UX territory. In practice you will encounter users who lose their seed phrases, teams who store private keys in environment variables, and integrations where "sign this transaction" dialogs show garbled hex instead of plain-English descriptions. Treat key handling as both a security concern and a usability one, and always include non-technical stakeholders in wallet UX reviews.

5 When to Use It — and When Not To

⚡ Decision guide

✓ Use it when

The system includes smart contracts, token transfers, or on-chain state that can move real value or create permanent records
Your product integrates with an oracle — any external data feed that drives contract decisions needs trust-boundary testing
The business logic depends on transaction ordering or "exactly once" semantics (payments, credit retirements, ownership transfers)
You are testing an NZ Emissions Trading Scheme registry, provenance ledger, or tokenised-asset product where an immutable wrong record has regulatory or financial consequences
The app has a finality gap — any period between "transaction submitted" and "transaction confirmed" where UI state might mislead users

✗ Skip it when

Your system only reads from a public blockchain but never writes to it — standard API testing covers the read path
The "blockchain" is a private internal ledger with a database admin who can correct records — it lacks the immutability and finality properties that make blockchain testing distinct
The project uses blockchain as a buzzword but the actual data store is a conventional database with a blockchain branding layer
You are under time pressure and the team has not yet written any smart contracts — apply boundary value analysis and state-transition testing first, then layer in blockchain-specific techniques when the contracts are ready to test
The contract is entirely covered by a formal verification tool that has already proved the logic — focus your effort on the off-chain integration instead

Context guide

How the right level of blockchain testing effort changes based on project context.

Context	Priority	Why
NZ Emissions Trading Scheme (ETS) carbon-credit registry — on-chain token retirement	Essential	Immutable records with regulatory consequences. A double-retire or off-chain drift error cannot be rolled back and has direct compliance impact under the Climate Change Response Act.
NZ fintech tokenised payments or trade-finance documents (e.g. a bank-backed pilot at Harbour Bank)	Essential	Real money moves permanently. Replay and finality bugs result in double payments or phantom credits; immutability means there is no rollback path.
Primary provenance ledger for export products (e.g. mānuka honey, timber, seafood) where the chain record triggers third-party payments	High use	Oracle trust boundaries and ordering tests are critical — a wrong or stale data feed causes the contract to permanently misfire payments. Finality handling matters when buyers rely on on-chain confirmation before releasing funds.
Hybrid DeFi / Web3 consumer app with an off-chain backend (e.g. an NZ crypto exchange or wallet integration)	High use	Off-chain/on-chain state drift is the primary failure mode in these products. The Postgres or Mongo cache lags under load and after reorgs; reconciliation and event-listener stress testing are the highest-value activities.
Internal private ledger at a government agency (e.g. LandNZ property title experiment) where an admin can correct records	Medium	If records are correctable and finality is synchronous, standard integration testing covers most of the risk. Apply blockchain-specific techniques only to the properties that are genuinely append-only.
Read-only blockchain integration — app queries public chain data but never writes (e.g. displaying TransitNZ vehicle-title chain status)	Low	No smart-contract logic to exploit, no immutable writes to get wrong. Standard API and integration testing covers the read path; blockchain-specific techniques add little value.

Trade-offs

What you gain and what you give up when you choose blockchain testing.

Advantage	Disadvantage	Use instead when…
Surfaces failure modes that are invisible to conventional testing — replay, ordering, finality gaps, and oracle manipulation all require blockchain-specific test design to trigger.	Steep learning curve. Testers need to understand ledger mechanics, testnet tooling, and on-chain query methods before they can write meaningful test cases — skills most QA teams do not arrive with.	The system only reads from the chain. Standard API testing covers the read path without requiring blockchain expertise.
Validates the source of truth directly. Verifying state against the chain rather than the app database catches off-chain drift bugs that no amount of UI or API testing would reveal.	Slow feedback loops on real networks. Waiting for block confirmations on a public testnet means test cycles can take minutes instead of milliseconds; local chain forks (Hardhat, Anvil) partly mitigate this but add tooling complexity.	The "blockchain" is an internal ledger with a DB admin who can correct records. The immutability and finality properties that justify this technique do not apply.
Forces teams to confront immutability risk before go-live. Defining the compensating-transaction playbook during testing — rather than after the first production error — dramatically reduces incident response time on a ledger where nothing can be deleted.	Can be misapplied to non-ledger systems. When a project uses "blockchain" as a marketing term but the underlying store is a conventional database, blockchain-specific testing wastes time and can crowd out more relevant techniques.	The project has not yet deployed any smart contracts. Apply boundary value analysis and state-transition testing first; layer in blockchain-specific techniques once contracts exist to test.
Provides stronger sign-off confidence on high-stakes releases. Because deployed contracts cannot be patched, a test suite that deliberately attacks replay, reordering, and oracle boundaries gives stakeholders and auditors more meaningful assurance than a happy-path pass.	Gas costs and testnet faucet limits constrain test volume. Running large-scale loops or stress tests consumes testnet tokens that must be sourced from faucets; budgeting and tooling for this is an extra overhead that conventional testing never requires.	A formal verification tool has already proved the contract logic. Redirect effort to the off-chain integration and the event-listener pipeline, which formal verification cannot cover.

Enterprise reality

How blockchain testing changes at 200–300-developer scale in NZ

Smart-contract unit tests and on-chain reconciliation checks move into CI/CD pipelines using Hardhat or Foundry. At Harbour Bank's blockchain-enabled trade-finance pilots, every contract change must pass a replay and double-spend suite before a pull request can merge — a manual testnet run is no longer acceptable as the sole gate because releases happen fortnightly across multiple squads simultaneously.
Compliance coverage expands significantly. The Privacy Act 2020 gives individuals a right to have personal information corrected and limits how long it may be kept, which an immutable public chain cannot honour, so you need to demonstrate that personal data never lands on one; NZISM controls apply to private-key storage and HSM integration; and PCI DSS scope comes into play when card payment data enters the transaction flow. At this scale those checks become formal test suites with sign-off artefacts, not informal notes in a test charter.
Tooling standardises around shared off-chain monitoring: a centralised reconciliation service continuously diffs the off-chain database against on-chain state and pages on-call engineers when drift exceeds a threshold. At a 200-person engineering org you cannot rely on each squad running ad-hoc reconciliation checks — drift detection has to be a live, automated service with alerting, because a single event-listener outage can corrupt the cache across every squad's product simultaneously.
Cross-squad coordination becomes the dominant challenge. With 10 or more squads sharing a ledger, a contract upgrade by one squad can break the event-listener assumptions of three others — and because there is no rollback on an immutable chain, the error is permanent. Enterprise NZ projects handle this with a shared contract-interface test suite (consumer-driven contract tests) and a release-train model where no squad deploys a contract change without running the full cross-squad regression first.

◆ What I would do

Professional judgment — when to reach for blockchain testing, when to skip it, and what to watch for.

Scenario 1 — Benefits NZ digital-identity credential as an on-chain verifiable claim

Situation

Benefits NZ is piloting a verifiable-credential scheme where benefit eligibility decisions are recorded as on-chain claims. A claimant's credential is issued once and cited in future service-access checks. The dev team says the contract passed a security audit and is read-only after issuance — "so there is nothing to test".

I would

Push back on "read-only means nothing to test." The issuance transaction is a write, and because credentials are immutable, a wrong issuance (wrong NHI, wrong eligibility flag, wrong expiry) has no delete path. I would design tests for the issuance boundary conditions (invalid NHI format, zero-duration credential, duplicate issuance for the same claimant), verify the off-chain directory that caches credential status matches on-chain truth after each issuance, and confirm the app only presents a credential as valid after the required confirmation threshold, not on transaction inclusion. I would also test what the issuing service does when the oracle supplying the contract's eligibility rules is stale: it must refuse to issue rather than issue on bad data. The audit covers vulnerability classes; it does not cover any of this.

Scenario 2 — Pacific Air loyalty points migrating to a private permissioned ledger

Situation

Pacific Air is migrating its Airpoints programme onto a permissioned Hyperledger Fabric network. The platform team argues that because Fabric provides deterministic finality and the organisation controls all the nodes, blockchain-specific testing is unnecessary — "it is just a database with audit logging".

I would

Partially agree, and be precise about which techniques apply. If Fabric is configured for immediate, synchronous finality with no reorg risk, and if the organisation can correct records via an endorsed admin transaction, then replay testing and probabilistic finality testing add little value — the platform team is right for those techniques. However, the off-chain/on-chain drift concern absolutely still applies: the Airpoints app almost certainly maintains a relational cache for dashboard performance, and that cache can lag or misreport under load. I would run reconciliation checks between the app database and the ledger under realistic transaction volumes. I would also verify that the chaincode (Fabric's equivalent of a smart contract) correctly handles double-spend attempts — even on a permissioned network, a retry storm during a mobile outage can produce duplicate redemption requests, and the chaincode must reject the second one.

Scenario 3 — Spark digital-infrastructure subsidiary launching an NFT-based event-ticketing product

Situation

A Spark subsidiary is launching an NFT-based ticket platform. Each ticket is a token; transfer is peer-to-peer. Go-live is in five weeks. The team has tested the minting flow, transfer, and redemption on a happy path against Polygon's Amoy testnet (the successor to the retired Mumbai testnet). The product manager is satisfied and wants to close the test phase.

I would

Not sign off. Five specific gaps remain open: (1) double-redemption — can the same ticket NFT be scanned twice at the gate? The contract needs a "redeemed" flag that is set atomically; test that the second scan is rejected. (2) Transfer-then-redeem race — one person transfers the ticket while a gate scanner is processing it; which owner does the contract honour? (3) Off-chain ticket cache — the event-gate app reads from a backend database, not the chain, for scan speed. Does that cache catch a last-second transfer? Test under concurrent load. (4) Polygon gas-price spikes — at a sellout event, hundreds of attendees transacting simultaneously can push gas prices high enough that legitimate transfers stall; confirm the UX communicates pending state honestly rather than showing failure. (5) Finality display — Polygon uses probabilistic finality; confirm the dashboard only shows a ticket as "transferred" after sufficient confirmations, not on submission. I would block the PM sign-off until these five are tested on the testnet, and I would document the compensating-transaction procedure for a wrongly issued ticket, because once minted it cannot be deleted.

The bottom line: The audit certificate and the happy-path green are not your sign-off. On an immutable ledger, your sign-off is the only thing standing between the team and a production bug that cannot be rolled back — so design tests that attack replay, ordering, finality gaps, and off-chain drift before you give it.

6 Best Practices

✓ What experienced testers do

✓ Always verify against on-chain state, never the app screen. Query the contract directly or use a block explorer to confirm the actual on-chain value matches what the UI shows. The screen is a cache.
✓ Design transactions in pairs: the action, then a repeat of the same action. If the contract does not guard against replay, a resubmitted copy of the same instruction (a fresh transaction carrying the same payload) will succeed just like the first. This is the single most common missed test case on real projects.
✓ Test at scale before signing off on loops. A function that iterates over a list passes with 10 records and fails with 10,000 because it runs out of gas. Run your loops with production-realistic data sizes on a testnet.
✓ Feed the oracle bad data deliberately. Use API mocking to inject a stale timestamp, a null value, and an absurdly out-of-range result. The contract must refuse to act on untrusted oracle data; if it does not, every subsequent decision it makes is permanently wrong.
✓ Confirm the finality threshold with the dev team, then test across it. Ask: after how many confirmations does the chain treat a transaction as final? Then check the app does not show success before that threshold is met, and simulate a reorg to confirm it handles the reversal gracefully.
✓ Reconcile off-chain and on-chain state as a regression check. After every test run, diff the app database against the chain. Drift that builds up silently is the most common cause of user-facing data bugs in blockchain products.
✓ Keep private keys out of test reports, logs, and screenshots. Wallet addresses are pseudonymous but keys are secret. If a key appears in a test artefact, treat it as a security incident and rotate it immediately.
✓ Document the compensating-transaction playbook before go-live. Because nothing can be deleted, you need a defined procedure for every category of bad record: what the correcting transaction looks like, who authorises it, and how the audit trail is maintained. As the tester, verifying that playbook works is your responsibility.
✓ Use a local chain fork (e.g. Hardhat, Anvil) for deterministic destructive testing. A local fork lets you mine blocks on demand, set block timestamps, and simulate reorgs without testnet faucet limits or block time delays — dramatically faster feedback than waiting for real confirmations.
✓ Write your oracle test cases before reading the contract code. Starting from the business rule ("the contract should only act on fresh, in-range data from a trusted source") forces you to think about trust boundaries rather than implementation details, and you will find more meaningful defects.
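The "test at scale" practice above comes down to simple arithmetic, which is worth doing before the testnet run. Every number in this sketch is an assumption for illustration: 21,000 is the base cost of an Ethereum transaction, but the per-holder cost and the 30,000,000 gas ceiling are placeholders to be replaced with measured values for your contract and chain.

```python
def estimated_gas(n_items: int, base_gas: int, gas_per_item: int) -> int:
    """Linear estimate for a function that loops once per item."""
    return base_gas + n_items * gas_per_item


def max_items(gas_limit: int, base_gas: int, gas_per_item: int) -> int:
    """Largest list the loop can process before the estimate exceeds the limit."""
    return max(0, (gas_limit - base_gas) // gas_per_item)


GAS_LIMIT = 30_000_000    # assumed ceiling; check the real limit on your chain
BASE_GAS = 21_000         # base cost of a transaction on Ethereum
GAS_PER_HOLDER = 30_000   # assumed cost of one payout iteration; measure this

for holders in (10, 10_000):
    needed = estimated_gas(holders, BASE_GAS, GAS_PER_HOLDER)
    print(holders, needed, "fits" if needed <= GAS_LIMIT else "out of gas")
print(max_items(GAS_LIMIT, BASE_GAS, GAS_PER_HOLDER))
```

Under these assumed figures the loop stops fitting just under a thousand holders, which tells you the data size at which the testnet run has to be aimed.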

7 Common Misconceptions

❌ Myth: "The blockchain is secure, so we do not need to test the smart contract logic."

Reality: The blockchain secures the data after it is written — it cannot protect against logic bugs in the contract itself. A reentrancy flaw, an off-by-one on a boundary, or a missing replay guard will execute perfectly, permanently, and at the cost of real money. Immutability makes bad logic worse, not better: the bug stays in the ledger forever. Smart-contract logic needs at least as much functional testing as any critical business rule.

❌ Myth: "If the transaction is in the block, it is final."

Reality: "Included" and "final" are different states. A transaction can be included in a block and then dropped if that block loses out to a competing chain during a reorg. On Ethereum mainnet, many teams use a rule of thumb of 12 confirmations (roughly 2.5 minutes at 12-second blocks), although under proof of stake the protocol itself finalises a block only after about two epochs (roughly 13 minutes); other chains have different thresholds. An app that shows "Payment complete" on inclusion can be showing a result that silently disappears. Testers must check the confirmation threshold and simulate a reorg to verify the app handles it correctly.
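The gap between "included" and "final" can be made concrete with a small status gate. The sketch below is illustrative only: `display_status`, `confirmations`, and the threshold of 12 are assumptions, and it uses the common convention that the including block counts as the first confirmation. The point is the three cases a tester should cover: just below the threshold, exactly on it, and a transaction that is no longer in any block after a reorg.

```python
from typing import Optional

FINALITY_THRESHOLD = 12  # assumed value; confirm the real one with the dev team


def confirmations(tx_block: int, head_block: int) -> int:
    """Count confirmations, treating the including block as the first one."""
    return max(0, head_block - tx_block + 1)


def display_status(tx_block: Optional[int], head_block: int,
                   threshold: int = FINALITY_THRESHOLD) -> str:
    """Return the status string the UI is allowed to show."""
    if tx_block is None:
        # Not in any block on the canonical chain, e.g. dropped by a reorg.
        return "pending"
    if confirmations(tx_block, head_block) >= threshold:
        return "complete"
    return "confirming"


print(display_status(100, 110))   # 11 confirmations, below the threshold
print(display_status(100, 111))   # 12 confirmations, on the threshold
print(display_status(None, 111))  # transaction dropped by a reorg
```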

❌ Myth: "We do not need a tester on this project because a security auditor reviewed the contract."

Reality: Security auditors focus on vulnerability classes: reentrancy, access control, integer overflow, and known attack patterns. They do not test business logic, oracle trust boundaries, UI/chain state drift, finality handling, or the off-chain integration. In NZ projects involving ETS registries, provenance ledgers, or regulated financial products, both a security audit and functional QA are required — they answer different questions and neither substitutes for the other.

8 Now You Try

Three graded exercises: spot, fix, then build. Write your answer, then compare it to the model answer.

🔍 Exercise 1 of 3 — Spot: name the blockchain-specific risks

A smart contract on the mānuka-honey provenance ledger releases payment to a producer the first time an "exported" record is written for a shipment. List the blockchain-specific failure modes a tester should probe (think replay, ordering, finality, immutability, off-chain drift) and, for each, the test you would run.

Model answer

Blockchain-specific risks for "pay on first exported record":

- Replay / double-spend — the same "exported" record is resubmitted (for example, a network retry that signs a fresh transaction with the same payload) and pays the producer twice. Test: submit the same record in two separate transactions and confirm the contract pays exactly once (it needs a per-shipment guard).
- Transaction ordering — "exported" lands in a block before "packed" because order is not guaranteed. Test: submit the records out of order and confirm the contract refuses to pay without the required prior states.
- Finality vs inclusion — payment is shown as complete the moment the transaction is included, then the block is reorged out and the payment never happened. Test: simulate a reorg and confirm the app only reports success after enough confirmations.
- Immutability blast radius — a wrong payment cannot be deleted. Test: confirm the only correction is a new compensating transaction and that the erroneous record stays visible in history.
- Off-chain drift — the producer dashboard reads from the app database, not the chain. Test: confirm displayed "paid" status matches the actual on-chain balance.

The core insight: a normal duplicate-insert would bounce on a unique constraint; on a ledger both signatures are valid, so the guard has to be in the contract logic, and the tester has to attack it.
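To show what "the guard has to be in the contract logic" looks like from the tester's side, here is a minimal sketch using a toy in-memory model. `PayOnExport` and `replay_is_rejected` are hypothetical and stand in for the real contract; on a real project the same two calls would be sent as separate transactions to a contract deployed on a local fork.

```python
class PayOnExport:
    """Toy in-memory stand-in for the payout contract (hypothetical)."""

    def __init__(self, payment: int) -> None:
        self.payment = payment
        self.paid_shipments = set()
        self.balances = {}

    def record_exported(self, shipment_id: str, producer: str) -> None:
        # Per-shipment guard: a second "exported" record must not pay again.
        if shipment_id in self.paid_shipments:
            raise ValueError(f"shipment {shipment_id} already paid")
        self.paid_shipments.add(shipment_id)
        self.balances[producer] = self.balances.get(producer, 0) + self.payment


def replay_is_rejected(contract: PayOnExport) -> bool:
    """Submit the same record twice; True if the second attempt is refused."""
    contract.record_exported("SHIP-001", "producer-1")
    try:
        contract.record_exported("SHIP-001", "producer-1")
    except ValueError:
        return True
    return False


contract = PayOnExport(payment=100)
rejected = replay_is_rejected(contract)
print(rejected, contract.balances)
```

The test checks two things, not one: the second call is refused, and the producer's balance reflects exactly one payment.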

🔧 Exercise 2 of 3 — Fix: repair a flawed test charter

A tester wrote the charter below for a contract that lets a holder retire a tokenised carbon credit. It is weak: it only checks a happy path, trusts the app's screen, and never touches the ledger's special properties. Rewrite it into a stronger charter.

Flawed charter:
1. Log in and select a credit.
2. Click Retire.
3. Confirm the screen says "Retired".
4. Done — the credit is retired.

Rewrite as a stronger charter:

Model answer

Stronger charter for retiring a carbon credit:

1. Retire the credit, then verify the on-chain token state directly (not the screen): the credit is marked retired and removed from circulating supply.
2. Try to retire the same credit a second time, and try to transfer it after retiring — both must be rejected. (Double-retire / spend-after-retire.)
3. Confirm the app shows "Retired" only after enough confirmations for finality; simulate a reorg and confirm it does not report success on a transaction that gets dropped.
4. Reconcile the off-chain reporting database against the on-chain supply — they must match exactly.
5. Confirm a wrongly retired credit cannot be deleted; the only fix is a documented compensating transaction, and the error stays in history.

What was wrong with the original:
- Happy path only: no replay, no double-retire, no out-of-order, no scale.
- Trusted the screen: it confirmed the app's own message instead of the on-chain truth — the screen is a cache and can be stale or wrong after a reorg.
- Ignored finality: "Retired" the instant the click returns is not the same as finalised on-chain.
- Ignored immutability: never considered the blast radius of a wrong retirement, which cannot be undone.
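The double-retire and spend-after-retire checks in the stronger charter amount to a small state-transition table. The sketch below is one way to express that, assuming a toy `Credit` model with only two states; every name in it is hypothetical. The expectation table is written separately from the model so the test compares intent against behaviour rather than the code against itself.

```python
from enum import Enum
from itertools import product


class State(Enum):
    ACTIVE = "active"
    RETIRED = "retired"


class Action(Enum):
    TRANSFER = "transfer"
    RETIRE = "retire"


class Credit:
    """Toy stand-in for one tokenised credit (hypothetical, not contract code)."""

    def __init__(self, state: State = State.ACTIVE) -> None:
        self.state = state

    def transfer(self) -> None:
        if self.state is not State.ACTIVE:
            raise PermissionError("cannot transfer a retired credit")

    def retire(self) -> None:
        if self.state is not State.ACTIVE:
            raise PermissionError("credit is already retired")
        self.state = State.RETIRED


# The tester's expectation, written independently of the implementation:
# True means the call must succeed, False means it must be rejected.
EXPECTED = {
    (State.ACTIVE, Action.TRANSFER): True,
    (State.ACTIVE, Action.RETIRE): True,
    (State.RETIRED, Action.TRANSFER): False,
    (State.RETIRED, Action.RETIRE): False,
}


def observed(state: State, action: Action) -> bool:
    """Run one action from one starting state; True if it was accepted."""
    credit = Credit(state)
    try:
        getattr(credit, action.value)()
    except PermissionError:
        return False
    return True


mismatches = [
    pair for pair in product(State, Action) if observed(*pair) != EXPECTED[pair]
]
print(mismatches)
```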

🏗️ Exercise 3 of 3 — Build: design oracle and boundary tests

A contract releases an export payment only when an oracle reports a lab pass and the shipment weight is within 1 kg to 500 kg (inclusive). Design test cases covering (a) the weight boundaries with 2-value BVA and (b) the oracle trust boundary — what bad oracle data must the contract refuse?

Model answer

Weight boundary tests (range 1–500 kg inclusive, 2-value BVA):
- 0 kg — Reject (just below lower boundary; also a zero/empty-shipment edge)
- 1 kg — Accept (lower boundary, inclusive)
- 500 kg — Accept (upper boundary, inclusive)
- 501 kg — Reject (just above upper boundary)
(Bonus: also probe a negative weight and an absurd value to confirm input validation.)

Oracle trust-boundary tests — the contract must REFUSE to release payment on:
- Stale data — the lab-pass timestamp is older than the allowed freshness window. Reject (do not act on stale truth).
- Out-of-range / impossible value — the oracle reports a weight or result outside any sane bound, or a malformed value. Reject.
- Missing / no update — the oracle has not reported for this shipment at all. The contract must wait, not assume a pass.

Senior note: the oracle is a single point of trust. Even with perfect weight boundaries, a bad oracle feed makes the contract act on garbage permanently — so refusing untrusted data is as important as the BVA on the weight.
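The weight boundaries and oracle refusals above can be checked against a single release predicate. This is a sketch under assumptions: `may_release`, `LabReport`, and the 24-hour freshness window are hypothetical (the exercise does not state a window), and the function models the rule rather than any real contract.

```python
from dataclasses import dataclass
from typing import Optional

MIN_KG = 1
MAX_KG = 500
MAX_AGE_SECONDS = 24 * 60 * 60  # assumed freshness window, not from the spec


@dataclass(frozen=True)
class LabReport:
    lab_pass: bool
    reported_at: int  # Unix timestamp in seconds


def may_release(weight_kg: int, report: Optional[LabReport], now: int) -> bool:
    """Return True only when every release condition holds."""
    if report is None:
        return False  # missing oracle data: wait, never assume a pass
    age = now - report.reported_at
    if age < 0 or age > MAX_AGE_SECONDS:
        return False  # stale, or timestamped in the future
    if report.lab_pass is not True:
        return False  # failed or malformed result
    return MIN_KG <= weight_kg <= MAX_KG


NOW = 1_700_000_000
fresh_pass = LabReport(lab_pass=True, reported_at=NOW - 60)
stale_pass = LabReport(lab_pass=True, reported_at=NOW - MAX_AGE_SECONDS - 1)

# 2-value BVA on the inclusive range 1 to 500 kg.
bva_results = {kg: may_release(kg, fresh_pass, NOW) for kg in (0, 1, 500, 501)}
print(bva_results)
```

Note that the oracle checks run before the weight check, so a valid weight never rescues a stale or missing report.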

Why teams fail here

Testing only the happy path on a clean chain — replay, reordering, reentrancy, and out-of-gas are where the real defects live, and a single successful run never finds them.
Trusting the app screen instead of the chain — the UI reads from a local cache that can be stale or wrong after a reorg; the chain is always the source of truth, and testers must verify against it directly.
Treating a security audit as full QA coverage — auditors look for known vulnerability classes; they do not cover business logic, oracle trust boundaries, finality handling, or off-chain integration, all of which can harbour production-breaking bugs.
Assuming submission order equals block order — validators and miners decide transaction ordering within a block by gas price and other factors, not arrival time; a contract that relies on first-come-first-served semantics must be tested with deliberately reordered transactions.

Key takeaway

On a blockchain, immutability makes every bug permanent — so test replay, reordering, finality, and off-chain drift before you deploy, because there is no rollback after the fact.

How this has changed

The field moved. Here is how Blockchain Testing evolved from its origins to current practice.

2009–2015

Bitcoin and early Ethereum. Testing is done by core developers and cryptography researchers. No established QA practice exists — the decentralised, immutable nature of blockchains makes conventional testing approaches inapplicable.

2017

ICO boom drives demand for smart contract audits. High-profile hacks (The DAO in 2016, Parity multisig in 2017) demonstrate that unaudited smart contracts represent enormous financial risk. Security-focused blockchain testing firms emerge.

2018–2020

Testing frameworks for Solidity (Truffle, Hardhat, Brownie) become widely adopted. Unit testing smart contracts becomes standard practice. Automated analysis and formal verification tools (MythX, Certora) bring symbolic analysis and machine-checked proofs to critical contract logic.

2021–2023

Layer 2 scaling (Optimism, Arbitrum) and DeFi complexity increase the attack surface. Flash loan attacks, oracle manipulation, and cross-chain bridge exploits create new test categories. Fuzz testing smart contracts with Echidna and Foundry becomes mainstream in Web3 teams.

Now

AI-powered smart contract auditing tools can identify common vulnerability patterns automatically. Enterprise blockchain (Hyperledger) testing follows traditional distributed systems patterns. NZ financial services firms exploring blockchain require Privacy Act 2020 and AML/CFT test coverage on top of protocol-level testing.

Self-Check


Q1: Why is a single successful run a weak test for a smart contract?

Because the defects that matter on a ledger are not on the happy path. They appear when the same transaction runs twice (replay), when transactions land out of order, when a contract calls out and is re-entered, or when a loop runs out of gas at scale. A clean single run never exercises any of those, so it tells you almost nothing about the real risk.

Q2: What is the difference between a transaction being "included" and being "final", and why does it matter to a tester?

Included means it is in a block; final means that block can no longer be replaced. Between the two, a reorg can drop the transaction so the action never really happened. It matters because an app that shows "success" on inclusion can be showing a result that quietly disappears — so the tester must check the app waits for enough confirmations before reporting success.

Q3: A user dashboard says a payment was made, but the on-chain balance has not changed. Which one do you trust, and what does that tell you?

Trust the chain. The chain is the source of truth; the off-chain database is a cache that can drift. The mismatch is itself the bug — the app is showing state that is not real — and it means your verification must always read on-chain state, never just the app's own screen or database.

Q4: What is reentrancy, and how would you test for it without writing the contract yourself?

Reentrancy is when a contract calls out to another address before it finishes updating its own state, and the called code calls back in and acts on the stale state — classically to drain funds. As a tester you simulate a malicious recipient that re-enters the function during the call, and confirm the contract has updated its balances before making the external call (or otherwise guards against re-entry).
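The "malicious recipient" test can be rehearsed without any Solidity. The sketch below models the pattern in plain Python: `Vault`, `Attacker`, and `drained_amount` are hypothetical, and the callback stands in for the external call a contract makes when it sends funds. With `safe=False` the balance is zeroed after the external call; with `safe=True` it is zeroed before.

```python
class Vault:
    """Toy model of a contract that pays out a depositor's balance."""

    def __init__(self, safe: bool) -> None:
        self.safe = safe
        self.balances = {}
        self.pool = 0

    def deposit(self, who: str, amount: int) -> None:
        self.balances[who] = self.balances.get(who, 0) + amount
        self.pool += amount

    def withdraw(self, who: str, on_receive) -> None:
        amount = self.balances.get(who, 0)
        if amount == 0 or self.pool < amount:
            return
        if self.safe:
            self.balances[who] = 0  # state updated before the external call
        self.pool -= amount
        on_receive(amount)          # external call: the recipient's code runs here
        if not self.safe:
            self.balances[who] = 0  # state updated too late


class Attacker:
    """Recipient that calls back into withdraw while it is being paid."""

    def __init__(self, vault: Vault, max_reentries: int = 3) -> None:
        self.vault = vault
        self.max_reentries = max_reentries
        self.reentries = 0
        self.received = 0

    def on_receive(self, amount: int) -> None:
        self.received += amount
        if self.reentries < self.max_reentries:
            self.reentries += 1
            self.vault.withdraw("attacker", self.on_receive)


def drained_amount(safe: bool) -> int:
    """Return how much an attacker with a 10-unit deposit walks away with."""
    vault = Vault(safe)
    vault.deposit("victim", 90)
    vault.deposit("attacker", 10)
    attacker = Attacker(vault)
    vault.withdraw("attacker", attacker.on_receive)
    return attacker.received


print(drained_amount(safe=False), drained_amount(safe=True))
```

The pass condition is that the attacker never receives more than was deposited, however many times the callback re-enters.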

Q5: Why do destructive and edge-case tests belong on a testnet, and what does immutability mean for a wrong record?

Because the live chain has no undo and every transaction costs real money — a "let's see what happens" call could move real value permanently. A testnet behaves like the real chain with valueless tokens, so you can break and redeploy freely. Immutability means a wrong record cannot be edited or deleted; the only fix is a new compensating transaction, and the original error stays visible in history forever.

Q6: Your team is testing an NZ Emissions Trading Scheme registry that issues carbon credits as on-chain tokens. A release is due in two weeks and the contract has passed a security audit. What testing would you still prioritise, and why?

A security audit checks vulnerability classes (reentrancy, access control, overflow) but does not test business logic, finality handling, oracle trust boundaries, or off-chain/on-chain state drift. For an ETS registry you would still run replay and double-retire tests, confirm the UI only shows "Retired" after the required confirmation threshold, feed the emissions oracle with stale and out-of-range values, and reconcile the reporting database against on-chain token supply. The audit is a compliance checkpoint, not a substitute for functional QA — and on an immutable ledger, a missed business-logic bug cannot be rolled back.

Q7: What is the key difference between blockchain testing and idempotency testing, and when would you apply each?

Idempotency testing checks that repeating an operation produces the same result as running it once — it is a general property of APIs and services that can be applied anywhere. Blockchain testing focuses specifically on a distributed, append-only ledger where you cannot undo a write, finality is probabilistic, and ordering within a block is not guaranteed. The replay/double-spend concern in blockchain testing resembles idempotency testing, but blockchain adds unique pressures: real money is moved permanently, the contract logic itself must enforce the guard (not just the API layer), and the "source of truth" is always the chain rather than a server-side database. Apply idempotency testing broadly to any retryable API; add blockchain-specific techniques when the data store is an immutable distributed ledger with smart-contract logic.

Q8: A developer says, "We do not need to test transaction ordering because our backend submits transactions one at a time in sequence." What is wrong with this and how do you respond?

The order transactions are submitted is not the order they land in a block. Miners or validators decide ordering within a block based on factors like gas price, not arrival time — so two transactions submitted sequentially can be reversed, or a third-party transaction can be inserted between them. The backend's submission sequence gives no guarantee on the contract side. You would respond by asking the developer to show where the contract enforces the required ordering, then design test cases that replay the transactions in a different sequence to confirm the contract rejects the invalid order — rather than relying on submission timing that the chain does not honour.
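Replaying transactions in a different sequence is easy to automate by enumerating every permutation. The sketch below assumes a toy `Shipment` model and three hypothetical lifecycle records; it is not the project's contract. A rejected record is treated like a reverted transaction, so the remaining records in that ordering still run.

```python
from itertools import permutations

# Hypothetical lifecycle records; the contract must see them in this order.
REQUIRED_ORDER = ("harvested", "packed", "exported")


class Shipment:
    """Toy stand-in for the contract's per-shipment state (hypothetical)."""

    def __init__(self) -> None:
        self.accepted = []
        self.paid = False

    def submit(self, record: str) -> None:
        position = len(self.accepted)
        if position >= len(REQUIRED_ORDER) or record != REQUIRED_ORDER[position]:
            raise ValueError(f"{record!r} is not valid at this point")
        self.accepted.append(record)
        if record == "exported":
            self.paid = True


def orders_that_pay() -> list:
    """Replay every possible ordering; return the ones that end in payment."""
    paying = []
    for order in permutations(REQUIRED_ORDER):
        shipment = Shipment()
        for record in order:
            try:
                shipment.submit(record)
            except ValueError:
                pass  # a reverted transaction; the rest of the ordering still runs
        if shipment.paid:
            paying.append(order)
    return paying


print(orders_that_pay())
```

If more than one ordering ends in payment, the contract is relying on submission order that the chain does not guarantee.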

Q9: A NZ fintech team building a KiwiSaver top-up feature on a private internal ledger argues they need blockchain-specific testing because "it is still a ledger." When is this argument valid, and when is it not?

The argument is valid only if the private ledger genuinely exhibits the properties that make blockchain testing distinct: append-only immutability (no record can be corrected in place), probabilistic finality (a "confirmed" state can still be reversed), and ordering that is not controlled by the submitting party. If a database administrator can correct records, if commits are synchronous and final, and if the team controls transaction ordering, then it is a conventional database with ledger-style audit logging — and standard testing techniques cover it. For a KiwiSaver top-up on most private ledger platforms, the team almost certainly retains administrative control, so the "skip it" column of the decision guide applies and blockchain-specific replay/finality tests are not warranted.

Interview Prep

"How would you approach testing a smart contract if you are not a Solidity developer?"

I treat the contract as a deterministic spec to verify, not code to write. I read what states and transitions it claims to support, then design transactions that probe the edges: boundary values on amounts and limits, equivalence partitions on inputs, and the blockchain-specific moves — calling the same function twice (replay), calling it out of order, calling it with zero or maximum values, and running it at scale to catch out-of-gas. I run all of it on a testnet, and I verify results against on-chain state, not the app's screen.

"A team says 'it is on the blockchain so it must be correct and final'. How do you respond?"

I would gently separate two things. The ledger guarantees that what is written cannot be tampered with — but it does not guarantee that what was written is correct, that it ran exactly once, or that it has reached finality yet. Immutability actually raises the stakes: a wrong record is permanent. So I would still test the contract logic, the replay and ordering behaviour, the confirmation/finality handling, and the oracle inputs — because "on the chain" only secures the data after the logic has already decided what to write.

"What is the biggest source of user-facing bugs in a blockchain product, in your experience?"

Drift between off-chain and on-chain state. The app keeps a fast local copy of what is on the chain so the UI feels responsive, and that copy goes stale — after a reorg, a missed event, or a slow sync. The user sees "paid" or "retired" when the chain says otherwise. My rule is that the chain is the source of truth and the local database is a cache, so I always reconcile the displayed state against the actual on-chain state.
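One way to reason about the reorg case is to model the local copy as something that can undo blocks it has already applied. The sketch below is an assumption-heavy illustration, not how any particular indexer works: `OwnershipCache`, `Block`, and `follow` are hypothetical, and the canonical chain is passed in as a simple list. The test it supports is the useful part: apply one branch, switch to a competing branch, and confirm the cache matches the new chain exactly.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass(frozen=True)
class Block:
    block_hash: str
    transfers: Tuple[Tuple[str, str], ...]  # (token_id, new_owner) pairs


class OwnershipCache:
    """Toy off-chain cache that can roll back blocks dropped by a reorg."""

    def __init__(self) -> None:
        self.owners: Dict[str, str] = {}
        # Each entry pairs an applied block with the data needed to undo it.
        self._applied: List[Tuple[Block, List[Tuple[str, Optional[str]]]]] = []

    def _apply(self, block: Block) -> None:
        undo = []
        for token_id, new_owner in block.transfers:
            undo.append((token_id, self.owners.get(token_id)))
            self.owners[token_id] = new_owner
        self._applied.append((block, undo))

    def _rollback(self) -> None:
        _, undo = self._applied.pop()
        for token_id, previous in reversed(undo):
            if previous is None:
                del self.owners[token_id]
            else:
                self.owners[token_id] = previous

    def follow(self, canonical: List[Block]) -> None:
        """Undo blocks no longer on the canonical chain, then apply new ones."""
        keep = 0
        for (applied, _), block in zip(self._applied, canonical):
            if applied.block_hash != block.block_hash:
                break
            keep += 1
        while len(self._applied) > keep:
            self._rollback()
        for block in canonical[keep:]:
            self._apply(block)


b1 = Block("0x01", (("T1", "alice"),))
b2a = Block("0x2a", (("T1", "bob"),))   # later reorged out
b2b = Block("0x2b", (("T2", "dave"),))  # competing block that wins

cache = OwnershipCache()
cache.follow([b1, b2a])
before_reorg = dict(cache.owners)
cache.follow([b1, b2b])
after_reorg = dict(cache.owners)
print(before_reorg, after_reorg)
```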

Related Techniques

Smart-contract input testing leans heavily on Boundary Value Analysis and Equivalence Partitioning for amounts and limits, and on Decision Table Testing for branching contract logic.

The states a credit or shipment moves through map cleanly onto State Transition Testing, and reorg/fork resilience is a natural fit for Chaos Engineering.

Key handling and reentrancy are squarely Security Testing concerns, and the oracle path is best probed with API Mocking & Stubbing to feed deliberately bad data.
