Testing Oracles & Heuristics
Every test needs an oracle — a way to decide whether the result is correct. Most testers use oracles without naming them. BBST makes oracle choice explicit, because hidden oracle assumptions are where tests silently lie.
1 The Hook
A QA team at a Wellington fintech company ran six hundred automated regression tests against a new interest-calculation engine. All six hundred passed. Two weeks after release, a customer complained that her loan statement was wrong. The developer pulled the logs, ran the calculation by hand, and confirmed: the engine was producing the wrong result — and had been for two weeks.
How did six hundred tests miss it? The oracle was wrong. Every automated test checked that the new engine returned the same value as the old engine. The old engine had the same bug. The tests weren’t lying — they were faithfully checking that the new system was as wrong as the old one.
That is the oracle problem. A test can only tell you that output matched your oracle. If your oracle is wrong — incomplete, outdated, or simply the wrong mechanism for the question you are asking — your tests pass even when the system fails. BBST makes oracle choice explicit, because an implicit oracle is an unaudited assumption hiding inside every green tick.
2 The Rule
An oracle is any mechanism or principle by which you recognise a problem. Oracles can be explicit (a spec says “field must accept values 1–100”), heuristic (“this output looks wrong”), or statistical (“this value is 3 standard deviations from the norm”). Every test implicitly uses at least one oracle — BBST teaches you to name it, challenge it, and choose it deliberately.
3 The Analogy
A food safety inspector, not a checklist robot.
A food safety inspector doesn’t only follow a checklist — they know what “fresh” smells like, what “safe temperature” feels like, and what suspicious discolouration signals. Some of their judgements are explicit rules written in the Food Act; others are heuristics built from years of inspecting kitchens. Their accumulated knowledge IS the oracle. Remove their expertise and replace them with a pure checklist-follower, and the checklist-follower passes the kitchen that smells wrong because the smell isn’t on the list.
A software tester without oracle awareness is the checklist-follower: they tick off what the spec says and miss what the spec doesn’t cover. BBST helps testers build and articulate that same systematic intuition — to know which oracles they are using, where each one is incomplete, and when to add a new one.
Common Mistake vs What Works
Common mistake: treating the requirements document as the complete oracle (“If it matches the spec, it passes”). Real systems are used by humans in contexts the spec never described. An NZ Inland Revenue income verification form might pass every requirement and still produce a misleading result for a person with multiple income sources, because the spec author never modelled that scenario. An oracle limited to the spec will never catch it.
What works: name the oracles before testing begins. For every test session, ask: “What am I comparing output against? A spec? A prior version? Domain knowledge? User expectations?” Then ask: “Where does each oracle break down?” A senior tester at an NZ government agency runs SFDIPOT across a new feature to surface oracle gaps before a single test is written, finding the questions the spec forgot to answer while there is still time to answer them.
Every test encodes oracle assumptions. When tests pass silently on a broken system, the most common culprit is not a missing test — it is a wrong or incomplete oracle that nobody named. BBST oracle analysis forces those assumptions into the open so they can be challenged, augmented, and documented. Use SFDIPOT to generate oracle questions systematically; use FEW HICCUPPS to categorise the types of oracles available to you.
The most dangerous oracle failure I see on NZ projects is the “prior version” oracle applied without scrutiny. Teams migrate from a legacy system and write automated tests that assert the new system produces the same output as the old one. This catches regressions perfectly — but it also propagates every bug in the legacy system into the new one, and your test suite tells you everything is fine. Before accepting a prior-version oracle, ask: “Do we actually trust the legacy system output for this calculation?” On an ACC payments migration I reviewed, the answer was no for three edge cases involving partial-week entitlements. The oracle was right for 97% of cases and catastrophically wrong for the 3% that mattered most.
4 What It Is
The oracle problem is a core concept in BBST (Black Box Software Testing), the course series developed by Cem Kaner with James Bach, and James Bach and Michael Bolton have written extensively on heuristic oracles. The insight: testing requires not just running the software but knowing how to recognise a problem when you see one. That recognition mechanism is the oracle.
Oracles are not automatic. A test that compares actual output to expected output only works if the expected output is correct — which requires the oracle (the thing that generated the expected output) to be trustworthy for the question you are asking. Oracle failures are silent: the test passes, the system is broken, and nobody knows.
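The silent failure described above can be sketched in a few lines. This is a hypothetical illustration, not the engine from the opening story: the function names, the 365-day contract term, and the 360-day bug are all assumptions chosen to make the effect visible.

```python
from decimal import Decimal

CENT = Decimal("0.01")


def legacy_interest(balance: Decimal, annual_rate: Decimal, days: int) -> Decimal:
    # Hypothetical legacy engine. Assumed bug: divides by 360 although the
    # (assumed) loan contract specifies a 365-day year.
    return (balance * annual_rate * days / Decimal(360)).quantize(CENT)


def new_interest(balance: Decimal, annual_rate: Decimal, days: int) -> Decimal:
    # Hypothetical new engine, ported line by line, bug included.
    return (balance * annual_rate * days / Decimal(360)).quantize(CENT)


def hand_calculated_interest(balance: Decimal, annual_rate: Decimal, days: int) -> Decimal:
    # Independent oracle: derived from the assumed contract term,
    # not from either engine.
    return (balance * annual_rate * days / Decimal(365)).quantize(CENT)


case = (Decimal("10000"), Decimal("0.073"), 30)

# Prior-version oracle: the two engines agree, so the check passes.
prior_version_passes = new_interest(*case) == legacy_interest(*case)

# Independent oracle: exposes the bug both engines share.
independent_passes = new_interest(*case) == hand_calculated_interest(*case)

print(prior_version_passes, independent_passes)  # True False
```

Both engines return 60.83 for this case while the hand calculation gives 60.00, so the only check that fails is the one whose expected value did not come from the system family under test.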
Types of Oracles
| Oracle type | How it works | Example | Weakness |
|---|---|---|---|
| Comparable products | Compare output to a similar system in the market | Check NZ tax calculation against another tax software package | The competitor may share the same bug; neither is authoritative |
| Prior versions | Assert new system matches old system output | New ACC claims portal vs legacy system for same inputs | Propagates legacy bugs; useless if legacy had known errors |
| Documented spec | Compare to formal requirements or contract | Field must accept values 1–100 per IRD specification | Specs are incomplete; they never cover every scenario |
| User expectations | Would a typical user find this result surprising or misleading? | Date field returns “29/02/2025” — users know 2025 has no Feb 29 | Subjective; hard to formalise; requires domain knowledge |
| Domain knowledge | Expert knowledge about what results are plausible | A GST refund greater than GST paid should not be possible | Only as good as the expert; siloed knowledge creates blind spots |
| Statistical data | Flag outliers against historical norms | Processing time is 3 standard deviations above last week’s mean | Needs baseline data; outliers can be legitimate |
| Heuristic reasoning | “This looks wrong” — accumulated intuition | Amount displayed as negative when a positive is expected | Hard to document; fails when the tester’s intuition is miscalibrated |
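Two rows of the table translate directly into executable checks. The sketch below encodes the date example and the GST example as small predicates; the function names and the cents-based amounts are illustrative assumptions, not part of any real tax system.

```python
from datetime import datetime


def is_real_calendar_date(text: str) -> bool:
    """User-expectations oracle: a displayed date must exist on the calendar."""
    try:
        datetime.strptime(text, "%d/%m/%Y")
    except ValueError:
        return False
    return True


def refund_is_plausible(gst_paid_cents: int, gst_refund_cents: int) -> bool:
    """Domain-knowledge oracle: a refund cannot exceed the GST actually paid."""
    return 0 <= gst_refund_cents <= gst_paid_cents


print(is_real_calendar_date("29/02/2025"))  # False: 2025 is not a leap year
print(is_real_calendar_date("29/02/2024"))  # True
print(refund_is_plausible(gst_paid_cents=15_000, gst_refund_cents=18_000))  # False
```

Neither check needs a spec or a prior version to run, which is why oracles of this kind are useful when the other oracle types are missing or untrusted.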
SFDIPOT — Systematic Oracle Generation
SFDIPOT is a heuristic from James Bach's Heuristic Test Strategy Model; the older six-letter form, SFDPOT, omits Interface. It gives you seven lenses through which to ask “what could go wrong here?” and therefore “what oracle do I need for this dimension?”
| Letter | Dimension | Oracle question |
|---|---|---|
| S | Structure | Is the structure of the output correct? (HTML, JSON, file format, database schema) |
| F | Function | Does the system do what it is supposed to do? (spec oracle) |
| D | Data | Does the system handle data correctly? (boundary values, types, encoding, null, locale) |
| I | Interface | Does the system communicate correctly with users and other systems? (API contracts, UI feedback) |
| P | Platform | Does the system behave correctly across browsers, devices, OS, environments? |
| O | Operations | Can operators use, deploy, monitor, and maintain it? (logging, alerting, config) |
| T | Time | Does behaviour depend on time? (dates, timezones, NZ daylight saving, session expiry, race conditions) |
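One lightweight way to use the table in planning is to record which oracles have been named for each dimension and list the dimensions that have none. The plan contents below are hypothetical; only the seven dimension names come from the table.

```python
SFDIPOT = (
    "Structure", "Function", "Data", "Interface",
    "Platform", "Operations", "Time",
)


def uncovered_dimensions(oracle_plan: dict) -> list:
    """Return the SFDIPOT dimensions that have no named oracle yet."""
    return [dim for dim in SFDIPOT if not oracle_plan.get(dim)]


# Hypothetical oracle plan for a feature under test.
plan = {
    "Function": ["documented spec"],
    "Data": ["documented spec", "domain knowledge"],
    "Interface": ["API contract"],
    "Time": [],  # listed, but nobody has named an oracle for it
}

print(uncovered_dimensions(plan))
# ['Structure', 'Platform', 'Operations', 'Time']
```

An empty list and a missing key are treated the same way on purpose: in both cases nobody has yet said how a problem in that dimension would be recognised.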
FEW HICCUPPS — Oracle Heuristics
FEW HICCUPPS (James Bach and Michael Bolton) is a catalogue of oracle types you can draw on. When you are stuck on what oracle to use, run through this list and ask which ones apply to your context.
| Letter | Oracle | How you use it |
|---|---|---|
| F | Familiarity | Does this resemble a familiar kind of problem seen in other systems? If so, suspect a bug (pattern recognition) |
| E | Explainability | Can you explain the output? If not, it may be a bug even if it passes other oracles |
| W | World | Does the output match real-world facts? (e.g. a NZ postcode in Auckland should not map to Dunedin) |
| H | History | Does the output match prior outputs for the same inputs? (regression oracle) |
| I | Image | Does the output match what the organisation wants to project? (brand, tone, legal exposure) |
| C | Comparable products | Does the output match what competitors or equivalent systems produce? |
| C | Claims | Does the output match what the documentation, release notes, or marketing claims? |
| U | User expectations | Would users be surprised, confused, or misled by this output? |
| P | Product | Does the output match the product’s own internal consistency? (e.g. totals add up) |
| P | Purpose | Does the output serve the purpose the product is intended for? |
| S | Statutes | Does the output comply with applicable law? (Privacy Act, Consumer Guarantees Act, Health and Safety at Work Act) |
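Different oracles from the list can disagree about the same output, and that disagreement is informative. The sketch below runs a History check and a Product check against one hypothetical receipt; the field names and amounts are assumptions made for illustration.

```python
def check_history(receipt: dict, previous: dict) -> bool:
    """History oracle: same inputs should give the same total as last time."""
    return receipt["total_cents"] == previous["total_cents"]


def check_product(receipt: dict) -> bool:
    """Product oracle: the receipt must be internally consistent."""
    return sum(receipt["line_cents"]) == receipt["total_cents"]


# Hypothetical receipts: the previous release and the current one share an error.
previous = {"line_cents": [4_000, 2_500], "total_cents": 6_400}
current = {"line_cents": [4_000, 2_500], "total_cents": 6_400}

verdicts = {
    "History": check_history(current, previous),
    "Product": check_product(current),
}
flagged = [name for name, ok in verdicts.items() if not ok]

print(verdicts)  # {'History': True, 'Product': False}
print(flagged)   # ['Product']
```

The History oracle is satisfied because the total has not changed, while the Product oracle notices that the line items sum to 6,500 cents. Relying on History alone would have reported a pass.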
5 Worked Examples — NZ Context
Example 1: ACC Online Injury Claims Form
You are testing ACC’s online injury claims form before a major release. The question BBST oracle theory forces you to ask upfront is: what are my oracles?
| Oracle type | What it covers | Where it breaks down |
|---|---|---|
| Documented spec | Mandatory fields, character limits, accepted date formats, ACC eligibility criteria per Accident Compensation Act 2001 | Spec doesn’t describe what happens when a claimant has multiple injuries from the same incident; edge cases in rural address formats |
| Prior version | New form should accept the same valid inputs and reject the same invalid inputs as the previous form | Prior form had known usability issues with occupation codes; a prior-version oracle propagates those problems |
| Comparable government forms | Similar government service portals (MSD, IRD myIR) set user expectations for layout, error message style, and confirmation behaviour | Government portals vary enormously in quality; “it matches MSD” is not a safety net |
| User expectations (Privacy Act) | Claimants expect their injury details are not visible to other family members sharing a login; Privacy Act 2020 creates a legal expectation of data isolation | Spec may not have addressed shared-login scenarios; needs explicit oracle |
| Domain knowledge (ACC eligibility) | Tester who knows ACC eligibility rules can flag when the form’s logic accepts a claim that should be ineligible (e.g. a work injury where the employer field is blank) | Eligibility rules are complex; domain knowledge oracle requires a subject-matter expert to be part of testing |
Example 2: SFDIPOT Applied to a KiwiSaver Withdrawal Application
A bank’s digital team is releasing a new KiwiSaver hardship withdrawal application. Run SFDIPOT to generate oracle questions before a single test is written.
| Dimension | Oracle question | Oracle needed |
|---|---|---|
| S Structure | Does the submitted application produce a correctly structured PDF and a correctly structured database record? | Spec oracle for PDF schema; data oracle for DB field types and constraints |
| F Function | Does the eligibility check correctly apply the KiwiSaver Act 2006 hardship criteria? Does the withdrawal amount calculation apply the correct rules? | Statute oracle (KiwiSaver Act); domain knowledge oracle (financial rules) |
| D Data | Does the form handle NZ dollar amounts with cents correctly? What happens with a balance of $0? A balance above $1,000,000? Non-numeric input? | Boundary oracle (spec); world oracle (amounts must be positive and non-zero) |
| I Interface | Does the confirmation screen accurately summarise the application before submission? Does the API to the fund manager send the correct payload? | Spec oracle; product oracle (totals must match what was entered) |
| P Platform | Does the form work correctly on Safari mobile (common in NZ)? Does it work on older Android devices? | Comparable products oracle; user expectations oracle |
| O Operations | If the fund manager API is unavailable, does the form fail gracefully and preserve the application? Are failed submissions logged? | User expectations oracle; spec oracle for error handling |
| T Time | Does the application handle NZ daylight saving transitions correctly when calculating submission timestamps? Does a session timeout mid-form lose the data? | World oracle (NZ timezone rules); user expectations oracle |
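The Time row can be made concrete with Python's standard `zoneinfo` module, which needs an IANA time zone database on the machine (the `tzdata` package supplies one where the operating system does not). The two label functions are hypothetical stand-ins for however the application formats submission timestamps.

```python
from datetime import datetime, timedelta, timezone
from zoneinfo import ZoneInfo  # Python 3.9+

NZ = ZoneInfo("Pacific/Auckland")


def naive_label(utc_instant: datetime) -> str:
    # Hypothetical timestamp format that drops the UTC offset.
    return utc_instant.astimezone(NZ).strftime("%Y-%m-%d %H:%M")


def offset_label(utc_instant: datetime) -> str:
    # Same format with the offset kept, so repeated wall-clock times stay distinct.
    return utc_instant.astimezone(NZ).strftime("%Y-%m-%d %H:%M %z")


# NZ daylight saving ended on Sunday 6 April 2025: 3:00 am NZDT became 2:00 am NZST.
first = datetime(2025, 4, 5, 13, 30, tzinfo=timezone.utc)  # 02:30 NZDT on 6 April
second = first + timedelta(hours=1)                        # 02:30 NZST on 6 April

print(naive_label(first), "|", naive_label(second))
# 2025-04-06 02:30 | 2025-04-06 02:30
print(offset_label(first), "|", offset_label(second))
# 2025-04-06 02:30 +1300 | 2025-04-06 02:30 +1200
```

Two submissions made an hour apart get identical labels when the offset is dropped. A world oracle built from the time zone rules catches that; a spec that only says "record the submission time" does not.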
6 Industry Reality
- Oracle failures cause testing blind spots nobody admits to. When a release goes out and a bug surfaces that the test suite should have caught, the honest post-mortem usually reveals an oracle problem: the tests checked the wrong thing, compared against a faulty baseline, or didn’t cover the question that mattered. Teams rarely diagnose it this way because “our oracle was wrong” sounds embarrassingly fundamental.
- Automated tests encode the oracle assumptions of whoever wrote them. A thousand passing automated tests is a thousand assertions that output matches what one person expected. If that person misunderstood the domain, used the spec as the oracle for behaviour the spec never described, or copied expected values from a legacy system with known bugs, every test is a liability disguised as coverage.
- The most dangerous oracle is the unexamined one. On NZ government projects — IRD, MSD, Waka Kotahi — it is common to use a prior system as the oracle for migrated data and logic. This feels safe and rigorous. It is only rigorous if the prior system is actually trustworthy for the specific calculations being migrated. Verify the oracle before trusting it.
- FEW HICCUPPS is a conversation starter, not a checklist to tick. Senior testers use it in planning sessions to surface oracle questions with product owners and developers. “What does a user expect from this output? What do our claims say it does? Does it comply with the Privacy Act?” These conversations often reveal that nobody has agreed what success looks like — which is the real problem the test suite would have masked.
- Statistical oracles are underused in NZ QA teams. Performance baselines, throughput distributions, and error-rate trends are oracles that automated functional tests never check. A release that passes all functional tests but doubles average response time has a bug — but only a statistical oracle will catch it.
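A statistical oracle of the kind described in the last point can be very small. The sketch below flags an observation that sits more than three sample standard deviations above a baseline mean; the baseline figures are invented, and a real baseline would need far more than five points.

```python
from statistics import mean, stdev


def exceeds_baseline(baseline_ms: list, observed_ms: float, threshold: float = 3.0) -> bool:
    """Flag an observation more than `threshold` sample standard
    deviations above the baseline mean."""
    mu = mean(baseline_ms)
    sigma = stdev(baseline_ms)  # needs at least two baseline points
    if sigma == 0:
        return observed_ms > mu
    return (observed_ms - mu) / sigma > threshold


last_week_ms = [200, 210, 190, 205, 195]  # hypothetical response times

print(exceeds_baseline(last_week_ms, 205))  # False
print(exceeds_baseline(last_week_ms, 400))  # True
```

The check is one-sided because only slower responses count as a problem here. As the oracle table notes, a flagged outlier can be legitimate, so the result is a prompt to investigate rather than a verdict.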
7 When to Use It — and When Not To
✓ Use oracle analysis when
- Requirements are ambiguous or incomplete — SFDIPOT surfaces the questions the spec forgot to answer before testing begins
- The system has no prior version to compare against — you must construct oracles from domain knowledge, user expectations, and statutes
- You are inheriting a legacy test suite — audit the oracles before trusting the coverage
- The domain is regulated (ACC, IRD, banking) — statute and domain oracles are not optional
- You are writing automated tests — explicitly name the oracle every assertion encodes so future maintainers understand what assumption is being checked
- A test suite is giving false confidence — investigate oracle quality before adding more tests
✗ Recalibrate when
- The domain is fully specified with no ambiguity and the spec is authoritative — spec-as-oracle is valid in this narrow case, but confirm the spec actually is complete
- You are spending more time debating oracles than testing — agree on a “good enough” oracle set and proceed; oracle theory is a tool, not a philosophy seminar
- The system is being retired — deep oracle analysis is wasted effort on software being decommissioned in weeks
- The risk is genuinely low — for internal tooling with a small user base and easy rollback, a simple heuristic oracle is proportionate
8 Best Practices
- ✓ Name your oracles before you write a test. For every test case, write one sentence: “I am asserting this output is correct because [oracle].” If you cannot complete that sentence, you do not have an oracle — you have a guess.
- ✓ Document oracle choices in your test plan. A test plan that says “we will compare against the prior version” is making an oracle commitment. Make it explicit so reviewers can challenge it. On regulated NZ projects, oracle documentation becomes part of the compliance evidence.
- ✓ Run SFDIPOT at the start of each new feature. Ten minutes with the development team walking through Structure, Function, Data, Interface, Platform, Operations, and Time surfaces assumptions nobody has yet made explicit — and those are exactly the assumptions that become oracle failures in production.
- ✓ Challenge prior-version oracles. Before accepting “it matches the old system” as proof of correctness, ask: “Was the old system correct for this input?” Get explicit confirmation from a domain expert. If the old system had known bugs, list them so the new system is not tested against them.
- ✓ Include statute oracles for regulated domains. In NZ, relevant statutes include the Privacy Act 2020, Consumer Guarantees Act 1993, Health and Safety at Work Act 2015, and domain-specific legislation (Accident Compensation Act, KiwiSaver Act, Income Tax Act). A test that passes the spec but violates the Privacy Act is a failed test.
- ✓ Work with product owners to define success criteria before testing. “What would make you confident this feature is correct?” is an oracle question. The answer should drive your oracle selection, not validate it after the fact.
- ✓ Use FEW HICCUPPS as a facilitation tool. In a requirements review or sprint planning session, walk through the mnemonic with stakeholders. Each letter is a prompt: “Do we know what users expect from this? Is there a comparable product? What claims are we making in the documentation?” Document the answers as oracle decisions.
- ✓ Flag oracle uncertainty in bug reports. When you report a bug, note the oracle you used to identify it. “This fails against the user-expectations oracle because a typical NZ user would interpret this as a confirmation, not a warning.” This makes the bug report more defensible and gives developers the context they need to evaluate severity.
9 Common Misconceptions
❌ Myth: “The spec IS the oracle — if it matches requirements, it passes.”
Reality: Specifications are always incomplete. They describe intended behaviour for anticipated scenarios; they do not describe every interaction a real user will have with a real system in a real context. A claims form that meets every requirement in the spec may still produce a misleading outcome for a claimant with an unusual employment history, because the spec author never modelled that scenario. The spec is one oracle. It is rarely a sufficient one.
❌ Myth: “Oracles are obvious — you just know whether the output is right.”
Reality: Most oracle assumptions are implicit, and implicit assumptions are the ones that fail silently. If you “just know” an output is right, you are using an unexamined heuristic oracle — which is fine when your domain knowledge is accurate and your intuition is calibrated, but is catastrophic when either is off. The point of naming oracles is not to make obvious things complicated; it is to surface the cases where your implicit oracle is wrong in ways you have not yet noticed.
❌ Myth: “Automated tests have perfect oracles — if the assertion passes, the software is correct.”
Reality: Automated tests encode the oracle assumptions of whoever wrote the assertion. An assertion that “output equals expected” is only correct if “expected” was correctly derived. If expected was copied from a legacy system with a calculation bug, the automated test faithfully asserts the wrong answer. If expected was generated by calling the same function being tested, the assertion is circular and catches nothing. The coverage number looks healthy; the oracle is broken.
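The circular case is easy to write by accident, so it is worth seeing side by side with an independent check. In this sketch the function under test and its bug are assumptions; the only external fact used is that NZ GST is 15%.

```python
def add_gst(amount_cents: int) -> int:
    # Hypothetical function under test. Assumed bug: applies 12.5% instead of 15%.
    return amount_cents * 1125 // 1000


def circular_check() -> bool:
    # "Expected" comes from the function being tested, so this can never fail.
    expected = add_gst(10_000)
    return add_gst(10_000) == expected


def independent_check() -> bool:
    # "Expected" is worked out by hand: $100.00 plus 15% GST is $115.00.
    expected = 11_500
    return add_gst(10_000) == expected


print(circular_check())     # True
print(independent_check())  # False
```

Both checks would count equally toward a coverage figure, but only the second one can detect the bug.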
❌ Myth: “Oracle analysis is only relevant for exploratory testing.”
Reality: Oracle analysis is most visible in exploratory testing because exploratory testers rely heavily on heuristic and domain-knowledge oracles. But oracle problems are equally common in scripted testing (spec incomplete) and automated testing (wrong expected values). Every test — scripted, exploratory, or automated — uses at least one oracle. The mistake is assuming scripted or automated tests are immune to oracle failures because they feel more rigorous.
10 Now You Try
Two graded exercises: oracle identification and oracle gap analysis. Write your answer, then compare it to the model answer.
Exercise 1: You are testing a new NZ government portal that lets users apply for Working for Families tax credits. List at least four distinct oracle types you would use to test whether the eligibility calculation is correct. For each, name the oracle type (from FEW HICCUPPS or the oracle type table), what it covers, and one specific weakness.
Model answer:
Model oracle set for Working for Families eligibility:
- Oracle 1 - Statutes (FEW HICCUPPS: S). Covers: The Income Tax Act 2007 Part M defines WFF eligibility criteria: income thresholds, number of dependent children, residency requirements. The portal's calculation must match the statute, not just the spec. Weakness: Tax legislation is complex and amended regularly; the oracle requires a tax-law specialist to validate the mapping from statute to system rules.
- Oracle 2 - Comparable products. Covers: IRD's own published WFF calculator (available on ird.govt.nz) can be used as a reference calculation for the same inputs. Outputs should match. Weakness: If IRD's published calculator has a known issue with certain income types (e.g. self-employment income with provisional tax), using it as an oracle propagates that issue.
- Oracle 3 - Domain knowledge. Covers: A payroll or tax domain expert can identify inputs where the spec omits edge cases - e.g. shared care arrangements, overseas income, income from rental properties. The expert's knowledge of what the correct answer should be is the oracle. Weakness: Domain knowledge is tacit; if the expert is unavailable or wrong, the oracle fails silently.
- Oracle 4 - User expectations (FEW HICCUPPS: U). Covers: A typical NZ family applying for WFF would expect: the calculation to update in real time as they enter income, a breakdown of which credits they qualify for and why, and a clear explanation if they are ineligible. Weakness: Subjective; different users have different expectations. Requires user research or representative user testing to validate.
- Bonus: Statistical oracle - compare the distribution of approved amounts against prior year data for similar income bands. A sudden spike or drop in average credit amounts signals a calculation change worth investigating.
Exercise 2: A team has written automated tests for a new NZ council rates payment portal. Their test plan states: “All tests use the prior version of the portal as the oracle. If the new portal produces the same output, the test passes.”
Run SFDIPOT on this oracle strategy. For each dimension, identify at least one question the prior-version oracle cannot answer — and suggest what oracle should fill that gap.
Model answer:
SFDIPOT oracle gap analysis for council rates payment portal:
- S - Structure: The prior system returned a flat HTML confirmation page; the new system must return a structured JSON receipt for the council's accounting integration. Prior-version oracle cannot validate the new JSON schema. Better oracle: API spec oracle - validate output against the agreed JSON schema for the accounting integration.
- F - Function: The prior system had a known bug where partial-payment plans were calculated incorrectly for properties with arrears. The prior-version oracle propagates this bug. Better oracle: Domain oracle - engage a rates officer to verify the calculation against the Local Government (Rating) Act 2002 rules.
- D - Data: The prior system did not accept payment amounts with more than 2 decimal places; the new system must. A prior-version oracle would treat 3-decimal-place inputs as an error. Better oracle: Spec oracle for the new data validation rules.
- I - Interface: The new system integrates with a new payment gateway (Windcave) that the prior system never used. The prior-version oracle has no view of whether the Windcave API payload is correct. Better oracle: Windcave API contract oracle - validate request and response payloads against Windcave's published API specification.
- P - Platform: The prior system was not tested on mobile. User research shows 60% of ratepayers now pay on mobile. Prior-version oracle cannot validate mobile behaviour. Better oracle: User expectations oracle - run mobile usability review against NZ Web Accessibility Standard 1.1 (WCAG 2.1 AA).
- O - Operations: The prior system logged nothing on payment failure. The new system is required to log all failures for fraud monitoring. Prior-version oracle (which had no logging) cannot validate the new logging requirement. Better oracle: Spec oracle for the new logging requirements; operations review by the DevOps team.
- T - Time: The prior system's payment timestamps never accounted for NZ daylight saving. The new system must handle NZST/NZDT transitions correctly, including for payments made around midnight. Better oracle: World oracle (NZ timezone transition dates) combined with boundary tests around the transitions themselves: 2:00 to 3:00 am on the first Sunday in April, when NZDT ends and the hour repeats, and 2:00 to 3:00 am on the last Sunday in September, when NZDT begins and the hour is skipped.
Self-Check
Q1: What is the oracle problem, and why does it matter for automated testing?
The oracle problem is the challenge of knowing how to recognise a problem in software output. It matters for automated testing because every assertion encodes an oracle assumption: the expected value must be correct for the assertion to be meaningful. A test suite built against a wrong oracle (e.g. a legacy system with a calculation bug) will pass even when the system is broken. Coverage numbers say nothing about correctness when the oracles are wrong.
Q2: What does SFDIPOT stand for, and how do you use it in test planning?
Structure, Function, Data, Interface, Platform, Operations, Time. In test planning, run through each dimension and ask: “What oracle do I need to determine correctness for this dimension?” and “Where does my current oracle break down?” This surfaces oracle gaps before testing begins, when they are cheap to address, rather than after release, when they become incidents.
Q3: Name three oracle types from FEW HICCUPPS and give a concrete NZ example of each.
Statutes (S): A KiwiSaver withdrawal calculation must comply with the KiwiSaver Act 2006 — the statute is the oracle. Comparable products (C): A NZ bank’s mortgage calculator output can be checked against a competitor bank’s published calculator for the same inputs. User expectations (U): A RealMe identity verification flow should not ask users to re-enter information they already provided in a previous step — users expect continuity, and a violation is a bug even if the spec doesn’t mention it.
Q4: Why is a prior-version oracle risky, and when should you refuse to use it?
A prior-version oracle asserts that the new system matches the old one — which is only valid if the old system was correct. Refuse it when: the prior system had known bugs in the area being tested, the domain rules have changed since the prior system was built, or the new system has different requirements (e.g. a new integration, new data types, new regulations). Before accepting a prior-version oracle, get explicit sign-off from a domain expert that the legacy system output is trustworthy for the specific calculations in scope.
Q5: How do you document oracle choices, and why does this matter on regulated NZ projects?
Document oracle choices in the test plan or test strategy: for each test area, state which oracle(s) are being used, what they cover, and where they are incomplete. On NZ regulated projects (ACC, IRD, MSD, banking), oracle documentation becomes part of the compliance and audit evidence. It answers the question: “How did you know the system was producing correct outputs?” A test suite without oracle documentation cannot answer that question, which is a compliance risk.
★ Interview Questions
What NZ hiring managers ask about oracle reasoning at senior and lead level — and what strong answers look like.
Q: What is a testing oracle, and can you give an example where a bad oracle caused a real testing failure?
Strong answer: An oracle is any mechanism by which you recognise a problem in software output — a spec, a prior version, domain knowledge, user expectations, or legislation. A bad oracle example: a team migrating a loan calculation engine used the legacy system as the oracle — every automated test asserted the new engine matched the old one. The old engine had a compound-interest calculation bug for variable-rate loans. All tests passed. The bug shipped. The oracle was faithfully checking that the new system was as wrong as the old one.
Level: Senior
Q: Walk me through SFDIPOT and explain how you would use it to plan oracle selection for a new feature.
Strong answer: SFDIPOT gives you seven lenses: Structure (is output correctly structured?), Function (does it do what the spec says?), Data (does it handle boundary values, types, and encoding correctly?), Interface (does it communicate correctly with users and other systems?), Platform (does it behave consistently across browsers, devices, OS?), Operations (can it be deployed, monitored, and operated?), and Time (does it handle dates, timezones, and timing correctly?). I run through each dimension in a planning session with the team and ask: “What oracle are we using to determine correctness here, and where does that oracle break down?” The T dimension almost always surfaces a gap on NZ projects because NZ daylight saving (twice a year) and business-day logic (Waitangi Day, ANZAC Day) are rarely handled in the spec.
Level: Senior
Q: A developer says, “All our automated tests pass — the system is correct.” How do you challenge this using oracle theory?
Strong answer: Passing tests tell you the system matches the oracle embedded in each assertion — nothing more. The question is whether those oracles are correct and complete. I would ask three things: First, what oracle was used to generate each expected value? If it was copied from the legacy system, we need to know whether the legacy system is trustworthy for those calculations. Second, which dimensions of SFDIPOT are not covered by assertions at all? Most automated suites are strong on Function and weak on Time, Platform, and Operations. Third, does the test suite include statute and user-expectation oracles for regulated functionality? A suite that passes the spec but never checks Privacy Act compliance is not telling us the system is correct — it is telling us it matches a necessarily incomplete spec.
Level: Senior
Q: How do you incorporate oracle analysis into a test plan for an NZ government project where compliance is a mandatory deliverable?
Strong answer: The test plan should include an oracle section that lists, for each major test area: the oracle(s) being used, the authority for each oracle (spec version, legislation reference, domain expert name), coverage limitations (where the oracle is incomplete), and escalation process when oracle correctness is disputed. For regulated NZ government work, statute oracles are not optional: the Privacy Act, Consumer Guarantees Act, and domain-specific legislation (ACC Act, KiwiSaver Act, Income Tax Act) must be referenced. I also document oracle challenges before testing begins — questions like “is the legacy system trustworthy for this calculation?” and “who has authority to define correct behaviour for this edge case?” — because those questions are cheaper to resolve in planning than post-release.
Level: Lead
Q: You are leading QA on a system migration where the oracle strategy is “new system matches old system”. What risks do you raise, and how do you mitigate them?
Strong answer: The prior-version oracle is a valid starting point but it has four risks I would raise explicitly. First, it propagates legacy bugs: any known defect in the old system becomes an accepted expected value in the new one. Mitigation: produce a list of known legacy defects and exclude them from prior-version comparison; use domain-expert oracles for those inputs instead. Second, the migration may have different requirements: new integrations, new data types, new regulations mean some outputs should deliberately differ. Mitigation: identify all intentional deltas upfront and use spec or statute oracles for those areas. Third, oracle coverage gaps: the old system may never have been tested for Platform, Operations, or Time dimensions — so no prior-version oracle exists. Mitigation: SFDIPOT analysis to identify gaps and build new oracles from scratch. Fourth, regression of improvements: if the migration was partly to fix bugs, a prior-version oracle will flag the fixes as failures. Mitigation: flag known fixes as intentional deltas at the start of the project.
Level: Lead
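The first mitigation in the answer above, excluding known legacy defects from prior-version comparison, might look like the following in a migration suite. Every case name and amount here is hypothetical; the point is only that the oracle is chosen per case and that an untrusted case can never fall back to the legacy output.

```python
def choose_expected(case_id, legacy_output, untrusted_cases, verified_expected):
    """Pick the oracle per case: legacy output unless the case is on the
    known-defect list, in which case use the expert-verified value."""
    if case_id in untrusted_cases:
        # Raises KeyError if nobody has supplied a verified value yet.
        return verified_expected[case_id], "domain expert"
    return legacy_output, "prior version"


# Hypothetical payment amounts in cents.
legacy_outputs = {"full-week-01": 50_000, "partial-week-01": 30_000}
new_outputs = {"full-week-01": 50_000, "partial-week-01": 30_000}

untrusted = {"partial-week-01"}          # legacy known to be wrong here
verified = {"partial-week-01": 31_250}   # value signed off by a domain expert

failures = []
for case_id, new_value in new_outputs.items():
    expected, source = choose_expected(
        case_id, legacy_outputs[case_id], untrusted, verified
    )
    if new_value != expected:
        failures.append((case_id, source))

print(failures)  # [('partial-week-01', 'domain expert')]
```

A plain prior-version comparison would have passed both cases. Recording the source of each expected value also tells whoever triages the failure which oracle to question.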