Insurance QA · Lesson 3

Underwriting & Rating-Engine Testing

The rating engine turns a risk into a price. It is a rules engine fed by rate tables, and a single wrong factor or stale table mis-prices thousands of policies before anyone notices. This lesson teaches you to prove the price is the price the rules intend.

Insurance QA · Lesson 3 of 5 · ~30 min read · ~75 min with exercises

1 The Hook

Kowhai Insurance, a fictional NZ vehicle insurer, updated its rating engine to load a new rate table for the coming year. The actuaries had set higher base rates to reflect rising repair costs. The change was deployed, quotes kept flowing, and the dashboards looked normal.

What had gone wrong was quiet. The new rate table loaded correctly for most regions, but the young-driver loading factor (the multiplier that raises the premium for drivers under 25) had been keyed as 1.05 instead of 1.50. For two months, every quote to a young driver came out about 30% cheaper than the actuaries intended (1.05 ÷ 1.50 = 0.70). The price looked plausible. No quote errored. Nothing crashed. The engine did exactly what the table told it to do, and the table was wrong.

The shortfall surfaced at the next portfolio review, when the actuaries noticed young-driver premiums running well below model. By then thousands of policies had been written at the wrong price, and the insurer was carrying more risk than it had charged for. Unwinding it meant re-rating, re-issuing, and difficult conversations with customers whose renewal premium would suddenly jump.

Here is the lesson hidden in that story. The team had tested that the rating engine produced a quote, that quotes did not error, that the page loaded. What no test did was assert that a specific input produced a specific, known-correct premium — that a 22-year-old in Hamilton with a given car and history got exactly the figure the rate table and rules intend. A rating engine that returns a plausible wrong number passes every test that only checks it returns a number.

2 The Rule

A rating engine does exactly what its rate tables and rules tell it to do, so testing that it returns a price proves nothing. Assert that a known set of inputs produces a specific, independently calculated premium, to the cent. A plausible wrong price is the most dangerous defect in insurance, because nothing flags it until the portfolio is already mis-priced.

3 The Analogy

A self-checkout at the supermarket with one wrong shelf price.

Imagine the system that prices your groceries has the wrong rate loaded for one product — cheese rung up at a fifth of its real price. The checkout still works. It scans, it totals, it takes payment, the receipt prints. Every transaction succeeds. The only thing wrong is that one number, and because the total still looks like a normal grocery bill, nobody notices until stocktake shows the cheese running at a loss.

A rating engine is the same. Kowhai Insurance loaded the wrong rate for one factor and every quote still “worked.” A rating tester is not the person who checks the till switches on — they are the person who scans a known basket and asserts the total is the exact amount it should be, factor by factor, so a single wrong shelf price is caught before stocktake.

4 The Rating Chain

A premium is built up through a chain of steps. A tester must be able to follow it, because a defect anywhere in the chain shows up only in the final number.

Inputs (the risk). The rated characteristics — driver age, vehicle, address, claims history, sum insured, excess chosen. These are the facts the price is built from, and a wrong or mis-mapped input mis-prices the risk before any factor is applied.

Base rate. A starting premium for the product and risk class, drawn from a rate table. This is the foundation every factor multiplies or adds to.

Rating factors. Multipliers and adjustments for each characteristic — a young-driver loading, a region factor, a no-claims discount, an excess discount. Each is looked up from a table and applied in a defined order. The Kowhai failure was a single wrong factor.

Loadings and discounts. Additions for elevated risk and reductions for mitigations, plus any minimum-premium floor. Order matters: a discount applied before or after a loading can change the answer.

Taxes, levies, and fees. Statutory and product charges added on top — for NZ this can include levies collected with the premium. These must be applied to the correct base and at the correct rate.

Final premium. The rounded, presentable price. Rounding rules and the minimum-premium floor are the last place a cent or a dollar can go wrong.
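The chain above can be sketched in a few lines of Python. This is a minimal illustration, not any real engine's logic: the function name `rate_premium`, the floor of $350, the 2% levy rate, the choice to apply the floor before the levy, and half-up rounding are all assumptions made for the example. The base rate and factors are illustrative figures.

```python
from decimal import Decimal, ROUND_HALF_UP

CENT = Decimal("0.01")

def rate_premium(base, factors, minimum_premium, levy_rate):
    """Walk the chain: base rate -> factors -> floor -> rounding -> levy."""
    premium = base
    # Apply each factor in the defined order, keeping full precision.
    for _name, factor in factors:
        premium *= factor
    # Assumption: the minimum-premium floor applies before statutory charges.
    premium = max(premium, minimum_premium)
    net = premium.quantize(CENT, rounding=ROUND_HALF_UP)
    # Assumption: the levy is charged on the rounded net premium.
    levy = (net * levy_rate).quantize(CENT, rounding=ROUND_HALF_UP)
    return {"net": net, "levy": levy, "total": net + levy}

quote = rate_premium(
    base=Decimal("900"),
    factors=[
        ("region", Decimal("1.10")),
        ("young_driver", Decimal("1.50")),
        ("excess_discount", Decimal("0.95")),
    ],
    minimum_premium=Decimal("350"),  # hypothetical floor
    levy_rate=Decimal("0.02"),       # hypothetical levy rate
)
print(quote)
```

Using `Decimal` instead of `float` keeps every intermediate value exact, so a one-cent difference in a test is a real defect and not floating-point noise.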

Pro tip: The most powerful rating test is an independent recalculation: take a fixed risk, work the premium out by hand or in a separate spreadsheet from the rate tables and rules, and assert the engine returns exactly that. If you can only check “a number came back,” you cannot catch a wrong factor — you need a known-correct target to compare against.

5 Underwriting Rules versus Rating

Two related but distinct things happen at quote, and a tester should keep them apart.

Underwriting rules decide whether to offer cover, and on what terms. They are yes/no and routing decisions: decline a risk outside appetite, refer a high-value or unusual risk to a human underwriter, require an inspection, apply a mandatory exclusion. A defect here can offer cover the insurer never wanted to carry, or wrongly decline a customer the insurer would happily insure.

Rating decides the price for a risk that is accepted. It is the arithmetic chain above. A defect here gets the price wrong on a risk that is correctly accepted.

The two interact: a rule might cap a factor, force a minimum premium, or route certain combinations to manual underwriting instead of an automated price. So a rating tester tests the decision boundaries as carefully as the arithmetic — the exact age, sum insured, or risk score at which a quote flips from auto-priced to referred, from offered to declined. Boundary conditions on the rules are where automated quoting most often does the wrong thing silently.

Pro tip: For every underwriting rule, find its boundary and test on both sides of it. If risks over $2m sum insured must be referred, test $1,999,999 (auto-priced), $2,000,000 (still auto-priced, because the rule says "over"), and $2,000,001 (referred). Off-by-one and inclusive/exclusive boundary defects are the classic rules-engine bug.
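As a rough sketch, the three-point boundary check might look like this. Here `route_quote` is a hypothetical stand-in for the engine under test; in a real suite it would call the quoting service rather than reimplement the rule.

```python
REFERRAL_THRESHOLD = 2_000_000  # dollars; rule: refer when sum insured is OVER this

def route_quote(sum_insured: int) -> str:
    """Hypothetical stand-in for the engine: returns 'auto-priced' or 'referred'."""
    # Strictly greater than: $2,000,000 itself is still auto-priced.
    return "referred" if sum_insured > REFERRAL_THRESHOLD else "auto-priced"

# One case just below, one on, and one just above the boundary.
boundary_cases = [
    (1_999_999, "auto-priced"),
    (2_000_000, "auto-priced"),
    (2_000_001, "referred"),
]

for sum_insured, expected in boundary_cases:
    actual = route_quote(sum_insured)
    assert actual == expected, f"${sum_insured:,}: expected {expected}, got {actual}"
```

The middle case is the one that tells `>` apart from `>=`. A suite with only the outer two values passes under either implementation.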

6 Rate Tables and Versioning

Rate tables are the data the engine runs on, and they change — annually, at portfolio reviews, in response to claims experience. Every change is a release, and a tester treats it like one.

  • Value correctness: the new factors are exactly what the actuaries signed off — the 1.50 that became 1.05 is a value-correctness defect, caught by comparing the loaded table against the approved source.
  • Effective dating of the table: a new table applies from its effective date. A quote dated before the change must use the old table; a quote on or after must use the new one. Test quotes straddling the changeover.
  • Versioning and reproducibility: a quote records which rate-table version produced it, so a price can be reproduced and explained later — essential when a customer or regulator asks how a premium was set.
  • Full re-rate regression: after a table change, re-rate a fixed reference set of risks and compare every premium to the expected new figure, so no factor moved that should not have.
  • Coverage of the whole table: a change can fix one region and break another. Test across the full span of each factor, not just one representative risk.

Pro tip: Keep a fixed reference set of risks (a young driver, an older driver, a high-value home, a rural address, a maximum sum insured) with their expected premiums recalculated for each rate-table version. Re-rating that set on every table change turns “the new rates look fine” into a hard pass/fail that would have caught Kowhai’s 1.05.
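Two of the checks above, value correctness and effective dating, lend themselves to a short sketch. The data shapes, the helper names `diff_tables` and `select_table`, and the 2025 table version are assumptions for illustration; a real engine will store and select its tables differently.

```python
from datetime import date
from decimal import Decimal

def diff_tables(approved, loaded):
    """Return {key: (approved_value, loaded_value)} for every mismatch."""
    mismatches = {}
    # Union of keys, so a factor missing from either side is also reported.
    for key in sorted(approved.keys() | loaded.keys()):
        if approved.get(key) != loaded.get(key):
            mismatches[key] = (approved.get(key), loaded.get(key))
    return mismatches

def select_table(versions, quote_date):
    """Pick the latest table whose effective date is on or before quote_date."""
    eligible = [v for v in versions if v["effective"] <= quote_date]
    if not eligible:
        raise LookupError(f"no rate table in force on {quote_date}")
    return max(eligible, key=lambda v: v["effective"])

approved = {"young_driver": Decimal("1.50"), "region_hamilton": Decimal("1.10")}
loaded = {"young_driver": Decimal("1.05"), "region_hamilton": Decimal("1.10")}
print(diff_tables(approved, loaded))

versions = [
    {"version": "MV-RATES-2025-v1", "effective": date(2025, 1, 1)},  # hypothetical
    {"version": "MV-RATES-2026-v1", "effective": date(2026, 1, 1)},
]
# Quotes straddling the changeover: the day before and the effective date itself.
print(select_table(versions, date(2025, 12, 31))["version"])
print(select_table(versions, date(2026, 1, 1))["version"])
```

The table diff catches a mis-keyed value before a single quote is rated. The straddling pair confirms the "on or after" rule on the effective date itself.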

7 Edge Cases and Boundaries

Rating engines fail at the edges of their inputs more than in the middle. The cases worth hunting:

  • Factor boundaries: the exact age where the young-driver loading starts or stops, the sum-insured band edges, the postcode at a region boundary — test the value on, just below, and just above each.
  • Minimum-premium floor: a heavily-discounted risk must not fall below the floor. Test a risk whose calculated premium is just under the minimum and confirm it is raised to the floor, not sold below cost.
  • Missing or extreme inputs: a blank field, a zero sum insured, an age of 16 or 99, a maximum-value vehicle — the engine must handle these per the rules, not produce a nonsense price or crash.
  • Discount stacking: multiple discounts together must combine per the defined order and any cap, not multiply into an impossible price or a negative premium.
  • Rounding and currency: premiums must round per the stated rule and hold to the cent; levies and taxes must apply to the right base, in the right order.
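Three of these edges (discount stacking, the minimum-premium floor, and rounding) can be pinned down with a handful of exact assertions. The sketch below assumes a hypothetical rule set: discounts multiply in order, the combined discount is capped at 40% of the pre-discount premium, the floor applies after the cap, and rounding is half-up to the cent. Substitute the rules your product actually states.

```python
from decimal import Decimal, ROUND_HALF_UP

CENT = Decimal("0.01")

def apply_discounts(premium, discounts, max_total_discount, minimum_premium):
    """Apply discount multipliers in order, cap the combined discount, then floor."""
    discounted = premium
    for multiplier in discounts:
        discounted *= multiplier
    # Cap: the combined discount may not exceed max_total_discount of the premium.
    lowest_allowed = premium * (Decimal("1") - max_total_discount)
    discounted = max(discounted, lowest_allowed)
    # Floor: never sell below the minimum premium.
    discounted = max(discounted, minimum_premium)
    return discounted.quantize(CENT, rounding=ROUND_HALF_UP)

CAP = Decimal("0.40")  # hypothetical cap on combined discounts

# Stacking: three 20% discounts would give 512.00 uncapped; the cap holds it at 600.00.
stacked = apply_discounts(Decimal("1000"), [Decimal("0.80")] * 3, CAP, Decimal("350"))

# Floor: 400 x 0.85 = 340.00 is just under the 350 minimum, so it is raised to 350.00.
floored = apply_discounts(Decimal("400"), [Decimal("0.85")], CAP, Decimal("350"))

# Rounding: 100.30 x 0.95 = 95.2850, which half-up rounds to 95.29.
rounded = apply_discounts(Decimal("100.30"), [Decimal("0.95")], CAP, Decimal("50"))

print(stacked, floored, rounded)
```

The rounding case is chosen on purpose: half-even rounding would give 95.28, so the assertion tells the two rounding rules apart.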

8 Building Rating Test Cases

A strong rating test case fixes the full set of inputs, states the rate-table version, and asserts the exact premium against an independent calculation — not just that a quote returned.

Here is a worked test case written to catch the exact Kowhai factor bug:

Test ID: RAT-FAC-005
Scenario: Young-driver loading on a vehicle quote
Risk category: Wrong rating factor loaded (systematic mis-pricing)
Rate table version: MV-RATES-2026-v1 (effective 1 Jan 2026)
Inputs: Driver age 22; Hamilton address; rated vehicle class C;
                  no claims history; $1,000 excess; sum insured $25,000.
Independent calc: Base $900 × region 1.10 × young-driver 1.50
                  × excess-discount 0.95 = $1,410.75 + levy = expected total.
Expected result: Quoted annual premium equals the independent calc EXACTLY,
                  to the cent; young-driver factor applied = 1.50, not 1.05.
Boundary cases: Age 24 (loading applies), age 25 (loading stops),
                  heavily-discounted risk floored at the minimum premium.
Evidence required: Quote with rate-table version stamp; factor-by-factor breakdown;
                  the independent calculation it was checked against.
Traceability: Risk R-01 (rating factor mis-keyed in rate table).
Result: [Pass / Fail]

Notice what makes this catch the Hook bug: the case fixes every input and compares the quoted premium to an independent, factor-by-factor calculation that names the 1.50 explicitly, so a 1.05 cannot pass. It stamps the rate-table version for reproducibility and tests the age boundary and the minimum-premium floor. That is the difference between a real rating test and “a quote came back.”
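One way to turn RAT-FAC-005 into an executable check is sketched below. The engine response is simulated with a plain dictionary, the helper names `expected_premium` and `check_quote` are hypothetical, and the levy is left out because the case above does not state its rate.

```python
from decimal import Decimal, ROUND_HALF_UP

def expected_premium(base, factors):
    """Independent calc: multiply the base by each factor, round to the cent."""
    premium = base
    for factor in factors.values():
        premium *= factor
    return premium.quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)

def check_quote(engine_breakdown, engine_premium, base, expected_factors):
    """Return a list of failures; an empty list means the case passes."""
    failures = []
    # Compare factor by factor, so the report names the factor that is off.
    for name, expected in expected_factors.items():
        applied = engine_breakdown.get(name)
        if applied != expected:
            failures.append(f"{name}: expected {expected}, engine applied {applied}")
    target = expected_premium(base, expected_factors)
    if engine_premium != target:
        failures.append(f"premium: expected {target}, engine quoted {engine_premium}")
    return failures

BASE = Decimal("900")
EXPECTED = {
    "region": Decimal("1.10"),
    "young_driver": Decimal("1.50"),
    "excess_discount": Decimal("0.95"),
}

# Simulated engine response carrying the mis-keyed factor from the Hook.
bad_breakdown = dict(EXPECTED, young_driver=Decimal("1.05"))
bad_premium = expected_premium(BASE, bad_breakdown)

for failure in check_quote(bad_breakdown, bad_premium, BASE, EXPECTED):
    print(failure)
```

With the mis-keyed factor the check reports two failures, the factor itself and the resulting premium. A test that only asserts a premium exists would have passed the same response.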

9 Common Mistakes

🚫 Asserting that a quote returned, not that the quote is correct

Why it happens: A premium appears, looks plausible, and the demo moves on.
The fix: The Kowhai trap is a plausible wrong price — nothing errors. Assert the exact premium against an independent calculation, factor by factor. A rating engine does exactly what its tables say, so “it returned a number” is not evidence the number is right.

🚫 Not re-rating a reference set after a rate-table change

Why it happens: The table is “just data,” so a change feels lower-risk than a code change.
The fix: A wrong value in a table mis-prices a whole book silently. Keep a fixed reference set of risks with expected premiums per table version and re-rate it on every change, comparing every figure — that hard pass/fail catches a mis-keyed factor.
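A reference-set re-rate can be as small as the sketch below. `make_engine` is a toy stand-in that applies only a base rate and a young-driver loading, and the reference IDs and figures are hypothetical. The point is the shape of the check: every reference risk is compared to its expected premium, and the failures are listed by name.

```python
from decimal import Decimal, ROUND_HALF_UP

def make_engine(table):
    """Toy stand-in engine: base x age factor, rounded to the cent."""
    def rate(inputs):
        factor = table["young_driver"] if inputs["age"] < 25 else Decimal("1.00")
        return (table["base"] * factor).quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)
    return rate

def rerate_reference_set(reference_set, rate_fn):
    """Re-rate every reference risk; return (id, expected, actual) for each miss."""
    failures = []
    for risk in reference_set:
        actual = rate_fn(risk["inputs"])
        if actual != risk["expected"]:
            failures.append((risk["id"], risk["expected"], actual))
    return failures

# Expected premiums recalculated independently for this table version.
REFERENCE_SET = [
    {"id": "REF-YOUNG", "inputs": {"age": 22}, "expected": Decimal("1350.00")},
    {"id": "REF-OLDER", "inputs": {"age": 45}, "expected": Decimal("900.00")},
]

good = make_engine({"base": Decimal("900"), "young_driver": Decimal("1.50")})
bad = make_engine({"base": Decimal("900"), "young_driver": Decimal("1.05")})

print(rerate_reference_set(REFERENCE_SET, good))
print(rerate_reference_set(REFERENCE_SET, bad))
```

Against the mis-keyed table, the older-driver risk still passes while the young-driver risk fails. That is why the set needs to span each factor and not rely on one representative quote.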

🚫 Testing one representative risk instead of the boundaries

Why it happens: A single middle-of-the-road quote feels representative enough.
The fix: Rating engines fail at the edges — the exact age a loading starts, the sum-insured band edge, the minimum-premium floor, a region boundary. Test on, just below, and just above each boundary, and confirm the floor catches a heavily-discounted risk.

🚫 Confusing an underwriting decision with a rating calculation

Why it happens: Both happen at quote, so they blur together.
The fix: Underwriting rules decide whether to offer, refer, or decline; rating decides the price for an accepted risk. Test the decision boundaries — the exact point a quote flips from auto-priced to referred or declined — separately from the arithmetic, because each fails differently.

10 Now You Try

Three graded exercises across underwriting and rating. Write your answer, then compare it to the model answer.

🔍 Exercise 1 of 3 — Spot the Rating Risks

Read the description of a fictional home-insurance rating engine release below. Identify 3 rating risks that could mis-price a book, apply a stale table, or get an underwriting decision wrong, and name the part of the rating chain or rules each touches (input, base rate, factor, rate-table versioning, underwriting rule).

Release: home-insurance rating engine
A new rate table is deployed for the year. Quotes dated before the effective date still pick up the new table instead of the old one. The flood-zone loading factor was updated, but only for two of the three flood-risk bands; the third band kept last year’s value. Risks with a sum insured of $3m or more are supposed to be referred to a human underwriter, but the engine auto-prices anything up to and including $3m and only refers above it. Quotes do not record which rate-table version produced them. Every quote returns a plausible premium and none errors.

List 3 rating risks and the part each touches:

Model answer
There are at least four real risks here; any three, well explained, earn full marks.

1. Effective-dating of the rate table is wrong — quotes dated before the effective date pick up the new table. Part: rate-table versioning / effective dating. Impact: pre-change quotes priced on the wrong table. Fix: select the table by the quote's effective date and test quotes straddling the changeover.

2. Partial table update — only two of three flood bands updated; the third kept last year's value. Part: factor / value correctness. Impact: one risk band silently mis-priced. Fix: re-rate across the full span of each factor and compare the loaded table to the approved source for every band.

3. Underwriting referral boundary off by one — $3m exactly is auto-priced when it should be referred (inclusive/exclusive boundary). Part: underwriting rule / boundary. Impact: a high-value risk auto-priced that should have a human underwriter. Fix: test $2,999,999, $3,000,000, and $3,000,001 against the rule.

Bonus: no rate-table version stamp on quotes — a price cannot be reproduced or explained later. Part: versioning/reproducibility. Fix: stamp the version on every quote.

The trap: every quote returns a plausible number and none errors — the defects are in which table, which band, and which side of a boundary, none of which a "did a quote come back?" test can see.

🔧 Exercise 2 of 3 — Fix the Test Case

The rating test case below only checks a quote returned. Rewrite it to assert the exact premium against an independent calculation, with these fields: Test ID, Scenario, Risk category, Rate table version, Inputs, Independent calc, Expected result, Boundary cases, Evidence required, Traceability. Use a fictional NZ contents-insurance quote as the context.

Original (too shallow):
“Enter a contents risk and get a quote. Check a premium is shown. Pass if the premium looks reasonable.”

Rewrite as a rating-arithmetic test case:

Model answer
Test ID: RAT-FAC-011

Scenario: Contents-insurance premium calculation

Risk category: Wrong factor / wrong base rate (mis-pricing)

Rate table version: CT-RATES-2026-v1 (effective 1 Jan 2026)

Inputs: Contents sum insured $60,000; Auckland address; standard security; one prior claim; $400 excess.

Independent calc: Base $300 × region 1.20 × claims-loading 1.15 × excess-discount 0.95 = $393.30, + levy/fees per the rule = expected total, recalculated by hand from the rate table.

Expected result: Quoted annual premium equals the independent calc EXACTLY to the cent; each factor in the engine's breakdown matches the hand calc (region 1.20, claims 1.15, excess 0.95).

Boundary cases: A no-claims version (loading drops out); a heavily-discounted risk floored at the minimum premium; a sum insured at a band edge.

Evidence required: Quote with the rate-table version stamp; factor-by-factor breakdown from the engine; the independent calculation it was checked against.

Traceability: Risk register R-01 (factor mis-keyed) and R-02 (base rate wrong).

What makes it strong: it fixes every input, names a rate-table version, and compares the quote to an independent factor-by-factor calculation to the cent, plus boundary and floor cases. The original would pass on any plausible number.

🏗️ Exercise 3 of 3 — Build a Rate-Table-Change Test Plan

Design a test plan of 5 test cases for a fictional NZ vehicle insurer deploying a new annual rate table. Each case needs at least: an ID, what it verifies, an acceptance criterion, and the evidence required. Cover value correctness against the approved source, effective dating across the changeover, full re-rate of a reference set, an underwriting referral boundary, and rate-table version stamping on quotes.

Model answer
RT-01 | Verifies: every loaded factor matches the actuary-approved source | Acceptance criteria: 100% of factors in the deployed table equal the approved source across all bands; 0 mismatches | Evidence required: loaded-table vs approved-source comparison across every band

RT-02 | Verifies: the table is selected by the quote's effective date | Acceptance criteria: a quote dated before the changeover uses the old table; one on or after uses the new table; tested straddling the date | Evidence required: two quotes either side of the changeover with the version each used

RT-03 | Verifies: a fixed reference set re-rates to the expected new premiums | Acceptance criteria: each reference risk (young driver, older driver, high-value, rural, max sum insured) returns its recalculated expected premium to the cent | Evidence required: reference-set expected vs actual table with the independent calcs

RT-04 | Verifies: the underwriting referral boundary is correct after the change | Acceptance criteria: risks at, just under, and just over the referral threshold route correctly (auto-priced vs referred) per the rule | Evidence required: three boundary quotes with their routing outcomes

RT-05 | Verifies: each quote records the rate-table version that produced it | Acceptance criteria: every quote stamps a resolvable rate-table version that reproduces the same premium | Evidence required: quote records with version stamps; a reproduced quote matching the original

Strong plans: each case is specific, has a measurable criterion, names concrete evidence, and together they cover value correctness (RT-01), effective dating (RT-02), full re-rate (RT-03), the underwriting boundary (RT-04), and version stamping (RT-05). Weak plans say "check the new rates work" five times — that is the difference being marked.

11 Self-Check

Q1: Why is “the rating engine returned a quote” not enough to pass a rating test?

Because the engine does exactly what its tables and rules say, so a wrong factor produces a plausible wrong price with no error — the Kowhai trap. You must assert the exact premium against an independent, factor-by-factor calculation. A plausible wrong number passes every test that only checks a number came back, and nothing flags it until the portfolio is already mis-priced.

Q2: What is the most powerful single rating test?

An independent recalculation: fix a known risk, work the premium out by hand or in a separate spreadsheet from the rate tables and rules, and assert the engine returns exactly that, factor by factor, to the cent. It gives you a known-correct target to compare against, which is the only way to catch a mis-keyed factor.

Q3: What must you do every time a rate table changes?

Treat it as a release: check every loaded factor against the approved source across all bands, test effective dating across the changeover, and re-rate a fixed reference set of risks comparing every premium to its expected new figure. A table change can fix one region and break another, so test the full span, not one representative risk.

Q4: How do underwriting rules differ from rating, and why test the boundaries?

Underwriting rules decide whether to offer, refer, or decline cover; rating decides the price for an accepted risk. They fail differently. Boundary defects — the exact age, sum insured, or score where a quote flips from auto-priced to referred or declined — are the classic rules-engine bug, so test on, just below, and just above each boundary.

Q5: Why stamp a rate-table version on every quote?

So a premium can be reproduced and explained later. When a customer or regulator asks how a price was set, you need to know exactly which table and rules produced it. Version stamping turns a quote into a reproducible artefact instead of a number nobody can recreate — essential for both debugging and conduct accountability.

12 Interview Prep

Real questions asked in NZ QA interviews for insurance roles. Read the model answers, then practise your own version.

“How would you test a rating engine?”

I’d never accept that a quote came back as a pass. I fix a known risk — every input pinned — and recalculate the premium independently from the rate tables and rules, factor by factor, then assert the engine returns exactly that to the cent. I stamp the rate-table version so the quote is reproducible. Then I hunt the boundaries: the exact age a loading starts and stops, the sum-insured band edges, the minimum-premium floor catching a heavily-discounted risk, and the underwriting referral threshold on both sides. A rating engine does exactly what its tables say, so the only way to catch a plausible wrong price is to compare it to a known-correct one.

“The actuaries have loaded new rates. What is your test approach for the release?”

I treat the table change like any other release. First, value correctness: compare every loaded factor against the actuary-approved source across all bands, because a single mis-keyed factor mis-prices a whole segment silently. Second, effective dating: quotes before the changeover must use the old table and quotes on or after the new one, so I test quotes straddling the date. Third, a full re-rate of a fixed reference set — a young driver, an older driver, a high-value home, a rural risk, a maximum sum insured — with their expected new premiums, comparing every figure. That reference-set re-rate is exactly what catches a factor keyed as 1.05 instead of 1.50.

“A whole segment of policies turned out to be under-priced. How would you investigate?”

My first hypothesis is a wrong factor or a partial table update affecting that segment — one band or one region keyed wrong, or a stale value left from the previous table. I’d take a representative risk from the segment and recalculate its premium independently from the approved rates, then diff factor by factor against what the engine applied to find which one is off. I’d check effective dating in case the wrong table version was used, and confirm whether the segment crosses a boundary that was mis-handled. Then I’d add the missing reference-set re-rate so the next table change cannot repeat it.
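The factor-by-factor diff in that last answer can be run across a whole segment to see which factor is consistently off. The sketch below assumes each policy record carries the breakdown the engine applied; the record shape, the policy IDs, and the helper names are hypothetical.

```python
from collections import Counter
from decimal import Decimal

# Factors the actuaries approved for this segment (illustrative values).
APPROVED = {"region": Decimal("1.10"), "young_driver": Decimal("1.50")}

def deviating_factors(applied, approved):
    """Names of factors where the applied value differs from the approved one."""
    return sorted(name for name, value in approved.items() if applied.get(name) != value)

def tally_deviations(policies, approved):
    """Count, across a segment, how often each factor deviates."""
    counts = Counter()
    for policy in policies:
        counts.update(deviating_factors(policy["applied"], approved))
    return counts

policies = [
    {"id": "P-1", "applied": {"region": Decimal("1.10"), "young_driver": Decimal("1.05")}},
    {"id": "P-2", "applied": {"region": Decimal("1.10"), "young_driver": Decimal("1.05")}},
    {"id": "P-3", "applied": {"region": Decimal("1.10"), "young_driver": Decimal("1.50")}},
]

print(tally_deviations(policies, APPROVED).most_common())
```

A tally dominated by one factor points to a mis-keyed table value. Deviations scattered across several factors suggest checking which table version each policy was rated on.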