Banking QA · Lesson 5

AML/CFT & Fraud Testing

Banks are legally required to know their customers and to spot money laundering and fraud. The systems that do this make consequential decisions about real people, so both a false alarm and a missed case cause harm. This lesson teaches you to test both edges.

Banking QA · Banking & Payments · Lesson 5 of 5 · ~30 min read · ~75 min with exercises

1 The Hook

A fictional NZ bank, Tūrama Bank, rolled out a new transaction-monitoring rule to catch a money-laundering pattern called structuring — breaking a large sum into many small deposits to stay under reporting thresholds. The rule flagged any customer who made several cash deposits just below a round figure within a few days. In testing it caught the laundering scenario perfectly. It shipped.

Within a fortnight the financial-crime team was drowning. The rule was firing on hundreds of ordinary customers a day — a builder banking cash from several jobs in a week, a market trader, a family pooling cash for a tangi. Each alert had to be reviewed by a human, and the team could not keep up. Worse, while analysts waded through innocent alerts, the real suspicious cases sat in the same undifferentiated pile, and some genuinely risky activity was reviewed late.

The rule was not broken in the way a normal bug is broken. It did exactly what it was written to do. The problem was that it could not tell the structuring launderer from the cash-heavy builder, and the cost of that confusion landed on both sides — innocent customers questioned and inconvenienced, and real crime reviewed slowly because the queue was full of noise.

Here is the lesson hidden in that story. The team tested that the rule caught the bad pattern. They never tested how often it fired on good patterns, nor what happened to a real alert when the queue was flooded. In AML and fraud, a rule has two costs — missing the guilty and catching the innocent — and a tester must measure both, because optimising only the first creates the second.

2 The Rule

An AML or fraud rule has two failure costs, and you must test for both. A false negative lets crime through; a false positive accuses an innocent customer and floods the queue so real alerts are reviewed late. Testing only that the rule catches the bad case is half a test. The other half is measuring how often it fires on the good case — and proving that a real alert still gets handled when the queue is full.

3 The Analogy

Analogy

A smoke alarm in a kitchen that also makes toast.

A smoke alarm has to catch a real fire — that is non-negotiable. But set it too sensitive and it shrieks every time someone makes toast, and what happens next is the real danger: people start ignoring it, or they pull the battery out. Now the alarm that was meant to save lives is silent when it matters. The toast is the false positive, and the disabled alarm is what the false positive eventually causes.

An AML or fraud rule is that alarm. Tūrama Bank’s rule caught the fire but it screamed at every piece of toast, and the financial-crime team — like the household with the battery out — could not respond properly to the real thing because of the noise. A good tester checks that the alarm catches the fire and that it does not cry wolf so often that the people relying on it stop listening.

4 The NZ AML/CFT Act 2009

The Anti-Money Laundering and Countering Financing of Terrorism Act 2009 is the law that sits behind all of this in NZ. It requires banks and other reporting entities to take active steps to detect and deter money laundering and the financing of terrorism. It is not optional and it is not a guideline — it is a legal obligation with real consequences for failing to meet it.

For a tester, the Act translates into a set of capabilities the system must actually deliver. The bank must verify who its customers are, monitor their transactions for suspicious activity, report suspicious activity (and certain prescribed transactions, such as large cash deposits) to the authorities, and keep records that prove it did all of this. Each of those is a system behaviour you can test.

You do not need to be a lawyer to test AML systems. What you need is to understand the obligations well enough to know what the system is required to do, so that you can check it does it — and, just as importantly, that it does it without harming innocent customers in the process. The Act sets the floor; good testing makes sure the system both clears the floor and treats people fairly while doing so.

Pro tip: Frame every AML test around an obligation the Act creates: “the system must verify identity,” “the system must monitor for suspicious patterns,” “the system must produce a report.” Then test both that the obligation is met and that meeting it does not over-fire on ordinary customers. The two together are what a regulator and a customer both need.

5 KYC and Customer Due Diligence

Know Your Customer (KYC), or customer due diligence, is the front door. Before a bank lets someone open an account or do certain transactions, it must establish who they are. This is where the Act’s identity obligation lives, and it is rich with test cases.

The happy path — a customer with a clear NZ driver licence and proof of address sails through — is the easy part. The interesting testing is at the edges. What happens when the name on the identity document does not exactly match the application? When a customer’s details partially match a sanctions or watch list — is the match strong enough to stop them, or is it a coincidence of a common name? When a customer is assessed as higher risk, does the system apply the enhanced checks the Act expects, rather than waving them through?

Two errors sit at this front door, and they mirror the rule’s two costs. The system can let through someone it should have stopped — a sanctioned individual, a stolen identity. Or it can wrongly block a legitimate customer — a new migrant with valid but unfamiliar documents, someone whose name happens to resemble a listed person. A tester checks both: that the genuinely high-risk are caught, and that ordinary people with valid identity are not locked out of banking by an over-eager match.
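To make the three routing outcomes concrete, here is a minimal Python sketch of partial-match routing. It assumes a hypothetical watch list and hypothetical score cut-offs, and uses the standard library's difflib.SequenceMatcher as a stand-in for a real screening engine; production matchers typically also weigh aliases, transliterations and dates of birth.

```python
from difflib import SequenceMatcher

# Hypothetical cut-offs for illustration; a real screening engine sets and tunes its own.
STRONG_MATCH = 0.95
PARTIAL_MATCH = 0.80


def normalise(name: str) -> str:
    """Lower-case and collapse whitespace so formatting differences do not affect the score."""
    return " ".join(name.lower().split())


def match_score(applicant: str, listed: str) -> float:
    """Similarity from 0.0 to 1.0; difflib stands in for a real name-matching engine."""
    return SequenceMatcher(None, normalise(applicant), normalise(listed)).ratio()


def route(applicant: str, watch_list: list[str]) -> tuple[str, float]:
    """Return (decision, best score), where decision is 'stop', 'human_review' or 'pass'."""
    best = max((match_score(applicant, listed) for listed in watch_list), default=0.0)
    if best >= STRONG_MATCH:
        return "stop", best
    if best >= PARTIAL_MATCH:
        # Partial match: never auto-block and never auto-pass; show the score to a reviewer.
        return "human_review", best
    return "pass", best


watch_list = ["John Smith"]  # hypothetical listed name
for applicant in ["John Smith", "John Smyth", "Aroha Ngata"]:
    print(applicant, route(applicant, watch_list))
```

With this watch list, the exact name is stopped, a one-letter variant such as "John Smyth" scores 0.9 and goes to human review, and an unrelated name passes. The test cases that matter sit just either side of each cut-off.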

6 Transaction Monitoring and Suspicious-Activity Flows

Transaction monitoring is the ongoing watch. After a customer is onboarded, the system observes their transactions and raises an alert when something matches a suspicious pattern — structuring like the Tūrama case, sudden activity inconsistent with the customer’s profile, money moving rapidly in and straight back out, payments to high-risk destinations.
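Here is a minimal Python sketch of a structuring rule like Tūrama's. It assumes a hypothetical NZ$10,000 reference figure, a NZ$1,000 "just below" band, and a trigger of three near-threshold cash deposits within five days. CashDeposit and the parameter names are illustrative, not any bank's real API, and the rule looks at one customer's deposits at a time.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass
class CashDeposit:
    customer_id: str
    day: date
    amount: float


# Hypothetical parameters for illustration only.
THRESHOLD = 10_000
BAND = 1_000
MIN_COUNT = 3
WINDOW_DAYS = 5


def is_near_threshold(amount: float) -> bool:
    """True for deposits in the band just below the reference figure."""
    return THRESHOLD - BAND <= amount < THRESHOLD


def structuring_alert(deposits: list[CashDeposit]) -> bool:
    """True if any WINDOW_DAYS-long window holds MIN_COUNT or more near-threshold deposits."""
    near = sorted(d.day for d in deposits if is_near_threshold(d.amount))
    for i, start in enumerate(near):
        in_window = [d for d in near[i:] if d - start < timedelta(days=WINDOW_DAYS)]
        if len(in_window) >= MIN_COUNT:
            return True
    return False


d0 = date(2024, 3, 4)
# A launderer: four NZ$9,600 deposits on consecutive days.
launderer = [CashDeposit("C1", d0 + timedelta(days=i), 9_600) for i in range(4)]
# A builder: three genuine job payments, two days apart.
builder = [CashDeposit("C2", d0 + timedelta(days=i * 2), amt)
           for i, amt in enumerate([9_200, 9_800, 9_400])]
print(structuring_alert(launderer), structuring_alert(builder))  # True True
```

The rule fires on both customers. It cannot tell the launderer from the builder, which is the Tūrama problem reduced to two lines of test data.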

When the system raises an alert, a suspicious-activity workflow begins. A human analyst reviews the alert, gathers context, and decides: is this innocent, or is it genuinely suspicious? If there are grounds to suspect, the bank must file a suspicious activity report with the authorities, which in NZ means the Financial Intelligence Unit of the New Zealand Police. The Act requires this reporting, and it also requires that the customer is not tipped off that a report has been made.

For a tester the workflow is as important as the rule that triggers it. You test that an alert is created with the right information, that it routes to the right queue, that an analyst can record a decision, that a confirmed case can be escalated and reported, and that the whole trail is recorded for the auditor the Act anticipates. A rule that fires correctly but feeds a broken workflow still fails the obligation — and a workflow that buckles under a flood of false positives, as Tūrama’s did, fails it just as surely.
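A minimal Python sketch of these workflow checks: a hypothetical alert lifecycle that refuses illegal transitions and records who made each move, plus a crude keyword check for tipping-off in customer-facing text. The state names, actors and phrase list are assumptions for illustration, and a keyword scan is a smoke test rather than proof that no tip-off can occur.

```python
from dataclasses import dataclass, field

# Hypothetical alert lifecycle; real case-management tools define their own states.
ALLOWED = {
    "new": {"under_review"},
    "under_review": {"closed_innocent", "escalated"},
    "escalated": {"reported_to_fiu", "closed_innocent"},
}

# Crude smoke check: phrases that must never appear in customer-facing text.
TIP_OFF_PHRASES = ("suspicious", "financial intelligence", "fiu", "under investigation")


@dataclass
class Alert:
    alert_id: str
    customer_id: str
    state: str = "new"
    audit: list[tuple[str, str, str]] = field(default_factory=list)  # (from, to, actor)

    def move(self, new_state: str, actor: str) -> None:
        """Apply a transition, refusing illegal ones and recording who made it."""
        if new_state not in ALLOWED.get(self.state, set()):
            raise ValueError(f"illegal transition {self.state} -> {new_state}")
        self.audit.append((self.state, new_state, actor))
        self.state = new_state


def tips_off(customer_text: str) -> bool:
    """True if a customer-facing message contains any forbidden phrase."""
    text = customer_text.lower()
    return any(phrase in text for phrase in TIP_OFF_PHRASES)


alert = Alert("A-001", "C-42")
alert.move("under_review", "analyst_1")
alert.move("escalated", "analyst_1")
alert.move("reported_to_fiu", "compliance_officer")
print(alert.state, len(alert.audit))
```

A test suite can then walk an alert end to end, assert one audit entry per step, assert that skipping review raises an error, and run every customer-facing message template through tips_off.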

7 False Positives and the Two Costs

This is the heart of AML and fraud testing, so it deserves its own section. Every detection rule makes two kinds of error, and they pull in opposite directions.

A false negative is a miss: real money laundering or fraud that the rule did not catch. The cost is obvious — crime gets through, the bank fails its legal obligation, customers may be defrauded. This is the cost everyone instinctively tests for.

A false positive is a false alarm: an innocent customer or transaction the rule flagged as suspicious. The cost is less obvious but just as real. Each false positive is an innocent person questioned, delayed, or frozen out of their own money. And in bulk, false positives flood the review queue so that analysts cannot get to the real cases in time — the Tūrama failure. A false positive, at scale, manufactures false negatives.
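The claim that false positives manufacture false negatives can be checked with simple arithmetic. This minimal Python sketch assumes hypothetical figures (analysts clear 50 alerts a day, and a genuine alert must be reviewed within 3 days) and compares a first-in, first-out queue with one ordered by a risk score.

```python
import heapq

# Hypothetical figures: analysts clear 50 alerts per day; the review window is 3 days.
DAILY_CAPACITY = 50
WINDOW_DAYS = 3


def review_day_fifo(alerts: list[tuple[int, str]], target: str) -> int:
    """First-in, first-out: the 1-based day on which `target` is reviewed."""
    position = [alert_id for _, alert_id in alerts].index(target)
    return position // DAILY_CAPACITY + 1


def review_day_by_risk(alerts: list[tuple[int, str]], target: str) -> int:
    """Highest risk score first (negated for heapq's min-heap); ties keep arrival order."""
    heap = [(-risk, i, alert_id) for i, (risk, alert_id) in enumerate(alerts)]
    heapq.heapify(heap)
    reviewed = 0
    while heap:
        _, _, alert_id = heapq.heappop(heap)
        if alert_id == target:
            return reviewed // DAILY_CAPACITY + 1
        reviewed += 1
    raise KeyError(target)


# 400 low-risk false positives arrive before one genuine high-risk alert.
queue = [(20, f"fp-{i}") for i in range(400)] + [(90, "real")]
for name, fn in [("FIFO", review_day_fifo), ("by risk", review_day_by_risk)]:
    day = fn(queue, "real")
    print(f"{name}: reviewed on day {day}, within window: {day <= WINDOW_DAYS}")
```

The risk-ordered queue rescues the real alert only because its score separates it from the noise. If the false positives scored as high as the genuine case, ordering by risk would behave much like FIFO again, which is why reducing the false-positive rate still matters.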

The job of a tester is to make both costs visible and measurable. You cannot eliminate either: tightening the rule to catch more crime tends to catch more innocents, and loosening it to spare innocents tends to let more crime through. What you can do is measure where the rule sits on that trade-off, test that the workflow handles the inevitable false positives gracefully, and prove that a genuine alert is not lost in the noise. Fairness matters here too. A rule that over-fires on a particular community, flagging ordinary cash-based or remittance-sending customers far more often than others, is a fairness failure as well as an operational one.
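Measuring where a rule sits on the trade-off means computing both rates at several thresholds over labelled scenarios. A minimal Python sketch, assuming a hypothetical scored rule and seven hand-labelled scenarios (real calibration needs far larger, realistic volumes):

```python
# Hypothetical labelled scenarios: (risk_score, actually_suspicious).
SCENARIOS = [
    (0.9, True), (0.7, True), (0.4, True),                    # genuine cases
    (0.8, False), (0.5, False), (0.3, False), (0.2, False),   # ordinary customers
]


def rates(scenarios: list[tuple[float, bool]], threshold: float) -> tuple[float, float]:
    """Return (detection rate, false-positive rate) for 'alert if score >= threshold'."""
    positives = [score for score, suspicious in scenarios if suspicious]
    negatives = [score for score, suspicious in scenarios if not suspicious]
    detection = sum(score >= threshold for score in positives) / len(positives)
    false_positive = sum(score >= threshold for score in negatives) / len(negatives)
    return detection, false_positive


for threshold in (0.85, 0.6, 0.35):
    detection, false_positive = rates(SCENARIOS, threshold)
    print(f"threshold {threshold}: detection {detection:.0%}, false positives {false_positive:.0%}")
# threshold 0.85: detection 33%, false positives 0%
# threshold 0.6: detection 67%, false positives 25%
# threshold 0.35: detection 100%, false positives 50%
```

No threshold drives both numbers to the ideal. Choosing one is a risk decision for the business; the tester's job is to make both numbers visible before that choice is made.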

8 What to Test in AML/Fraud

The practical checklist:

Detection (false negatives): known suspicious patterns — structuring, rapid in-out, profile mismatch — are caught. Build test scenarios for each pattern the rules target.
False-positive rate: run realistic ordinary-customer scenarios (the cash-heavy builder, the market trader) and measure how often the rule fires on them. A rule with a sky-high false-positive rate is a defect, not a success.
Threshold boundaries: test just below, at, and just above each rule threshold, including the gaming case — a launderer deliberately structuring under a limit.
KYC edge cases: name mismatches, partial sanctions-list matches, higher-risk customers triggering enhanced checks, and valid-but-unusual identity not being wrongly blocked.
Workflow integrity: alerts carry the right data, route to the right queue, capture a decision, escalate, and report — with a complete audit trail.
Queue-under-load: when the queue floods, a genuine high-priority alert is still surfaced and handled, not buried — the Tūrama lesson.
No tipping off: the customer is not told a suspicious activity report has been made, as the Act requires.
Fairness: the rule does not over-fire on a particular group of ordinary customers far more than others.
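The fairness check is a per-group version of the false-positive measurement. A minimal Python sketch, assuming hypothetical customer segments, results from ordinary (known-innocent) scenarios only, and a hypothetical tolerance of twice the pooled rate:

```python
from collections import defaultdict

# Hypothetical tolerance: flag a segment whose false-positive rate exceeds twice the
# pooled rate. The right tolerance is a policy decision, not a constant.
MAX_RATIO = 2.0


def fp_rate_by_segment(results: list[tuple[str, bool]]) -> dict[str, float]:
    """results holds (segment, alerted) for ordinary customers known to be innocent."""
    alerted: dict[str, int] = defaultdict(int)
    total: dict[str, int] = defaultdict(int)
    for segment, fired in results:
        total[segment] += 1
        alerted[segment] += int(fired)
    return {segment: alerted[segment] / total[segment] for segment in total}


def uneven_segments(results: list[tuple[str, bool]], max_ratio: float = MAX_RATIO) -> list[str]:
    """Segments whose false-positive rate is more than max_ratio times the pooled rate."""
    if not results:
        return []
    pooled = sum(fired for _, fired in results) / len(results)
    by_segment = fp_rate_by_segment(results)
    return sorted(s for s, rate in by_segment.items() if pooled > 0 and rate > max_ratio * pooled)


# Hypothetical run: 100 remittance customers (12 alerted), 400 salaried customers (6 alerted).
results = [("remittance", i < 12) for i in range(100)] + [("salaried", i < 6) for i in range(400)]
print(fp_rate_by_segment(results), uneven_segments(results))
```

Comparing against the pooled rate is one simple choice. If one segment dominates the data, comparing each segment with the rest, or with an agreed baseline, may be a fairer yardstick.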

9 Common Mistakes

🚫 Testing only that the rule catches the bad case

Why it happens: Catching the launderer is the obvious goal and the satisfying test to pass.
The fix: This is the Tūrama trap. A rule has two costs, so alongside the detection scenarios, run ordinary-customer scenarios, measure the false-positive rate, and test the workflow under a flood of alerts. Optimising detection alone manufactures the false positives that bury the real cases.

🚫 Not testing the threshold boundaries and the gaming case

Why it happens: A mid-range value passes, so the edges feel covered.
The fix: Launderers deliberately operate just under thresholds — that is the whole point of structuring. Test just below, at, and just above each threshold, and the deliberate-gaming scenario where someone splits activity to stay under a limit.
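Boundary and gaming cases translate directly into test inputs. A minimal Python sketch, assuming a hypothetical rule in which NZ$10,000 or more is reportable and three deposits in the band from NZ$9,000 to just under NZ$10,000 suggest structuring (the time window is omitted to keep the focus on amounts):

```python
# Hypothetical rule under test; amounts in NZ$.
THRESHOLD = 10_000.00
BAND_FLOOR = 9_000.00
MIN_COUNT = 3


def reportable(amount: float) -> bool:
    return amount >= THRESHOLD


def near_threshold(amount: float) -> bool:
    return BAND_FLOOR <= amount < THRESHOLD


def looks_structured(deposits: list[float]) -> bool:
    return sum(near_threshold(a) for a in deposits) >= MIN_COUNT


def split_evenly(total: float, parts: int) -> list[float]:
    """A launderer's split: `parts` equal deposits adding up to `total`."""
    return [total / parts] * parts


# Boundary cases: one cent below, exactly at, and one cent above the limit.
below, at, above = THRESHOLD - 0.01, THRESHOLD, THRESHOLD + 0.01
print(reportable(below), reportable(at), reportable(above))  # False True True
print(near_threshold(below), near_threshold(at))              # True False

# Gaming cases: the same NZ$28,500 split three ways is caught, four ways is not.
print(looks_structured(split_evenly(28_500, 3)))  # True  (3 x 9,500)
print(looks_structured(split_evenly(28_500, 4)))  # False (4 x 7,125)
```

The last case is the point of gaming tests: a launderer who splits into four smaller deposits drops below the band floor, so the band floor is itself a threshold an adversary can probe, and it needs its own below, at, and above cases.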

🚫 Treating KYC blocking as always correct

Why it happens: Blocking a possible match feels like the safe, compliant choice.
The fix: Wrongly blocking a legitimate customer — a migrant with valid unfamiliar documents, a common-name coincidence — locks a real person out of banking and can be a fairness failure. Test that genuine identity is not blocked by an over-eager match, alongside testing that real high-risk cases are caught.

🚫 Testing the rule but not the workflow behind it

Why it happens: The rule firing looks like the whole feature.
The fix: A correct alert that feeds a broken workflow still fails the obligation. Test that alerts carry the right data, route correctly, capture decisions, escalate, report to the Financial Intelligence Unit, keep an audit trail, and do not tip off the customer — and that a real alert survives a flooded queue.

Senior engineer insight

The most dangerous moment in AML testing is when the rule passes QA cleanly on your synthetic scenarios and everyone relaxes — because your scenarios were written by the same people who designed the rule, with the same mental model of what a launderer looks like. What changed how I think about this: seeing a bank's financial-crime team burn through two weeks of analyst capacity reviewing false positives on remittance customers who were doing nothing suspicious, while a real structuring case sat in the same queue and aged out of the reporting window. The rule was "correct" by every test we had written.

Most common mistake: writing detection-only test scenarios and calling the feature tested. If your test suite has no ordinary-customer scenarios and no measurement of false-positive rate, you have not tested the rule — you have tested that it can fire.

From the field

A team building transaction monitoring for a mid-tier NZ deposit-taker assumed their threshold values were calibrated: the product owner had set them from the DIA's typology guidance, and they had passed UAT against three structuring scenarios. Six weeks after launch they discovered that their false-positive rate on Pasifika remittance customers was running at eight times the rate for other segments, because those customers legitimately pool and transfer cash in patterns the rule read as structuring. The RBNZ's AML/CFT supervisory expectations call for a risk-based programme, one calibrated to how different customer groups actually behave rather than a single pattern applied bluntly to everyone, and the DIA had published guidance on exactly this remittance-community pattern. The lesson that generalises: calibration testing needs realistic volume from multiple customer archetypes, and fairness measurement (does the rule's false-positive burden fall unevenly across groups?) belongs in your test plan before go-live, not in a post-incident review.

10 Now You Try

Three graded exercises on AML and fraud workflows. Write your answer, then compare it to the model answer.

🔍 Exercise 1 of 3 — Spot the AML Risks

Read the description of a fictional new fraud-detection rule at an NZ bank below. Identify 3 risks — covering both false negatives and false positives, and the workflow — and explain the harm of each.

Rule: flag unusual outbound payments
A new rule freezes any account that makes an outbound payment larger than the customer’s previous biggest payment, on the theory that fraudsters drain accounts in one big transfer. It was tested against a captured account-takeover scenario and caught it. The rule was not tested against ordinary customers. When it freezes an account it does so silently with no alert routed to a human, and the customer is simply declined with a generic error. There is no path to quickly unfreeze a wrongly frozen account.

List 3 risks (false negative, false positive, workflow) and the harm of each:

Model answer

There are at least four real risks here; any three, well explained, earn full marks.

1. Massive false-positive rate (false positive) — "Larger than the previous biggest payment" fires on countless ordinary events: paying a bond, buying a car, settling on a house, a first big invoice. Harm: huge numbers of legitimate customers frozen out of their own money, and at scale the review load buries real fraud. This is the Tūrama two-costs failure.

2. No human alert / broken workflow (workflow) — Freezing silently with no alert routed to a human means no one reviews the decision, and a generic decline gives the customer no path forward. Harm: a wrong freeze has no correction path; the AML/CFT obligation to investigate and record is not met; no audit trail.

3. Easy to evade / false negative (false negative) — A fraudster who has taken over an account can simply make a payment smaller than the previous biggest one, or build up the "biggest payment" history first, and walk straight past the rule. Harm: the real fraud the rule was meant to stop still gets through, so it adds enormous false-positive cost without reliably closing the gap.

Bonus: no quick unfreeze path compounds the harm — even a correctly suspicious freeze needs a fast, fair resolution route.

The trap: it passed the one captured takeover scenario, so detection "worked", while the false-positive and workflow costs were never measured.
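Points 1 and 3 of the model answer can be demonstrated in a few lines. A minimal Python sketch of the exercise's rule, with a hypothetical payment history and hypothetical amounts:

```python
# The exercise's rule as written: freeze when a payment exceeds the previous biggest.
# With no history, max(..., default=0.0) means any first payment freezes, which is
# itself an ambiguity in the spec worth raising.
def freezes(history: list[float], payment: float) -> bool:
    return payment > max(history, default=0.0)


HISTORY = [120.0, 85.0, 400.0]  # hypothetical past payments for one customer

# Ordinary, innocent payments, each checked independently against the same history.
ordinary = {"rental bond": 2_400.0, "car deposit": 5_000.0, "groceries": 180.0}
false_positives = [name for name, amount in ordinary.items() if freezes(HISTORY, amount)]

# A fraudster drains NZ$4,788 in twelve payments, each just under the previous maximum.
drain = [399.0] * 12
caught = any(freezes(HISTORY + drain[:i], amount) for i, amount in enumerate(drain))

print(false_positives, caught)  # ['rental bond', 'car deposit'] False
```

Two of three innocent payments freeze the account, while twelve fraudulent payments never trip the rule: a high false-positive cost with no reliable detection.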

🔧 Exercise 2 of 3 — Fix the Test Case

The AML test case below only checks detection. Rewrite it to test both costs and the workflow, with these fields: Test ID, Obligation (AML/CFT Act 2009), Risk category, What it verifies, Scenarios, Acceptance criteria, Evidence required, Traceability. Use a fictional KiwiFirst Bank structuring-detection rule as the context.

Original (detection only):
“Run the structuring scenario. Check the rule raises an alert. Pass if it fires.”

Rewrite as a two-cost, workflow-aware test case:

Model answer

Test ID: AML-STRUCT-013

Obligation: AML/CFT Act 2009 — ongoing transaction monitoring for suspicious activity, with reporting and record-keeping.

Risk category: Detection effectiveness AND false-positive load (two costs) + workflow integrity.

What it verifies: The structuring rule catches genuine structuring, does NOT over-fire on ordinary cash-heavy customers, and routes a genuine alert through a working review-and-report workflow even under load.

Scenarios:
  a) True positive — a classic structuring pattern (many deposits just under a threshold over a few days) → alert raised.
  b) False positive — a cash-heavy builder banking several jobs in a week → measure whether it fires; should be low/justified.
  c) Boundary/gaming — deposits exactly at and just under the threshold, and a deliberate split to stay under.
  d) Workflow — a true-positive alert routes to the queue, captures an analyst decision, can be escalated and reported to the Financial Intelligence Unit, with an audit trail and no tipping off.
  e) Under load — a high-priority true alert is still surfaced when the queue is flooded.

Acceptance criteria: True-positive scenarios alert (0 missed); the false-positive scenario fires at or below an agreed rate; boundary cases behave per spec; the workflow completes end to end with a full audit trail and no customer tip-off; the under-load alert is surfaced.

Evidence required: Scenario inputs and outcomes; measured false-positive rate vs target; the workflow trail (alert → decision → escalation → report); the queue-under-load result.

Traceability: AML risk register R-14 (structuring missed) and R-15 (over-firing floods the queue).

What makes it strong: it tests BOTH costs (detection and false positives), the threshold/gaming boundaries, the end-to-end workflow including reporting and no-tip-off, and behaviour under load — not just "the rule fires". The original tested one of these.

🏗️ Exercise 3 of 3 — Build a KYC Test Plan

Design a KYC / customer due diligence test plan of 5 test cases for a fictional NZ bank onboarding new customers under the AML/CFT Act 2009. Each case needs at least: an ID, what it verifies, an acceptance criterion, and the evidence required. Cover clean onboarding, name mismatch, partial sanctions-list match, higher-risk enhanced due diligence, and a valid-but-unusual identity that must not be wrongly blocked.

Model answer

KYC-01 | Verifies: a clean applicant with valid NZ identity onboards successfully | Acceptance criteria: a customer with a valid driver licence and proof of address is verified and onboarded; the identity check is recorded | Evidence required: the verified identity record; the audit log of the check

KYC-02 | Verifies: a name mismatch between the document and the application is handled correctly | Acceptance criteria: a mismatch is flagged for review, not silently passed or silently failed; resolution is recorded | Evidence required: the mismatch flag; the review decision and who made it

KYC-03 | Verifies: a partial sanctions/watch-list match is escalated, not auto-blocked or auto-passed | Acceptance criteria: a partial match (e.g. common-name coincidence) routes to human review with the match strength shown; a strong/confirmed match is stopped | Evidence required: the match record with score; the routing/decision trail

KYC-04 | Verifies: a higher-risk customer triggers enhanced due diligence | Acceptance criteria: a customer assessed as higher risk has the Act's enhanced checks applied, not standard checks; the basis is recorded | Evidence required: the risk assessment; evidence the enhanced checks ran

KYC-05 | Verifies: a valid-but-unusual identity is not wrongly blocked | Acceptance criteria: a legitimate applicant with valid but unfamiliar documents (e.g. a new migrant) is onboarded or routed to fair review, not auto-rejected; false-block rate is acceptable | Evidence required: the scenario inputs; the outcome showing not auto-rejected; fairness/false-block measurement

Strong plans: each case is specific, has a measurable criterion, names concrete evidence, and together they cover clean onboarding (KYC-01), name mismatch (KYC-02), partial sanctions match (KYC-03), enhanced due diligence (KYC-04), and not wrongly blocking valid identity (KYC-05) — testing both the catch-the-bad and don't-harm-the-good edges. Weak plans say "check KYC works" five times — that is the difference being marked.

Why teams fail here

Writing scenarios only for the crime pattern and never running the rule against realistic ordinary-customer data — so the false-positive cost is completely invisible until it hits production.
Treating the suspicious-activity workflow as out of scope for QA because it "belongs to ops" — then discovering in production that alerts are routing to dead queues, analyst decisions are not being recorded, or FIU reporting requires manual workarounds that create compliance gaps under the AML/CFT Act 2009.
Not testing threshold boundaries and the gaming case — testing a mid-range value and assuming the edges are fine, when structuring is by definition the adversarial exploitation of those exact edges.
Skipping fairness measurement — never checking whether the rule's false-positive burden falls disproportionately on remittance-sending, cash-based, or migrant communities, a gap that DIA and RBNZ supervisors increasingly scrutinise.

11 Self-Check


Q1: What are the two costs of an AML or fraud rule, and why must you test both?

A false negative — letting real crime through — and a false positive — flagging an innocent customer. You must test both because optimising only detection creates false positives that question innocent people and, at scale, flood the review queue so real alerts are handled late. A false positive at scale manufactures false negatives. That is the Tūrama lesson.

Q2: Why test the threshold boundaries and the gaming case specifically?

Because launderers deliberately operate just under reporting thresholds — structuring is exactly that. A mid-range value passing tells you nothing about the edge. Test just below, at, and just above each threshold, plus the deliberate-split scenario where someone games the limit, because that is where the real adversarial behaviour lives.

Q3: What is the law behind AML testing in NZ, and what does it require?

The Anti-Money Laundering and Countering Financing of Terrorism Act 2009. It requires reporting entities to verify customer identity, monitor transactions for suspicious activity, report certain matters to the authorities, and keep records proving they did so — each of which is a testable system behaviour, alongside doing it without unfairly harming innocent customers.

Q4: Why is testing the suspicious-activity workflow as important as testing the rule?

Because a rule that fires correctly but feeds a broken workflow still fails the obligation. The alert must carry the right data, route to the right queue, capture an analyst decision, escalate, report to the Financial Intelligence Unit, keep an audit trail, and not tip off the customer — and a genuine alert must survive a flooded queue rather than being buried in noise.

Q5: How is fairness a concern in AML and fraud testing?

A rule that over-fires on a particular group — flagging ordinary cash-based or remittance-sending customers far more often than others — is a fairness failure as well as an operational one. It subjects a community to disproportionate suspicion and inconvenience. A tester measures whether the rule’s false-positive burden falls unevenly, not just its overall rate.

Key takeaway

In AML and fraud testing, a rule that only catches criminals is half-tested — the other half is proving it does not drown your analysts in innocent people, because at scale a false positive manufactures the false negative that lets the real crime through.

12 Interview Prep

Real questions asked in NZ QA interviews for financial-crime and fraud roles. Read the model answers, then practise your own version.

“How would you test a new transaction-monitoring rule?”

I’d test both of its costs. First detection: build scenarios for each suspicious pattern it targets — structuring, rapid in-out, profile mismatch — and confirm it catches them with nothing missed. But that is only half. Then I’d run realistic ordinary-customer scenarios — the cash-heavy builder, the market trader — and measure the false-positive rate, because a rule that screams at every piece of toast is a defect. I’d test the threshold boundaries and the deliberate-gaming case, the end-to-end workflow including reporting to the Financial Intelligence Unit and no tipping off, and behaviour under a flooded queue. And I’d check fairness — that it does not over-fire on one community. Optimising detection alone is what buries the real alerts.

“A fraud rule has a 95% detection rate. Why might that still be a bad rule?”

Because detection is only one of its two costs. A rule can catch 95% of fraud and still be terrible if it also flags huge numbers of innocent customers — freezing people out of their own money and flooding the analyst queue so the real cases are reviewed late. I’d want to see the false-positive rate alongside the detection rate, the impact on the review workload, and whether the false positives fall unfairly on a particular group. A high detection number with an unmeasured false-positive cost is exactly how a rule looks good in a test and causes harm in production.
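The base-rate arithmetic behind this answer, as a short Python sketch. All volumes and rates are hypothetical; the shape of the result holds whenever fraud is rare.

```python
# Hypothetical volumes: 1,000,000 transactions, 100 of them fraudulent,
# a 95% detection rate and a 1% false-positive rate on legitimate transactions.
transactions = 1_000_000
fraudulent = 100
detection_rate = 0.95
false_positive_rate = 0.01

true_alerts = fraudulent * detection_rate                          # real cases caught
false_alerts = (transactions - fraudulent) * false_positive_rate   # innocents flagged
precision = true_alerts / (true_alerts + false_alerts)             # share of alerts that are real

print(f"{true_alerts:.0f} real alerts, {false_alerts:,.0f} false alerts, precision {precision:.1%}")
# 95 real alerts, 9,999 false alerts, precision 0.9%
```

Even with 95% detection, fewer than 1 in 100 alerts in this example is real, so the analyst queue is almost entirely noise.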

“What does the AML/CFT Act 2009 require, and how does that shape your testing?”

It requires NZ reporting entities to verify who their customers are, monitor transactions for suspicious activity, report certain matters to the authorities, and keep records proving they did so. That shapes my testing into concrete behaviours: I test that KYC verifies identity and applies enhanced checks for higher-risk customers, that monitoring catches the suspicious patterns, that the workflow files a suspicious activity report without tipping off the customer, and that the audit trail is complete for a regulator. And throughout, I test the other edge — that meeting these obligations does not wrongly block or over-flag legitimate customers, because the Act sets the floor and fair treatment is what good testing adds on top.
