Accessibility Labs · Lesson 1

WCAG 2.2 AA in Practice

“Make it accessible” is not a testable instruction. “Does this date picker pass 2.5.8 Target Size and 2.4.11 Focus Not Obscured?” is. This lesson turns the standard into specific tests you can run against a real component.

Lesson 1 of 3 · ~30 min read · ~75 min with exercises

1 The Hook

A Waka Kotahi team shipped a redesigned licence-renewal form. It passed the automated scan with flying colours — zero errors in the linter the build pipeline ran on every commit. The product owner read the green tick as “accessible” and signed it off. Six weeks later, a blind customer who renews her licence every year could not complete the form. The new date-of-birth picker was a custom calendar widget she could not operate at all.

How did a clean automated scan miss a form a real person could not use? Because automated tools only catch the criteria a machine can check — missing alt attributes, colour-contrast ratios, empty labels. They cannot tell you whether the focus order makes sense, whether the calendar can be operated from the keyboard, or whether the error message is announced to a screen reader. Studies of automated accessibility tools put their coverage at roughly a third of the WCAG success criteria. The other two-thirds need a human running a test.

The team had treated WCAG as one undifferentiated wall of “accessibility”. WCAG is not a wall. It is a list of numbered, individually testable success criteria — 1.1.1, 2.4.7, 2.5.8, and so on. Each one is a specific, checkable claim about a component. The skill this lesson builds is reading a component and asking, criterion by criterion, “does it pass this one?”

That is the difference between “the scan was green” and “I tested the date picker against 2.1.1 Keyboard, 2.4.7 Focus Visible, 2.5.8 Target Size, and 3.3.2 Labels or Instructions, and it fails the first three.” The second statement is auditable. The first is an opinion.

2 The Rule

WCAG is not a vibe; it is a list of numbered success criteria. You do not test “accessibility”. You test a component against named criteria, one at a time, and record a pass or fail with evidence against each number. Automated tools cover roughly a third of those criteria; the rest are yours to test by hand.

3 The Analogy

A building code inspection.

When a building inspector signs off a new house, they do not stand in the doorway and declare it “safe”. They work through a code: the stair rise is within tolerance, the handrail height meets the clause, the smoke alarm is in the right room, the egress window is wide enough. Each clause is a separate, measurable check with a pass or fail. “Looks safe to me” would never make it past a building consent.

WCAG 2.2 is the building code for a web page, and the success criteria are its clauses. Testing accessibility is doing the inspection — clause by numbered clause — not glancing at the page and forming an impression. And like a building inspector who knows that a moisture meter only tells part of the story, you know your automated scanner only checks some clauses. The rest you walk through yourself.

4 POUR as Test Lenses

WCAG groups every success criterion under four principles, abbreviated POUR. Treat them not as theory but as four lenses you point at a component — each one asks a different question and surfaces a different class of failure.

Perceivable — can the user sense the content at all?

Everything must be available to at least one sense the user has. Images need text alternatives (1.1.1) so a screen reader can speak them. Colour cannot be the only way information is conveyed (1.4.1) — a RealMe form that marks errors in red alone fails for a colour-blind user. Text needs sufficient contrast against its background (1.4.3, the AA bar is 4.5:1 for normal text). The lens question: if I could not see, or could not see colour, would I still get this?
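The 4.5:1 bar in 1.4.3 is a computed number, so it can be checked rather than judged by eye. The sketch below follows the relative-luminance and contrast-ratio formulas published in WCAG 2.x; the function names are illustrative, and it assumes solid, fully opaque colours written as `#rrggbb`. Text over gradients, images, or semi-transparent layers still needs a contrast tool and a human.

```python
def _linear(channel: int) -> float:
    """Convert one 0-255 sRGB channel to its linear-light value (WCAG 2.x formula)."""
    c = channel / 255
    return c / 12.92 if c <= 0.03928 else ((c + 0.055) / 1.055) ** 2.4


def relative_luminance(hex_colour: str) -> float:
    """Relative luminance of a colour written as '#rrggbb'."""
    h = hex_colour.lstrip("#")
    r, g, b = (int(h[i:i + 2], 16) for i in (0, 2, 4))
    return 0.2126 * _linear(r) + 0.7152 * _linear(g) + 0.0722 * _linear(b)


def contrast_ratio(foreground: str, background: str) -> float:
    """Contrast ratio between two colours, always >= 1 regardless of order."""
    lighter, darker = sorted(
        (relative_luminance(foreground), relative_luminance(background)),
        reverse=True,
    )
    return (lighter + 0.05) / (darker + 0.05)


def passes_1_4_3(foreground: str, background: str, large_text: bool = False) -> bool:
    """AA threshold: 4.5:1 for normal text, 3:1 for large text."""
    threshold = 3.0 if large_text else 4.5
    return contrast_ratio(foreground, background) >= threshold


# Grey hint text on a white background: two near-identical greys, two different results.
for grey in ("#767676", "#777777"):
    print(grey, round(contrast_ratio(grey, "#ffffff"), 2), passes_1_4_3(grey, "#ffffff"))
```

Record the measured ratio as evidence next to the criterion number, not just the pass or fail.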

Operable — can the user actually drive it?

Everything you can do with a mouse must be doable another way. All functionality must work from the keyboard (2.1.1) with no traps (2.1.2). Focus must be visible (2.4.7). Pages need a way to skip repeated blocks (2.4.1). The lens question: with the mouse unplugged, can I complete every task? This lens is the whole of Lesson 2.

Understandable — can the user make sense of it?

Content and operation must be predictable. Inputs need visible labels (3.3.2). Errors must be identified in text and described (3.3.1, 3.3.3). Components with the same function must be identified the same way wherever they appear (3.2.4). The lens question: can a first-time user predict what will happen, and recover when something goes wrong?

Robust — will it work with the user’s assistive technology?

The markup must expose the correct name, role, and value to assistive technology (4.1.2). A custom toggle built from a styled <div> with no role is not recognisable as a control to a screen reader, even if it looks perfect. The lens question: does the code tell assistive technology what each thing is and what state it is in? This lens carries into Lesson 3.
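One way to find candidates for the styled-<div> problem is to search the markup for non-interactive elements that carry a click handler but no role or tabindex. The sketch below uses Python's standard `html.parser`; the class and function names are illustrative, and it only sees inline `onclick` attributes, so handlers attached from a script are invisible to it. A hit is a prompt to test 4.1.2 and 2.1.1 by hand, not a verdict.

```python
from html.parser import HTMLParser

# Elements that already expose a role and keyboard behaviour natively.
NATIVE_INTERACTIVE = {"a", "button", "input", "select", "textarea", "summary"}


class ClickableWithoutRole(HTMLParser):
    """Collects elements with an inline onclick but no role and/or tabindex."""

    def __init__(self) -> None:
        super().__init__()
        self.findings: list[tuple[str, list[str]]] = []

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        if tag in NATIVE_INTERACTIVE or "onclick" not in attributes:
            return
        missing = [name for name in ("role", "tabindex") if name not in attributes]
        if missing:
            self.findings.append((tag, missing))


def find_clickable_without_role(html: str) -> list[tuple[str, list[str]]]:
    parser = ClickableWithoutRole()
    parser.feed(html)
    parser.close()
    return parser.findings


sample = '<div class="btn" onclick="signIn()">Sign in</div><button onclick="go()">OK</button>'
print(find_clickable_without_role(sample))
```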

Pro tip: When you pick up a new component, run all four lenses over it in order — Perceivable, Operable, Understandable, Robust. It stops you fixating on the one issue you spotted first and missing the other three classes. A single icon button can fail under all four: no text alternative (P), not keyboard-reachable (O), no visible label (U), and no button role (R).

5 What Is New in WCAG 2.2

WCAG 2.2 added nine success criteria over 2.1. Six of them are at Level A or AA, which means the NZ Government standard requires them. These are the ones a tester who learned 2.1 will not be looking for, so they are where new failures hide.

2.4.11 Focus Not Obscured (Minimum) — AA — when an element has keyboard focus, it must not be entirely hidden behind a sticky header, cookie banner, or toolbar. Test: tab through the page and watch whether the focused element ever disappears under fixed furniture.
2.5.7 Dragging Movements — AA — anything you do by dragging (a slider, a kanban card, a map) must also be doable with single taps or clicks. Test: can you reach the same outcome without a drag?
2.5.8 Target Size (Minimum) — AA — interactive targets must be at least 24 by 24 CSS pixels, or have enough spacing around them. Test: measure the small icon buttons and the close × on a modal.
3.2.6 Consistent Help — A — if a help link, contact number, or chat is offered, it appears in the same relative place across pages. Test: does “Contact us” move around the AoG site?
3.3.7 Redundant Entry — A — information the user already entered in a process must not be demanded again; auto-fill it or let them pick it. Test: does a multi-step RealMe form ask for the same address twice?
3.3.8 Accessible Authentication (Minimum) — AA — logging in must not depend on a cognitive function test like solving a puzzle or remembering a code, unless an alternative exists. Test: can a password manager paste the credential; is there a route that is not a memory test?

Note that 2.4.11 Focus Not Obscured and 2.5.7 Dragging Movements require care to test by hand, while 2.5.8 Target Size is partly measurable with a ruler in dev tools. The point: 2.2 moved the goalposts, and a checklist written for 2.1 will pass components that now fail.
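The measurable part of 2.5.8 can be sketched as geometry. The code below takes bounding boxes you have read out of dev tools (the `Target` type and function name are illustrative) and applies the size rule plus the spacing exception: an undersized target is acceptable if a 24 CSS px circle centred on it does not intersect another target or another undersized target's circle. It assumes circles that only touch do not count as intersecting, and it does not model the criterion's other exceptions (inline, equivalent, user-agent controlled, essential), which still need a human decision.

```python
from dataclasses import dataclass
from math import hypot

MIN_SIZE = 24.0  # CSS pixels, from 2.5.8 Target Size (Minimum)


@dataclass(frozen=True)
class Target:
    name: str
    x: float       # left edge, CSS px
    y: float       # top edge, CSS px
    width: float
    height: float

    @property
    def centre(self) -> tuple[float, float]:
        return (self.x + self.width / 2, self.y + self.height / 2)

    @property
    def undersized(self) -> bool:
        return self.width < MIN_SIZE or self.height < MIN_SIZE


def _circle_hits_rect(cx: float, cy: float, radius: float, rect: Target) -> bool:
    """True if the circle overlaps the rectangle (touching edges do not count)."""
    nearest_x = min(max(cx, rect.x), rect.x + rect.width)
    nearest_y = min(max(cy, rect.y), rect.y + rect.height)
    return hypot(cx - nearest_x, cy - nearest_y) < radius


def undersized_without_spacing(targets: list[Target]) -> list[str]:
    """Names of undersized targets whose 24 px circle collides with a neighbour."""
    flagged = []
    for target in targets:
        if not target.undersized:
            continue
        cx, cy = target.centre
        for other in targets:
            if other is target:
                continue
            clash = _circle_hits_rect(cx, cy, MIN_SIZE / 2, other)
            if other.undersized and hypot(cx - other.centre[0], cy - other.centre[1]) < MIN_SIZE:
                clash = True
            if clash:
                flagged.append(target.name)
                break
    return flagged


# Two 18x18 day cells packed edge to edge, as in a dense calendar grid.
print(undersized_without_spacing([Target("day-1", 0, 0, 18, 18), Target("day-2", 18, 0, 18, 18)]))
```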

6 Testing a Component Against a Criterion

Testing accessibility is repeatable: take one component, take one numbered criterion, and decide pass or fail with a reason. Here is that worked through for the Waka Kotahi date picker from the Hook, against four criteria.

Component: Date-of-birth calendar widget (licence renewal form)

Criterion: 2.1.1 Keyboard (A)
Method: Unplug mouse; try to open the calendar and select a date using
             Tab, arrow keys, Enter, and Space only.
Result: FAIL — calendar opens on click only; no keyboard handler. A keyboard
             user cannot select a date at all.

Criterion: 2.4.7 Focus Visible (AA)
Method: Tab onto the date field; confirm a visible focus indicator appears.
Result: FAIL — custom styling removed the outline; no replacement indicator.

Criterion: 2.5.8 Target Size Minimum (AA)
Method: Measure each day cell in dev tools.
Result: FAIL — day cells are 18×18 px, below the 24×24 px minimum, with no
             extra spacing.

Criterion: 3.3.2 Labels or Instructions (A)
Method: Check the field has a visible label and a stated date format.
Result: PASS — field is labelled “Date of birth” and shows the DD/MM/YYYY format.

Evidence: Screen recording of keyboard attempt; dev-tools measurement screenshot;
             browser/AT versions used.

Notice the shape: each row names one numbered criterion, states a concrete method, and records a pass or fail with a reason — not “the date picker has issues”. That is what a developer can act on and an auditor can verify. A defect raised as “fails 2.1.1 Keyboard: calendar cannot be operated without a mouse” is unarguable in a way that “accessibility problem with the date picker” never is.
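If you keep results in a script or spreadsheet export rather than free text, the same shape can be enforced by a small record type. This is a minimal sketch, not a prescribed format: the `CriterionCheck` class and its field names are assumptions chosen to mirror the rows above.

```python
from dataclasses import dataclass, field
from enum import Enum


class Outcome(Enum):
    PASS = "PASS"
    FAIL = "FAIL"


@dataclass
class CriterionCheck:
    """One component tested against one numbered success criterion."""
    component: str
    number: str          # e.g. "2.1.1"
    name: str            # e.g. "Keyboard"
    level: str           # "A" or "AA"
    method: str
    outcome: Outcome
    reason: str
    evidence: list[str] = field(default_factory=list)

    def defect_title(self) -> str:
        """Defect summary line; only a failed check can become a defect."""
        if self.outcome is not Outcome.FAIL:
            raise ValueError("only failed checks become defects")
        return f"Fails {self.number} {self.name} ({self.level}): {self.reason}"


def failures(checks: list[CriterionCheck]) -> list[CriterionCheck]:
    """The subset of checks that need a defect raised."""
    return [check for check in checks if check.outcome is Outcome.FAIL]


keyboard = CriterionCheck(
    component="Date-of-birth calendar widget",
    number="2.1.1",
    name="Keyboard",
    level="A",
    method="Unplug mouse; open the calendar and select a date with Tab, arrows, Enter, Space",
    outcome=Outcome.FAIL,
    reason="calendar cannot be operated without a mouse",
    evidence=["screen recording of keyboard attempt"],
)
print(keyboard.defect_title())
```

Because the record has no field for "general accessibility issue", a vague defect simply cannot be entered.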

7 Common Failures and the Criterion They Break

The same handful of failures appear again and again. Learn to name the criterion each one breaks, and your defects become precise.

  • Icon button with no accessible name (a bare × or hamburger) — breaks 4.1.2 Name, Role, Value and 1.1.1 Non-text Content. A screen reader announces “button” with nothing else.
  • Placeholder text used instead of a label — breaks 3.3.2 Labels or Instructions. The label vanishes the moment the user types.
  • Error shown only by turning the field red — breaks 1.4.1 Use of Colour and usually 3.3.1 Error Identification. Nothing in text says what is wrong.
  • Low-contrast grey hint text — breaks 1.4.3 Contrast (Minimum) when it drops below 4.5:1.
  • Removing the focus outline for looks — breaks 2.4.7 Focus Visible. Keyboard users lose track of where they are.
  • Heading levels skipped or used for size (an <h4> chosen because it looked right) — breaks 1.3.1 Info and Relationships; screen-reader users navigate by heading structure.
  • A modal that does not keep focus inside it while open, or does not return focus to the trigger on close — breaks 2.4.3 Focus Order. A modal you cannot leave from the keyboard breaks 2.1.2 No Keyboard Trap.

Pro tip: Build yourself a one-line mapping from symptom to criterion: “no label → 3.3.2”, “red-only error → 1.4.1”. When you can name the number on sight, you stop writing vague defects and start writing ones a developer can close against a clause.
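The symptom-to-criterion mapping can live in code as easily as on a sticky note. The sketch below encodes the failures listed in this section as a lookup table; the symptom keys are shorthand chosen for illustration, and an unknown symptom raises an error rather than guessing a number.

```python
# Symptom shorthand -> criteria it usually breaks (taken from the list in this section).
SYMPTOM_TO_CRITERIA = {
    "icon button with no accessible name": ["4.1.2 Name, Role, Value", "1.1.1 Non-text Content"],
    "placeholder used as label": ["3.3.2 Labels or Instructions"],
    "red-only error": ["1.4.1 Use of Colour", "3.3.1 Error Identification"],
    "low-contrast hint text": ["1.4.3 Contrast (Minimum)"],
    "focus outline removed": ["2.4.7 Focus Visible"],
    "heading levels skipped": ["1.3.1 Info and Relationships"],
    "modal loses focus": ["2.4.3 Focus Order"],
}


def criteria_for(symptom: str) -> list[str]:
    """Look up the criteria for a symptom; refuse to guess when it is not mapped."""
    key = symptom.strip().lower()
    if key not in SYMPTOM_TO_CRITERIA:
        raise LookupError(f"no mapping for {symptom!r}; find the criterion in WCAG and add it")
    return SYMPTOM_TO_CRITERIA[key]


print(criteria_for("Red-only error"))
```

The mapping gives you a starting number; confirm it against the component before raising the defect, since one symptom can break a different criterion in a different context.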

8 The NZ Government Web Accessibility Standard

WCAG is an international guideline; in NZ it is given teeth by the NZ Government Web Accessibility Standard. The current version, 1.2, requires public service and non-public service departments to make their public-facing websites meet WCAG 2.2 Level AA. It sits alongside the Web Usability Standard under the wider All-of-Government umbrella, and the Department of Internal Affairs maintains it.

What this means for you as a tester on an NZ government or government-adjacent project:

  • The bar is AA, not A. Meeting Level A is not enough — the standard names AA, so criteria like 1.4.3 Contrast, 2.4.7 Focus Visible, and the new 2.5.8 Target Size are all in scope.
  • It is 2.2, not 2.1 or 2.0. Version 1.2 of the standard names WCAG 2.2, so the six new A/AA criteria from section 5 are mandatory, not optional polish.
  • It applies to the services people use. RealMe login, Waka Kotahi online services, and Te Whatu Ora public information all sit inside scope. If a New Zealander has to use it to deal with government, it has to meet the bar.
  • Conformance is claimed against numbered criteria. A conformance statement lists which criteria were met and how — which is exactly the per-criterion testing this lesson teaches. Vague “we did accessibility” statements do not satisfy it.

So per-criterion testing is not academic. It is the format the NZ standard expects your evidence in. When you test a component against 2.1.1, 2.4.7, and 2.5.8 by number, you are producing exactly the artefact a conformance review asks for.

9 Common Mistakes

🚫 Treating a clean automated scan as “accessible”

Why it happens: The scanner runs in the pipeline, shows zero errors, and a green tick feels like done.
The fix: Automated tools cover only about a third of the success criteria — the machine-checkable ones. Keyboard operability, focus order, sensible alt text, and announced errors all need a human. A clean scan is the start of testing, not the end of it.

🚫 Raising defects as “accessibility issues” with no criterion

Why it happens: The tester saw something wrong but did not map it to a number.
The fix: Name the criterion. “Fails 2.4.7 Focus Visible: the focus outline is removed on the submit button” is actionable and auditable. “Button has an accessibility problem” gets argued about and deprioritised.

🚫 Testing against WCAG 2.1 (or 2.0) and assuming you are covered

Why it happens: Old checklists and old training stop at 2.1, and 2.2 is recent.
The fix: The NZ standard requires 2.2 AA. Add the six new A/AA criteria to your checklist — especially 2.4.11 Focus Not Obscured, 2.5.8 Target Size, and 3.3.8 Accessible Authentication — or you will pass components that the current standard fails.

🚫 Confusing “it has alt text” with “it passes 1.1.1”

Why it happens: The presence of an alt attribute looks like a pass to a quick check.
The fix: 1.1.1 asks for a text alternative that serves the equivalent purpose. alt="image" or a filename passes a scanner and fails the criterion. A decorative image needs empty alt=""; a meaningful one needs a description. Read the alt text, do not just check it exists.
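A script can triage alt text so the human review starts with the worst offenders. The sketch below uses Python's standard `html.parser`; the placeholder word list, the filename pattern, and the category names are assumptions for illustration. Note that no category means "pass": even plausible-looking alt text is returned as "review" because only a person can judge whether it serves the equivalent purpose.

```python
import re
from html.parser import HTMLParser

PLACEHOLDER_WORDS = {"image", "img", "photo", "picture", "graphic", "icon"}
FILENAME = re.compile(r"\.(png|jpe?g|gif|svg|webp)$", re.IGNORECASE)


def classify_alt(attributes: dict) -> str:
    """Triage one <img>: 'missing', 'empty', 'suspect', or 'review'."""
    if "alt" not in attributes:
        return "missing"   # no alt attribute at all
    text = (attributes["alt"] or "").strip()
    if text == "":
        return "empty"     # claims the image is decorative; confirm by hand
    if text.lower() in PLACEHOLDER_WORDS or FILENAME.search(text):
        return "suspect"   # passes a scanner, very likely fails 1.1.1
    return "review"        # a human still reads it against the image


class AltAudit(HTMLParser):
    def __init__(self) -> None:
        super().__init__()
        self.results: list[tuple[str, str]] = []

    def handle_starttag(self, tag, attrs):
        if tag == "img":
            attributes = dict(attrs)
            self.results.append((attributes.get("src") or "", classify_alt(attributes)))


def audit_alt_text(html: str) -> list[tuple[str, str]]:
    parser = AltAudit()
    parser.feed(html)
    parser.close()
    return parser.results


print(audit_alt_text('<img src="logo.png" alt="image"><img src="rule.png" alt="">'))
```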

10 Now You Try

Three graded exercises: spot the failures, fix a defect, build a test plan. Write your answer, run it for AI feedback, then compare to the model answer.

🔍 Exercise 1 of 3 — Audit the Component

Below is a description of a RealMe-style login button and password field. Audit it against WCAG 2.2 AA. Identify 3 failures, and for each one name the specific numbered success criterion it breaks.

Component: login panel
The password field uses light-grey placeholder text reading “Password” and has no separate visible label. When the password is wrong, the field border turns red and nothing else changes. The “Sign in” control is a styled <div> with an onclick handler. The site asks the user to read and type back a 6-character code shown as a distorted image before they can log in; there is no other way to sign in.

List 3 failures, each with its numbered criterion:

Model answer
There are at least four real failures; any three correctly named earn full marks.

1. Placeholder used as the only label — Criterion: 3.3.2 Labels or Instructions (A). The label disappears once the user types, leaving no persistent on-screen label. (Low-contrast grey placeholder may also touch 1.4.3 Contrast.)

2. Error signalled by red border alone — Criterion: 1.4.1 Use of Colour (A), and 3.3.1 Error Identification (A). Colour-blind users get no signal, and nothing in text says what went wrong or how to fix it.

3. "Sign in" is a div with onclick — Criterion: 4.1.2 Name, Role, Value (A), and 2.1.1 Keyboard (A). A div has no button role and is not keyboard-focusable or operable with Enter/Space, so screen-reader and keyboard users cannot sign in.

4. Distorted-image code as the only login route — Criterion: 3.3.8 Accessible Authentication (Minimum) (AA). Logging in depends on a cognitive function test (recognising and re-typing distorted characters) with no alternative. It also fails 1.1.1 Non-text Content (A), which requires a CAPTCHA to offer an alternative form for a different sense, such as audio.

Strong answers name the number AND the name, and explain the user impact. Naming "an accessibility issue" without the criterion does not earn the mark.

🔧 Exercise 2 of 3 — Fix the Defect Report

The accessibility defect below is too vague to action. Rewrite it as a precise, criterion-based defect with these fields: Component, WCAG criterion (number + name + level), What was observed, Expected, Steps to reproduce, Evidence. Use a fictional Te Whatu Ora appointment-booking page as the context.

Original (too vague):
“The booking page has accessibility problems. The buttons are hard to see when selected and some of them are too small. Please make it accessible.”

Rewrite as a complete criterion-based defect:

Model answer
The vague report mixes two separate failures, so the right move is to split it into two precise defects. One worked example:

Component: "Confirm booking" button, Te Whatu Ora appointment-booking page (step 3 of 4).

WCAG criterion: 2.4.7 Focus Visible (Level AA).

What was observed: When the button receives keyboard focus, no visible focus indicator appears — the button looks identical focused and unfocused. The CSS sets outline:none with no replacement.

Expected: A clearly visible focus indicator (outline, border, or background change) on the button when it has keyboard focus, distinct from its resting state.

Steps to reproduce: 1. Open the booking page. 2. Press Tab repeatedly until focus reaches "Confirm booking". 3. Observe that no visual change marks where focus is.

Evidence: Screen recording of tabbing to the button; computed-style screenshot showing outline:none; browser + version.

(The second defect would cover the small targets: Component = date-slot buttons; Criterion = 2.5.8 Target Size Minimum (AA); Observed = slots measure 20×20 px, below the 24×24 px minimum, with no extra spacing; Evidence = dev-tools measurement.)

What makes it right: one criterion per defect, the number + name + level, an observed-vs-expected pair, reproducible steps, and concrete evidence. The original had none of these and bundled two issues into one.

🏗️ Exercise 3 of 3 — Build a Criterion Test Plan

Build a WCAG 2.2 AA test plan of 5 checks for a fictional AoG online services search-and-results page (a search box, filter checkboxes, and a list of result cards). Each check should have: an ID, the numbered criterion, the method, and the pass condition. Cover at least one criterion from each POUR principle and include at least one of the new 2.2 criteria.

Model answer
A11Y-01 | Criterion: 1.4.3 Contrast Minimum (AA) — Perceivable | Method: measure contrast of result-card body text and filter labels against their background with a contrast tool | Pass condition: all normal text ≥ 4.5:1, large text ≥ 3:1

A11Y-02 | Criterion: 2.1.1 Keyboard (A) — Operable | Method: with the mouse unplugged, run a search, toggle filter checkboxes, and open a result card using Tab/Space/Enter only | Pass condition: every action is completable from the keyboard with no trap

A11Y-03 | Criterion: 3.3.2 Labels or Instructions (A) — Understandable | Method: inspect the search box and each filter for a persistent programmatic label (not placeholder-only) | Pass condition: every input has an associated visible label exposed to assistive tech

A11Y-04 | Criterion: 4.1.2 Name, Role, Value (A) — Robust | Method: inspect the filter checkboxes and the search control with the accessibility tree; confirm correct role and checked/unchecked state | Pass condition: each control exposes correct name, role, and current state

A11Y-05 | Criterion: 2.5.8 Target Size Minimum (AA) — new in 2.2, Operable | Method: measure the filter checkboxes and any icon buttons (clear search, pagination) in dev tools | Pass condition: each target ≥ 24×24 CSS px or has the required spacing

A strong plan: one criterion per check, the method is concrete (a tester could run it), the pass condition is measurable, all four POUR principles appear, and at least one new-in-2.2 criterion (here 2.5.8) is included. Weak plans repeat "check it is accessible" with no number or method.
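The two coverage rules in this exercise (every POUR principle, at least one new 2.2 criterion) are mechanical enough to check in code. The sketch below relies on the fact that a criterion's first digit identifies its principle; the function names and the plan format (check ID mapped to criterion number) are assumptions for illustration. It checks coverage only and says nothing about whether each method is concrete or each pass condition measurable.

```python
PRINCIPLES = {"1": "Perceivable", "2": "Operable", "3": "Understandable", "4": "Robust"}

# The six Level A/AA criteria added in WCAG 2.2.
NEW_IN_2_2_A_AA = {"2.4.11", "2.5.7", "2.5.8", "3.2.6", "3.3.7", "3.3.8"}


def principle_of(criterion: str) -> str:
    """Map a criterion number such as '2.5.8' to its POUR principle."""
    return PRINCIPLES[criterion.split(".")[0]]


def plan_gaps(plan: dict[str, str]) -> list[str]:
    """Return the coverage gaps in a plan of {check ID: criterion number}."""
    gaps = []
    covered = {principle_of(criterion) for criterion in plan.values()}
    for principle in PRINCIPLES.values():
        if principle not in covered:
            gaps.append(f"no check for {principle}")
    if not NEW_IN_2_2_A_AA.intersection(plan.values()):
        gaps.append("no criterion new in WCAG 2.2")
    return gaps


model_plan = {
    "A11Y-01": "1.4.3",
    "A11Y-02": "2.1.1",
    "A11Y-03": "3.3.2",
    "A11Y-04": "4.1.2",
    "A11Y-05": "2.5.8",
}
print(plan_gaps(model_plan))
```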

11 Self-Check

Q1: Why is a clean automated accessibility scan not the same as an accessible page?

Automated tools only check the machine-verifiable criteria — roughly a third of WCAG. They cannot judge keyboard operability, focus order, whether alt text is meaningful, or whether errors are announced. The other two-thirds need a human running tests, so a green scan is where testing starts, not where it ends.

Q2: What do the letters POUR stand for, and how do you use them?

Perceivable, Operable, Understandable, Robust: the four principles every WCAG criterion sits under. Use them as four lenses on a component: can the user sense it (P), drive it without a mouse (O), make sense of it (U), and will it work with their assistive technology (R)? Running all four stops you fixating on one issue and missing the other three classes.

Q3: Name two success criteria that are new in WCAG 2.2 and at Level AA.

Any two of: 2.4.11 Focus Not Obscured (Minimum), 2.5.7 Dragging Movements, 2.5.8 Target Size (Minimum), and 3.3.8 Accessible Authentication (Minimum). These are mandatory under the NZ standard and are exactly the ones a 2.1-era checklist will miss.

Q4: What does the NZ Government Web Accessibility Standard 1.2 actually require?

That public service and non-public service departments make their public-facing websites meet WCAG 2.2 Level AA. Maintained by the Department of Internal Affairs under the All-of-Government umbrella, it covers services like RealMe, Waka Kotahi online services, and Te Whatu Ora information — and conformance is claimed against the numbered criteria.

Q5: Why is “the button has an accessibility problem” a weak defect, and what makes it strong?

It names no criterion, so it cannot be verified or closed against a clause and is easy to deprioritise. A strong defect names the numbered criterion (e.g. 2.4.7 Focus Visible AA), states what was observed versus what is expected, gives reproducible steps, and attaches evidence. That is the format the NZ standard’s conformance claims are built from.

12 Interview Prep

Real questions asked in NZ QA interviews for roles that touch government and public-facing systems. Read the model answers, then practise your own version.

“Our pipeline runs an automated accessibility scan and it passes. Do we still need manual accessibility testing?”

Yes. Automated scanners cover only about a third of the WCAG success criteria — the machine-checkable ones like missing alt attributes and contrast ratios. They cannot tell you whether a custom widget works from the keyboard, whether the focus order makes sense, whether alt text is actually meaningful, or whether an error is announced to a screen reader. Those are the failures that stop a real person completing a task, like the Waka Kotahi date picker that passed the scan but locked out a blind customer. I’d treat the green scan as a baseline and do per-criterion manual testing on top, focused on operability and the assistive-technology experience.

“What level of WCAG do we have to meet for a NZ government service, and which version?”

The NZ Government Web Accessibility Standard, currently version 1.2, requires public service and non-public service departments to meet WCAG 2.2 Level AA on their public-facing sites. The two things people get wrong are the level and the version: it is AA, not A, so contrast and target-size criteria are in scope; and it is 2.2, not 2.1, so the new criteria like Focus Not Obscured, Target Size, and Accessible Authentication are mandatory. I’d make sure our checklist is built for 2.2 AA, because a 2.1 checklist will pass components the current standard fails.

“Walk me through how you’d test one component against WCAG.”

I take the component and run the four POUR lenses over it, each against specific numbered criteria. For a date picker, Perceivable asks whether the text and controls have sufficient contrast (1.4.3). Operable asks whether, with the mouse unplugged, I can open it and pick a date with no trap (2.1.1, 2.1.2), whether focus is visible (2.4.7), and whether the day targets are at least 24 by 24 CSS pixels (2.5.8). Understandable asks whether the field has a visible label with the format stated (3.3.2) and whether errors are described in text (3.3.1). Robust asks whether the markup exposes the right name, role, and value to a screen reader (4.1.2). For each one I record pass or fail with a reason and capture evidence, so the result reads as a per-criterion conformance check, not an impression.