Specialised · Beyond Manual Testing

Usability Testing

It works. But can people actually use it? Usability testing turns designer intuition into user evidence — and prevents expensive rebuilds.

Specialised · ISO 9241 · Nielsen Heuristics · ~15 min read

1 The Hook

An NZ government agency spent $2.8 million rebuilding their benefits application portal. The new design was modern, responsive, and passed every functional test. It launched on a Monday. By Wednesday, the call centre was overwhelmed.

Why? Users could not find the "Submit" button. It was a subtle ghost button at the bottom of a long form. Users thought they had submitted when they had not. Others got lost in a multi-step wizard with no progress indicator. Phone hold times hit 90 minutes.

An investment of $15,000 in usability testing — five user sessions, a heuristic evaluation, and a round of fixes — would have caught every one of these issues. Instead, the agency spent six times that on post-launch remediation and reputation management.

Usability is not subjective. It is measurable, testable, and directly linked to business outcomes: conversion rates, support costs, user satisfaction, and trust.

2 The Rule

Usability testing evaluates how easily, efficiently, and satisfactorily real users can accomplish their goals with a product.

It is not about asking users what they want. It is about watching what they actually do. People are unreliable reporters of their own behaviour. Observation beats opinion.

3 The Analogy

Analogy

Watching someone try to assemble IKEA furniture without instructions.

You designed the shelf. You know where every screw goes. But when a real person opens the box, they struggle. They try to fit Panel B backwards. They cannot find the hex key. They get halfway through and realise they missed Step 3. Usability testing is watching that assembly process — not to test the person, but to test the design. Every moment of confusion is a design flaw, not a user flaw.

4 Testing Methods

Moderated User Sessions

Recruit 5-8 representative users. Give them realistic tasks (e.g., "Book a ferry from Wellington to Picton for next Tuesday"). Watch, listen, and take notes. Ask them to think aloud. The facilitator guides but does not help. This is the gold standard for qualitative insight.

Unmoderated Remote Testing

Tools like UserTesting.com, Maze, or Loop11 let users complete tasks on their own devices while recording their screen and voice. Faster and cheaper than moderated sessions, but you cannot ask follow-up questions in real time.

Heuristic Evaluation

Expert reviewers (typically 3-5) inspect the interface against established usability principles (Nielsen's heuristics). Each reviewer independently identifies issues, then the team aggregates findings. Fast, inexpensive, and typically catches the majority of major usability problems.

A/B Testing

Show 50% of users Version A and 50% Version B. Measure which performs better on a specific metric (conversion, click-through, completion rate). Statistical rigour is essential: you need enough users and a clear hypothesis. Tools: Optimizely, VWO, Google Optimize.
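The "statistical rigour" part can be sketched with a standard two-proportion z-test. A minimal version (the conversion counts below are hypothetical, and the pooled normal approximation assumes reasonably large samples):

```python
import math

def two_proportion_z(conv_a, n_a, conv_b, n_b):
    """Two-sided z-test for a difference between two conversion rates.

    Uses the pooled-proportion normal approximation (large samples).
    Returns (z, p_value).
    """
    p_a, p_b = conv_a / n_a, conv_b / n_b
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    p_value = math.erfc(abs(z) / math.sqrt(2))  # two-sided p from |z|
    return z, p_value

# Hypothetical numbers: A converts 480/4000 users, B converts 540/4000.
z, p = two_proportion_z(480, 4000, 540, 4000)
print(f"z = {z:.2f}, p = {p:.3f}")
```

With these (made-up) numbers, p lands just under 0.05: a difference of this size on 4,000 users per arm is barely significant, which is exactly why "enough users and a clear hypothesis" matters.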

Analytics and Heatmaps

Quantitative data from tools like Hotjar, Microsoft Clarity, or FullStory shows where users click, how far they scroll, and where they drop off. Heatmaps reveal dead clicks (users clicking non-interactive elements) and rage clicks (repeated frustrated clicks).
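The rage-click pattern those tools report can be approximated directly from raw click logs. A minimal sketch; the three-clicks-within-one-second threshold is an assumption for illustration, not any vendor's published definition:

```python
def has_rage_clicks(timestamps, threshold=3, window=1.0):
    """Return True if `threshold` or more clicks on the same element
    fall inside any `window`-second span.

    timestamps: click times in seconds, sorted ascending, for one element.
    """
    for i in range(len(timestamps) - threshold + 1):
        if timestamps[i + threshold - 1] - timestamps[i] <= window:
            return True
    return False

# Three clicks within half a second: classic rage-click signature.
print(has_rage_clicks([12.0, 12.2, 12.5]))  # True
# Three well-spaced clicks: normal behaviour.
print(has_rage_clicks([12.0, 14.0, 16.0]))  # False
```

Dead clicks are even simpler to flag: any click whose target element has no event handler attached.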

5 Nielsen's 10 Usability Heuristics

Jakob Nielsen's heuristics are the most widely used framework for expert evaluation. Every tester should know them.

1. Visibility of System Status

The system should always keep users informed about what is going on. Progress bars, loading spinners, and confirmation messages.

2. Match Between System and Real World

Use familiar language and concepts. "Shopping cart" not "transaction container." NZ English, not US English ("favour" not "favor").

3. User Control and Freedom

Users need clearly marked "exits" to reverse actions. Undo, back buttons, and cancellation options.

4. Consistency and Standards

Same words and actions should mean the same thing everywhere. Follow platform conventions (iOS Human Interface Guidelines, Material Design).

5. Error Prevention

Design that prevents problems from occurring. Confirmation dialogs for destructive actions, input masks for phone numbers.
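As a sketch of the input-mask idea, here is a pre-submit check for NZ mobile numbers. The pattern is illustrative only (my own simplification, not a complete validator); a production form should use a maintained phone-number library:

```python
import re

# Illustrative pattern: 02x mobiles written locally ("021 123 4567")
# or internationally ("+64 21 123 4567").
NZ_MOBILE = re.compile(r"^(?:\+64|0)2\d{7,9}$")

def looks_like_nz_mobile(raw):
    """Cheap client-side check: catch obviously malformed numbers
    before the user ever sees a server-side error."""
    digits = raw.replace(" ", "").replace("-", "")
    return bool(NZ_MOBILE.match(digits))

print(looks_like_nz_mobile("021 123 4567"))     # True
print(looks_like_nz_mobile("+64 21 123 4567"))  # True
print(looks_like_nz_mobile("banana"))           # False
```

The point is not the regex itself but the timing: rejecting bad input at the field, before submission, is error prevention; rejecting it after a round trip is error recovery.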

6. Recognition Rather Than Recall

Show options rather than making users remember them. Dropdown lists, visible navigation, recently used items.

7. Flexibility and Efficiency of Use

Accelerators for experts (keyboard shortcuts, bulk actions) without confusing novices.

8. Aesthetic and Minimalist Design

No irrelevant or rarely needed information. Every extra unit of information competes with relevant units.

9. Help Users Recognise and Recover from Errors

Error messages should be in plain language, indicate the problem precisely, and suggest a solution.

10. Help and Documentation

Easy-to-search help, focused on the user's task, with concrete steps. Not a 200-page PDF.

6 System Usability Scale (SUS)

The SUS is a 10-question survey that gives you a single usability score from 0 to 100. It is technology-agnostic, quick to administer, and validated across thousands of studies.

The 10 SUS Questions
  1. I think that I would like to use this system frequently.
  2. I found the system unnecessarily complex.
  3. I thought the system was easy to use.
  4. I think that I would need the support of a technical person to be able to use this system.
  5. I found the various functions in this system were well integrated.
  6. I thought there was too much inconsistency in this system.
  7. I would imagine that most people would learn to use this system very quickly.
  8. I found the system very cumbersome to use.
  9. I felt very confident using the system.
  10. I needed to learn a lot of things before I could get going with this system.

Users rate each on a 5-point scale (Strongly Disagree to Strongly Agree). Odd-numbered questions contribute positively; even-numbered negatively. A score above 68 is considered above average. Above 80 is excellent.
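The scoring arithmetic is simple enough to sketch: each odd question contributes (rating - 1), each even question (5 - rating), and the 0-40 total is multiplied by 2.5 to give the 0-100 score:

```python
def sus_score(ratings):
    """Score one SUS response: ten ratings from 1 (Strongly Disagree)
    to 5 (Strongly Agree), in question order.

    Odd questions (Q1, Q3, ...) contribute (rating - 1); even questions
    contribute (5 - rating). The 0-40 total is scaled by 2.5 to 0-100.
    """
    if len(ratings) != 10 or not all(1 <= r <= 5 for r in ratings):
        raise ValueError("expected ten ratings between 1 and 5")
    total = sum((r - 1) if i % 2 == 0 else (5 - r)
                for i, r in enumerate(ratings))
    return total * 2.5

# A perfectly satisfied respondent scores 100; an all-neutral one scores 50.
print(sus_score([5, 1, 5, 1, 5, 1, 5, 1, 5, 1]))  # 100.0
print(sus_score([3] * 10))                         # 50.0
```

Report the mean of individual scores across participants, not a score computed from mean ratings.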

7 NZ Context: Te Reo and Cultural Usability

In Aotearoa, usability has a cultural dimension. Government services, in particular, must serve a diverse population including Māori, Pasifika, new migrants, and rural communities.

  • Bilingual interfaces: Does the te reo version have parity with English, or is it a token translation with broken layouts?
  • Cultural concepts: Some concepts do not translate directly. Whānau-based applications may need different information architecture than individual-focused Western designs.
  • Rural connectivity: Users on satellite or 3G connections need lightweight, resilient interfaces. A usability test run on Auckland fibre does not represent a farmer in the King Country.
  • Digital literacy: NZ has a significant digital divide. Interfaces assumed to be "intuitive" may alienate users with limited technology experience.

8 Common Mistakes

🚫 Asking users what they want

I used to think: Users know what they need.
Actually: Users are experts at their problems, not at solutions. As the line often attributed to Henry Ford goes: "If I asked people what they wanted, they would have said faster horses." Watch behaviour, not opinions.

🚫 Testing with colleagues and friends

I used to think: Anyone can give useful feedback.
Actually: Your colleagues know the product too well. They cannot simulate a first-time user. Recruit participants who match your actual user profile — age, domain knowledge, technical skill, and context of use.

🚫 Helping participants when they struggle

I used to think: I should help them finish the task.
Actually: The struggle is the data. Every wrong click, every hesitation, every "I do not know what to do now" is a finding. Intervene only when the user is genuinely stuck and frustrated, then note what caused it.

🚫 Testing at the end of the project

I used to think: Usability testing happens before launch.
Actually: The best time to test is on paper prototypes and wireframes. A change to a sketch costs nothing. A change to shipped code costs thousands. Test early, test often, test cheaply.

9 Now You Try

🎯 Heuristic Evaluation Exercise

Task: Perform a heuristic evaluation on any NZ government service website (e.g., a form on govt.nz or a department portal). Use Nielsen's 10 heuristics. For each heuristic, answer:

  1. Does the site meet this principle? (Yes / Partially / No)
  2. If No or Partially, what is the specific problem?
  3. What is the severity? (Cosmetic, Minor, Major, Catastrophic)

Goal: Find at least 5 usability issues with clear evidence. Write each as: "[Heuristic] — [Problem] — [Evidence] — [Severity]"
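If you want your findings machine-readable for aggregation across reviewers, a small record type matching the format above works. A sketch; the field and class names are my own:

```python
from dataclasses import dataclass

SEVERITIES = ("Cosmetic", "Minor", "Major", "Catastrophic")

@dataclass
class Finding:
    heuristic: str   # e.g. "Visibility of System Status"
    problem: str     # the specific usability issue
    evidence: str    # what you observed, ideally with a screenshot reference
    severity: str    # one of SEVERITIES

    def __post_init__(self):
        if self.severity not in SEVERITIES:
            raise ValueError(f"severity must be one of {SEVERITIES}")

    def report_line(self):
        return (f"[{self.heuristic}] — [{self.problem}] — "
                f"[{self.evidence}] — [{self.severity}]")

f = Finding("Visibility of System Status",
            "No progress indicator in the multi-step form",
            "Could not tell how many steps remained",
            "Major")
print(f.report_line())
```

Validating severity at record-creation time keeps the aggregated report consistent when several reviewers contribute findings.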

10 Self-Check

Q1. What is the key difference between usability testing and a user survey?

Usability testing observes behaviour; surveys capture opinions. People often say one thing and do another. Observation is more reliable for identifying usability problems.

Q2. How many users do you need for a moderated usability test to find most major issues?

Five users. Nielsen's research shows that 5 users uncover ~85% of usability problems. Additional users yield diminishing returns. Run multiple small rounds rather than one large study.
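The ~85% figure comes from Nielsen and Landauer's model: n users find a share 1 - (1 - L)^n of the problems, where L (the proportion a single user reveals) averaged about 0.31 in their data. A quick sketch:

```python
def share_of_problems_found(n_users, discovery_rate=0.31):
    """Nielsen and Landauer's model: each user independently reveals a
    fixed share L of the problems, so n users find 1 - (1 - L)^n."""
    return 1 - (1 - discovery_rate) ** n_users

for n in (1, 3, 5, 10):
    print(f"{n:2d} users -> {share_of_problems_found(n):.0%}")
```

The curve flattens sharply after five users, which is why three rounds of five beat one round of fifteen: each round tests a fixed design.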

Q3. What does a SUS score above 68 indicate?

Above-average usability. The average SUS score across all systems is 68. Scores above 80 indicate excellent usability. Below 50 suggests serious problems.

Q4. Why should you not help a participant who is struggling during a usability test?

The struggle is the data. Every moment of confusion reveals a design flaw. Helping them masks the problem. Note where they struggle, what they try, and what they say — then fix the design.