Anatomy of a Great Bug Report

A bug report is a message to a stranger who has to fix a problem they cannot see. Everything you leave out is a round trip they have to make — or a fix that never happens. Reproducibility is the whole game.

Specialised Bug Reporting · Lesson 1 of 2 · ~30 min read · ~70 min with exercises

1 The Hook

A tester on a Waka Kotahi online licence-renewal project raises a defect at 4pm: “Payment page broken, can’t renew, urgent.” She moves on. The developer picks it up the next morning, opens the payment page, renews a test licence, and it works. He sets the ticket to “Cannot Reproduce” and closes it. The tester reopens it: “It IS broken, I just saw it.” The developer tries again. Still works. Two days pass. The ticket bounces back and forth four times.

On day three they finally sit together. It turns out the failure only happens when the customer pays with a saved card whose billing address has been edited since it was saved — a real and common case. The tester had hit it; the developer never thought to. None of that was in the report. The defect was genuine, reproducible, and serious. It took three days and a meeting to find out, because the report carried none of the conditions that triggered it.

That is the cost of a thin report. Not that the bug goes unfixed forever, but that everyone spends days circling it. The tester looks careless. The developer looks defensive. The real defect — a genuine problem hitting real customers at the payment step — sits unfixed while two people argue about whether it exists.

Now picture the same defect raised differently: a clear title, the exact saved-card-with-edited-address condition, step-by-step instructions, the browser and build, what the tester expected, what actually happened, and a screenshot. The developer reproduces it on the first read and fixes it before lunch. Same bug. Same tester. The only thing that changed was the report. That gap is what this lesson closes.

2 The Rule

A bug report exists to let someone who has never seen the problem reproduce it on the first read. If a developer cannot follow your steps and watch the failure happen, the report has failed — no matter how real the bug is. Reproducibility, isolation, and a clear expected-versus-actual are not nice-to-haves; they are the report. Everything else is decoration.

3 The Analogy

Reporting a fault to a plumber over the phone.

Ring a plumber and say “the water’s doing something weird, come fix it” and they cannot quote, cannot bring the right parts, and cannot tell whether it is urgent. They have to drive out blind and poke around. Ring and say “the hot tap in the upstairs bathroom drips about once a second, only when the shower next door is running, started Tuesday” and they know the part, the cause, and roughly the time before they leave the depot.

A developer is that plumber, and they cannot see your tap. The conditions that make it drip — the saved card, the edited address, the upstairs bathroom — are the most valuable thing you can give them. A great bug report is the second phone call: specific enough that the person on the other end already knows what they are walking into.

4 Isolation and Reproducibility

Two skills sit underneath every good report, and they come before any writing.

Reproducibility is finding the exact sequence that makes the bug happen every time. Before you raise anything, run your own steps again from a clean start. If you cannot make it happen twice, you are not ready to report — you are still investigating. A bug you can trigger reliably is worth ten you saw once.

Isolation is stripping away everything that is not part of the cause. You hit the failure after fifteen clicks; isolation is finding which three of those fifteen actually matter. Did the bug need you to be logged in as an admin, or any user? Did it need that specific RealMe account, or any account? Did it need the saved card, or any card? Each thing you can remove makes the report shorter and the cause clearer. Each thing you wrongly leave in sends the developer down a dead end.

Pro tip: Isolate by subtraction. Once you can reproduce, remove one variable at a time and try again. The smallest set of steps that still triggers the bug is the report you want — nothing extra to mislead, nothing missing to block.
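Isolation by subtraction can be sketched as a small loop. This is an illustration only, not a tool the lesson prescribes: the step names are invented, and `fake_rerun` is a hypothetical stand-in for what you do by hand, which is re-running the remaining steps from a clean start and watching for the failure.

```python
from typing import Callable, List, Sequence


def isolate_by_subtraction(
    steps: Sequence[str],
    still_fails: Callable[[List[str]], bool],
) -> List[str]:
    """Remove one step at a time; keep a step only if removing it hides the bug."""
    remaining = list(steps)
    if not still_fails(remaining):
        # Reproducibility comes first: the full sequence must fail before isolating.
        raise ValueError("Full sequence does not reproduce the bug; keep investigating.")
    i = 0
    while i < len(remaining):
        candidate = remaining[:i] + remaining[i + 1:]
        if still_fails(candidate):
            remaining = candidate  # step i was noise; index i now holds the next step
        else:
            i += 1  # step i is part of the trigger, so keep it
    return remaining


# Hypothetical stand-in for a manual re-run from a clean start.
# Assumption for the example: the bug needs exactly these two steps.
def fake_rerun(steps: List[str]) -> bool:
    needed = {"edit billing address on saved card", "pay with saved card"}
    return needed.issubset(steps)


observed = [
    "open notifications panel",
    "edit billing address on saved card",
    "switch display language",
    "sort transaction history",
    "pay with saved card",
]
minimal = isolate_by_subtraction(observed, fake_rerun)
print(minimal)
```

The two steps that survive are the ones that belong in the report. A single pass like this is usually enough by hand; if steps interact, repeat it until nothing more can be removed.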

These two together answer the developer’s first and only real question: “what do I do to see this myself?” A report that nails reproducibility and isolation has already won, before you have written the title.

5 The Title

The title is read more often than anything else in the report — in the backlog, in triage, in stand-up, in a release-notes list. It has one job: tell a reader what fails, where, and under what condition in a single line, without making them open the ticket.

Compare these for the same defect on an ANZ goMoney transfer screen:

🚫 “Transfer broken” — what is broken? Which transfer? In what way?
🚫 “Bug on transfer page” — says nothing a reader can act on.
✅ “Scheduled transfer to a new payee fails with ‘Invalid date’ when the date is the 1st of next month” — the reader knows the screen, the trigger, and the symptom.

A good title is testable on its own: someone could read it and go try to reproduce the bug from the title alone. If your title is so vague that it could describe twenty different defects, it is not a title yet.

6 Steps, Expected versus Actual, and Environment

The body of the report is a small, fixed structure. Every field earns its place by removing a question the developer would otherwise have to ask you.

Steps to reproduce

Numbered, from a known starting point, written so a stranger can follow them exactly. Start from “log in as a standard customer”, not from wherever you happened to be. Include the data you used — the account, the amount, the date — because the data is often the trigger. One action per step. If a step can be removed and the bug still happens, remove it (that is isolation from section 4).

Expected versus actual result

These are two separate lines, and both are mandatory. Expected states what the system should have done and, ideally, why — the acceptance criterion, the spec, the obvious correct behaviour. Actual states exactly what it did instead. The gap between them is the bug. Writing only the actual result (“it showed an error”) forces the developer to guess what you thought should happen, and they may not agree it is a defect at all.

Environment

Where you saw it: the environment (dev, test, UAT, prod), the build or version number, the browser and version or the device and OS, and the user or role. Many “cannot reproduce” outcomes come down to an environment mismatch: the developer is on the latest build and you were two behind, or you were on Safari on an iPhone and they were on Chrome on a desktop. Recording the environment turns “works on my machine” into a real clue rather than a dead end.
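To see why the environment line pays for itself, here is a minimal sketch that compares the environment in a report with the one used for the retest. The `Environment` record, its field names, and the sample values are assumptions made for the illustration; real trackers store this differently.

```python
from dataclasses import asdict, dataclass
from typing import Dict, Tuple


@dataclass(frozen=True)
class Environment:
    stage: str   # dev, test, UAT or prod
    build: str   # build or version number
    client: str  # browser and version, or device and OS
    role: str    # user or role


def environment_mismatches(
    reported: Environment, retest: Environment
) -> Dict[str, Tuple[str, str]]:
    """Return each field where the two environments differ, as (reported, retest)."""
    a, b = asdict(reported), asdict(retest)
    return {name: (a[name], b[name]) for name in a if a[name] != b[name]}


# Invented sample values: the tester was two builds behind and on a phone.
tester = Environment("UAT", "4.10.0", "Safari 17 / iPhone 15", "standard customer")
developer = Environment("UAT", "4.12.0", "Chrome 124 / Windows 11", "standard customer")

diff = environment_mismatches(tester, developer)
for name, (was, now) in diff.items():
    print(f"{name}: reported on {was!r}, retested on {now!r}")
```

Every field that shows up in the difference is a candidate explanation for a "cannot reproduce", which is only possible to check if the report recorded the values in the first place.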

Evidence

A screenshot, a screen recording, or the relevant log and console output. Evidence is the proof and often the shortcut — a single screenshot of the error can replace a paragraph of description. Lesson 2 is entirely about gathering the kind of evidence that turns a contested report into an actioned one.

7 Severity versus Priority

These two are constantly confused, and the confusion causes real arguments. They measure different things, and a report should carry both.

Severity is how badly the bug damages the system when it happens — a technical, fairly objective measure. A crash is high severity. A typo is low severity. Severity is usually the tester’s call, because the tester saw the impact.

Priority is how soon it must be fixed relative to everything else — a business call about scheduling. Priority weighs severity against how many people hit it, whether there is a workaround, the release date, and reputational or regulatory exposure. Priority is usually set or agreed by a lead, product owner, or triage group, because it is a business decision, not a technical one.

High severity, low priority: the app crashes — but only on a hidden admin screen two people use once a year. Bad when it happens, rarely happens.
Low severity, high priority: the Kiwibank logo is the wrong shade of green on the homepage the day of a national brand launch. Harmless technically, urgent commercially.
High severity, high priority: KiwiSaver balances display doubled for every customer. Damaging and everywhere — fix now.
Low severity, low priority: a tooltip has a missing full stop. Note it, fix it whenever.

Pro tip: When you raise a bug, state the severity (your technical read) and suggest a priority with a one-line reason — “suggest high priority: hits every customer at the payment step, no workaround.” You are giving the business the impact facts it needs to schedule, without quietly making the scheduling decision for it.
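One way to keep the two measures apart is to record them as two separate fields and refuse a suggested priority that has no reason attached. The sketch below is an assumption about how a team might model this, with invented class names and a simplified two-level scale; it is not a Jira or Xray feature.

```python
from dataclasses import dataclass
from enum import Enum


class Severity(Enum):
    LOW = "Low"
    HIGH = "High"


class Priority(Enum):
    LOW = "Low"
    HIGH = "High"


@dataclass(frozen=True)
class Classification:
    severity: Severity            # the tester's technical read of the impact
    suggested_priority: Priority  # a suggestion only; the lead or product owner decides
    reason: str                   # the impact facts behind the suggestion

    def __post_init__(self) -> None:
        if not self.reason.strip():
            raise ValueError("A suggested priority needs a one-line reason.")

    def line(self) -> str:
        return (
            f"Severity: {self.severity.value} | "
            f"Suggested priority: {self.suggested_priority.value} ({self.reason})"
        )


# The high-severity, low-priority example from the list above.
admin_crash = Classification(
    Severity.HIGH, Priority.LOW, "hidden admin screen, two users, once a year"
)
print(admin_crash.line())
```

Because the fields are independent, all four combinations in the list above can be expressed, and the reason travels with the suggestion rather than living in someone's head.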

8 The Developer’s Perspective

The fastest way to write a report that gets actioned is to picture the person who has to read it. The developer does not have your context. They did not see the screen, do not know which build you were on, and have a dozen other tickets open. Every question your report leaves unanswered becomes a message back to you — and a delay.

From their seat, a good report does three things. It lets them reproduce the failure without contacting you. It tells them where to look — the screen, the request, the moment it breaks. And it gives them the evidence to confirm they are seeing the same thing you saw, not a different problem that looks similar.

This is also why a defensive or vague report backfires. “The login is completely broken” with no steps reads as a complaint, not a report, and invites a defensive “works for me” in return. The same finding written as “login fails with this exact account, on this build, here are the steps and the console error” reads as a colleague handing over a solved investigation. Write for the developer and the report stops being an argument and becomes a hand-off.

9 A Jira/Xray-Style Markdown Template

A consistent structure means nothing gets forgotten and everyone’s reports read the same way. Here is a markdown template that drops straight into a Jira or Xray description field:

**Title:** Scheduled transfer to a new payee fails with "Invalid date" when the date is the 1st of next month

**Environment:** UAT · build 4.12.0 · Chrome 124 / Windows 11 · standard customer role

**Severity:** High  |  **Suggested priority:** High (blocks all scheduled transfers to new payees; no workaround)

**Preconditions:** Logged in as a standard customer with at least one verified account.

**Steps to reproduce:**
1. Go to Payments > Transfer.
2. Choose "Schedule for later".
3. Add a new payee (account 12-3456-7890123-00).
4. Set the transfer date to the 1st of next month.
5. Enter $50.00 and tap Review.

**Expected:** Review screen shows the scheduled transfer for the 1st of next month, ready to confirm.

**Actual:** Inline error "Invalid date" appears; the Review button does nothing. The transfer cannot be scheduled.

**Evidence:** screenshot attached (invalid-date.png); console error and failing request attached, captured as described in Lesson 2.

**Notes / isolation:** Only fails for the 1st of a month; the 2nd onward works. Existing payees are unaffected — the new-payee path is the trigger.

Pro tip: Save this template as a ticket template or a snippet in your team’s tracker. The fields are a checklist — if you cannot fill one in, that is a signal you have not finished isolating the bug yet.
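The "fields are a checklist" idea can be sketched in a few lines. The `BugReport` class and its field names below are hypothetical and simply mirror the template above; the example shows a half-finished draft being checked before it is raised.

```python
from dataclasses import dataclass, field, fields
from typing import List


@dataclass
class BugReport:
    title: str = ""
    environment: str = ""
    severity: str = ""
    suggested_priority: str = ""
    preconditions: str = ""
    steps: List[str] = field(default_factory=list)
    expected: str = ""
    actual: str = ""
    evidence: str = ""
    notes: str = ""

    def missing_fields(self) -> List[str]:
        """Names of template fields that are still empty."""
        return [f.name for f in fields(self) if not getattr(self, f.name)]

    def to_markdown(self) -> str:
        """Render the report in the same layout as the template above."""
        numbered = "\n".join(f"{n}. {step}" for n, step in enumerate(self.steps, start=1))
        return "\n\n".join([
            f"**Title:** {self.title}",
            f"**Environment:** {self.environment}",
            f"**Severity:** {self.severity}  |  **Suggested priority:** {self.suggested_priority}",
            f"**Preconditions:** {self.preconditions}",
            f"**Steps to reproduce:**\n{numbered}",
            f"**Expected:** {self.expected}",
            f"**Actual:** {self.actual}",
            f"**Evidence:** {self.evidence}",
            f"**Notes / isolation:** {self.notes}",
        ])


# A draft with only three fields filled in.
draft = BugReport(
    title='Scheduled transfer to a new payee fails with "Invalid date"',
    steps=["Go to Payments > Transfer.", 'Choose "Schedule for later".'],
    actual='Inline error "Invalid date" appears.',
)
print("Not ready to raise, still missing:", ", ".join(draft.missing_fields()))
```

Anything the check lists is a question the developer would otherwise have to send back to you.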

10 Common Mistakes

🚫 Raising a bug you have only seen once

Why it happens: The failure looked dramatic and you wanted to capture it before you forgot.
The fix: Reproduce it from a clean start before you write anything. If you cannot trigger it twice, say so explicitly and record everything you did — an intermittent bug is a different kind of report (covered in Lesson 2), not a normal one with the conditions missing.

🚫 Writing the actual result but not the expected result

Why it happens: The failure feels obviously wrong to you, so stating what should have happened seems redundant.
The fix: Always write both. The developer may not share your assumption about correct behaviour, and the gap between expected and actual is what makes it a defect rather than a difference of opinion. No expected line, no agreed bug.

🚫 Confusing severity with priority — or setting priority yourself

Why it happens: They feel like one “how important is this” number, and it is tempting to mark your own bug “critical”.
The fix: Severity is technical impact (your call); priority is business scheduling (a lead or product owner’s call). State severity, supply the impact facts, and suggest a priority with a reason — do not quietly decide the schedule.

🚫 Leaving out the environment

Why it happens: You know which build and browser you were on, so it feels obvious.
The fix: The developer does not know, and an environment mismatch is the single most common cause of “cannot reproduce”. Record build/version, browser or device and OS, environment (UAT/prod), and user role every time.

11 Now You Try

Three graded exercises: spot what is wrong, fix it, then build one from scratch. Write your answer, then compare it to the model answer.

🔍 Exercise 1 of 3 — Spot What Is Wrong

Read the weak bug report below, raised against a fictional MSD online benefit-application portal. Identify at least 4 things wrong or missing that would stop a developer reproducing or actioning it, and name the field each problem belongs to.

Title: Application broken
Description: The application page doesn’t work properly. I tried to submit and it just failed. This is critical, please fix ASAP. It’s been happening for a while now and is really annoying. Tested it earlier today.

Model answer

There are at least six problems; any four well-explained earns full marks.

1. Title — "Application broken" is vague. It says nothing about what fails, where, or under what condition. A reader cannot act on it. Field: title.

2. Steps to reproduce — There are none. "I tried to submit and it just failed" is not a sequence a developer can follow from a known start point. Field: steps to reproduce.

3. Expected vs actual — Only a vague actual ("it just failed"); no expected result and no real description of what actually happened (which error? which screen?). Field: expected vs actual.

4. Environment — Completely missing. No build, no browser/device, no environment (UAT/prod), no user role. This alone will cause a "cannot reproduce". Field: environment.

5. Severity vs priority — "Critical, fix ASAP" confuses severity with priority and the tester is setting the schedule themselves, with no impact facts to justify it. Field: severity vs priority.

6. Evidence — None attached. No screenshot, no error text, no logs. Field: evidence.

Bonus: "happening for a while" and "tested it earlier today" are vague on timing and give no exact reproduction. The report reads as a complaint, not a hand-off.

🔧 Exercise 2 of 3 — Rewrite It Properly

Rewrite the weak MSD report from Exercise 1 into a clear, reproducible bug report with these fields: Title, Environment, Severity, Suggested priority, Steps to reproduce, Expected, Actual. Invent reasonable, realistic NZ details (you are filling the gaps the original left). Make the title testable and keep severity separate from priority.

Model answer

Title: Benefit application fails with "Unable to save" on the Income step when annual income is left blank

Environment: UAT · build 3.8.1 · Chrome 124 / Windows 11 · standard applicant role

Severity: High | Suggested priority: High (blocks submission for any applicant who leaves income blank; no workaround; common path)

Steps to reproduce:
1. Log in as a standard applicant and start a new benefit application.
2. Complete the Personal Details step and continue.
3. On the Income step, leave the "Annual income" field blank.
4. Tap Save and continue.

Expected: The form either accepts the blank field or shows an inline validation message asking the applicant to enter an income figure, and the applicant can correct it and proceed.

Actual: A generic "Unable to save" banner appears at the top of the page, the step does not advance, and no inline guidance tells the applicant what is wrong. The application cannot be submitted.

Why this is a good rewrite: the title names what fails, where, and the condition; the environment is complete; severity (technical) is separate from a suggested priority (with a reason); steps run from a known start point; and expected and actual are both present and specific. The marker is checking those properties, not the exact invented details.

🏗️ Exercise 3 of 3 — Build One From a Scenario

Write a complete bug report from the scenario below. Use the full template: Title, Environment, Severity, Suggested priority, Preconditions, Steps to reproduce, Expected, Actual, Notes/isolation.

Scenario: You are testing a fictional Auckland Council rates-payment site on the test environment, build 2.5.0, in Safari 17 on an iPhone 15. Logged in as a ratepayer, you go to pay your rates by credit card. When the rates amount ends in 5 cents (a .x5 value, for example $1,234.85), the confirmation screen shows the wrong total: it displays $1,234.80, five cents short. The receipt emailed afterwards also shows $1,234.80, even though the correct amount of $1,234.85 was charged to the card. Amounts that do not end in 5 cents are fine. You reproduced it three times.

Model answer

Title: Rates payment receipt and confirmation show total 5c short when the amount ends in .x5 (e.g. $1,234.85 shown as $1,234.80)

Environment: Test · build 2.5.0 · Safari 17 / iPhone 15 (iOS) · ratepayer role

Severity: High (financial display does not match the amount actually charged) | Suggested priority: High (affects any rates total ending in 5 cents; receipts are an audit/financial record; no workaround)

Preconditions: Logged in as a ratepayer with a rates balance whose total ends in a .x5 value, e.g. $1,234.85.

Steps to reproduce:
1. Log in as a ratepayer and go to Pay rates.
2. Confirm the amount due ends in 5 cents (e.g. $1,234.85).
3. Choose credit card and proceed to the confirmation screen.
4. Complete the payment and open the emailed receipt.

Expected: The confirmation screen and the emailed receipt both show $1,234.85, matching the amount charged to the card.

Actual: The confirmation screen and the receipt both show $1,234.80 - five cents short - while the card is correctly charged $1,234.85. The displayed total does not match the charge.

Notes / isolation: Only occurs when the amount ends in .x5; amounts ending in any other cents value display correctly. Reproduced three times. Likely a rounding/truncation bug in the display formatting, not the charge itself, since the card amount is correct.

Strong reports: the title names the exact condition and symptom; environment is complete; severity is justified by the financial mismatch and kept separate from priority; steps run from a clean start; expected and actual are precise; and the isolation note pins the trigger (.x5 amounts) and observes the charge is correct while the display is wrong - a real clue for the developer.

12 Self-Check

Q1: What single property matters most in a bug report, and why?

Reproducibility. If a developer cannot follow your steps and watch the failure happen, the report has failed regardless of how real the bug is. Everything else — title, severity, evidence — supports the goal of letting someone who has never seen the problem reproduce it on the first read.

Q2: What is the difference between isolation and reproducibility?

Reproducibility is finding the exact sequence that makes the bug happen every time. Isolation is stripping that sequence down to only the steps and conditions that actually matter — removing everything that is not part of the cause, by subtraction, until the smallest set that still triggers the bug remains.

Q3: Explain severity versus priority, and who sets each.

Severity is how badly the bug damages the system: a technical, fairly objective measure, usually the tester’s call. Priority is how soon it must be fixed relative to everything else: a business scheduling decision, usually set or agreed by a lead, product owner, or triage group. The tester states severity and suggests a priority with a reason; they do not set the schedule themselves.

Q4: Why must a report carry both an expected and an actual result?

The gap between them is the bug. The actual result alone (“it showed an error”) forces the developer to guess what you thought should happen — and they may not agree it is a defect. Stating the expected behaviour, ideally with its acceptance criterion, turns a difference of opinion into an agreed defect.

Q5: What is the most common cause of a “cannot reproduce” outcome, and how do you prevent it?

An environment mismatch — the developer is on a different build, browser, device, or user role than you were. Prevent it by recording the environment every time: build or version, browser or device and OS, the environment (UAT/prod), and the user or role you used.

13 Interview Prep

Real questions asked in NZ QA interviews. Read the model answers, then practise your own version.

“Walk me through what makes a good bug report.”

I start before I write anything: I reproduce the bug from a clean start so I know it is reliable, then isolate it by removing any step or condition that is not part of the cause. Then I write a title that names what fails, where, and under what condition, so it is actionable from the backlog alone. The body has numbered steps from a known start point, a separate expected and actual result, and a full environment line — build, browser or device, environment, and user role. I record severity as my technical read of the impact and suggest a priority with a one-line reason, leaving the scheduling call to the lead. I attach evidence: a screenshot, and in NZ banking or government work the failing request and any error output. The goal is that a developer who has never seen it can reproduce and confirm it without messaging me back.

“A developer keeps marking your bugs ‘cannot reproduce’. What do you do?”

First I treat it as a report-quality problem, not a developer problem. I check the most common cause — an environment mismatch — and make sure my reports state the exact build, browser or device, environment, and user role, because they are often on a different one. Then I re-reproduce from a clean start and tighten my steps and isolation so there is no missing precondition or hidden trigger, like a specific account or a saved card. I add evidence that proves the state I was in: a screenshot or recording, and the failing request or console error from Lesson 2. If it is genuinely intermittent I say so explicitly and switch to an evidence-gathering approach rather than pretending it is deterministic. Usually the “cannot reproduce” disappears once the report carries the condition I was leaving out.

“You found a crash on an admin screen only two people use once a year. How do you classify it?”

That is the classic high-severity, low-priority case. Severity is high because a crash badly damages the system when it happens; that is the technical impact. But priority is a business call, and the impact facts here are that almost no one hits it, it happens rarely, and there may be a workaround. So I would raise it as high severity, suggest a low or medium priority, and spell out the impact facts (two users, once a year) so the product owner can schedule it sensibly. I would never quietly mark it “critical” just because it is a crash; severity and priority are different measures, and conflating them is how severe-but-rare bugs jump the queue ahead of widespread ones.
