Mid-Level · Automation Engineer

Test Reporting & Artefacts

Test results nobody reads don't improve quality. The art of test reporting is making results immediately actionable — for developers, leads, and stakeholders who have five minutes and need to make a decision.

Mid-Level · ISTQB CTAL-TA v3.1.2 — K3 Apply · ~14 min read + exercise

1 The Hook — Why This Matters

An Auckland SaaS team runs 800 Playwright tests in CI. The pass rate is 97.2%. The Test Lead is diligent — every sprint they assemble a 40-page Allure report and email it to the Product Owner. The PO skips it every single sprint. Too much detail, no clear action.

After three sprints, a P1 bug reaches production. The automated tests had caught it. The failure was right there in the Allure report, buried on page 28. Nobody triaged it. The PO didn't know failing tests could sit unremediated if nobody reviewed the dashboard.

The tests were doing their job. The reporting wasn't. A 40-page report with no executive summary is not a communication tool — it's a data dump that creates a false sense of oversight. The PO assumed a report landing in their inbox meant someone had reviewed it.

2 The Rule — The One-Sentence Version

A test report is a communication tool, not a data dump. Design it for the person who reads it, not for the person who generates it.

Three different audiences need three different formats. A developer debugging a 2am failure needs trace viewer access. A test lead needs suite-level trends. A Product Owner in a Go/No-Go meeting needs four sentences. Match the format to the audience.

3 The Analogy — Think Of It Like...

Analogy

A test report is a weather forecast, not a meteorological dataset.

The pilot doesn't need raw atmospheric pressure readings. They need: "ceiling 800ft, visibility 2nm, conditions improving at 14:00." Your stakeholder needs: "97% pass rate, 3 new failures, 2 are blocking, here's what they mean."

The meteorological dataset exists and matters — it's what produced the forecast. But handing it to the pilot unprocessed is not helpful, it's noise. Your Allure report is the dataset. Your sprint summary is the forecast.

Senior engineer insight

The shift that changed my practice was realising that a test report is read by someone with a time budget of about 90 seconds, not someone who sat through the full sprint. I started designing every report backwards — first the recommendation, then the evidence. When a TransitNZ integration project went live with a P1 still open, the post-incident review showed the PO had skimmed the Allure report and missed the failure entirely because there was no explicit risk statement at the top.

The most common mistake: generating reports that are technically complete but structurally punish the reader — no summary, no recommendation, buried failures on page 30.

4 Watch Me Do It — Three Reports for Three Audiences

Audience: Developer

Playwright HTML Report — immediate debugging

Run after any CI failure. The trace viewer gives step-by-step playback of exactly what the browser did.

CLI — open the HTML report

# After a test run, open the HTML report locally
npx playwright show-report

# CI uploads it as an artefact — download and run:
npx playwright show-report playwright-report/
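The trace viewer and failure screenshots only appear in the HTML report if the run is configured to record them. A minimal sketch of a playwright.config.ts that does so is below; the specific values chosen are assumptions for illustration, not settings taken from this lesson.

```typescript
// playwright.config.ts (sketch; option values are illustrative assumptions)
import { defineConfig } from '@playwright/test';

export default defineConfig({
  // Write the HTML report without auto-opening a browser on the CI agent
  reporter: [['html', { open: 'never' }]],
  use: {
    // Keep a trace only for tests that fail, so passing runs stay small
    trace: 'retain-on-failure',
    // Attach a screenshot to the report whenever a test fails
    screenshot: 'only-on-failure',
  },
});
```

Recording traces only on failure is one reasonable trade-off between artefact size and debuggability; teams with retries enabled may prefer a different trace mode.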

Test names are the most important thing in this report. A generic name is useless at 2am:

TypeScript — behaviour-first test naming

import { test } from '@playwright/test';

// BAD: tells you nothing when it fails
test('test_login_01', async ({ page }) => {
  // test steps...
});

// GOOD: tells you exactly what scenario broke
test('KiwiSaver enrolment completes when Revenue NZ number is valid', async ({ page }) => {
  // Developer reading the failure report knows immediately:
  // the Revenue NZ number validation path broke, not the login form
});

Audience: Test Lead

Allure Report — sprint-level visibility

Allure adds suite hierarchy, JIRA story tags, and severity markers. Install it in three steps:

Shell — Allure setup

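# 1. Install the Playwright adapter
npm install allure-playwright --save-dev

# 2. In playwright.config.ts (not in the shell), add the reporter:
#    reporter: [['allure-playwright']],

# 3. Generate and open the report
#    (requires the Allure command-line tool, for example via the
#    allure-commandline npm package)
npx allure generate allure-results --clean
npx allure open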
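Setting the reporter list to Allure alone replaces Playwright's configured reporters, so developers would lose the HTML report from the previous section. One way to keep both, sketched here under the assumption that both reports are wanted from the same run:

```typescript
// playwright.config.ts (sketch: run both reporters side by side)
import { defineConfig } from '@playwright/test';

export default defineConfig({
  reporter: [
    // HTML report for developers debugging a failure
    ['html', { open: 'never' }],
    // Allure results for the test lead's sprint-level view
    ['allure-playwright'],
  ],
});
```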

Tag tests with story IDs so each failure can be traced back to its JIRA story:

TypeScript — Allure labels in tests

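import { test } from '@playwright/test';
// Import path as used with allure-playwright 2.x; later versions may expose
// these helpers from 'allure-js-commons' instead, so check your installed version.
import { allure } from 'allure-playwright';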

test('payment gateway processes Visa card correctly', async ({ page }) => {
  allure.story('PAY-1142');
  allure.severity('critical');
  allure.feature('Payment Gateway');
  // test steps...
});

Audience: Stakeholder / Product Owner

Executive summary — Go/No-Go meeting

Four sentences. Numbers, blockers, recommendation. Here is the template:

Sprint 24 — Test Summary (example)

Sprint 24 — Test Summary: 312 automated tests ran across 3 browsers.
308 passed (98.7%). 4 failures: 2 caused by BUG-4421 (payment gateway
timeout; known, non-blocking for this release), 1 flaky network test
(quarantined, not counted against release quality), 1 new failure in
checkout (BUG-4438 raised, P2). No confirmed release blockers as of
09:00 Thursday; BUG-4438 is the one open risk.

Go recommendation: conditional on BUG-4438 resolution by EOD Thursday.
Assigned to Aaron. If not resolved, escalate to release manager by 16:00.

Never send Allure to a Product Owner. It's built for test leads who want to drill into suite trends. A PO in a sprint review needs the executive summary above — not 40 pages of test case detail.
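The template above can also be assembled from run data so the numbers are never retyped by hand. The following is a minimal sketch, not a prescribed implementation: the `RunSummaryInput` shape, its field names, and the `buildExecutiveSummary` function are all hypothetical. The one deliberate design choice is that the recommendation is a required human input, and the function refuses to produce a summary without it.

```typescript
// Hypothetical input shape; field names are illustrative, not from any tool.
interface BlockingFailure {
  bugId: string;    // e.g. a JIRA ID
  severity: string; // e.g. "P1"
  owner: string;    // person assigned to the fix
}

interface RunSummaryInput {
  label: string;            // e.g. "Sprint 24"
  total: number;            // tests executed
  passed: number;           // tests passed
  scope: string;            // e.g. "3 browsers"
  blocking: BlockingFailure[];
  flakyQuarantined: number; // reported separately from genuine failures
  recommendation: string;   // explicit Go/No-Go, written by a person
}

function buildExecutiveSummary(input: RunSummaryInput): string {
  // Never let a summary go out with the recommendation left to inference.
  if (input.recommendation.trim() === "") {
    throw new Error("A Go/No-Go recommendation is required");
  }
  const passRate = ((input.passed / input.total) * 100).toFixed(1);
  const blockers =
    input.blocking.length === 0
      ? "No blocking failures."
      : `${input.blocking.length} blocking: ` +
        input.blocking
          .map((f) => `${f.bugId} (${f.severity}, owner ${f.owner})`)
          .join(", ") +
        ".";
  return [
    `${input.label}: ${input.total} automated tests ran across ${input.scope}.`,
    `${input.passed} passed (${passRate}%).`,
    `${blockers} ${input.flakyQuarantined} known flaky (quarantined).`,
    `Recommendation: ${input.recommendation}`,
  ].join(" ");
}
```

Called with the Sprint 24 figures (312 run, 308 passed, no blocking failures, 1 quarantined flaky test), this yields a four-sentence message that ends with whatever recommendation the test lead supplies.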

From the field

On a CoverNZ case management modernisation project, the team assumed that attaching the Allure report to the sprint Confluence page counted as "reporting to stakeholders." The test lead spent two hours a sprint assembling it. The delivery manager never opened it once, as a retrospective later confirmed. The team thought the silence meant no problems; the delivery manager thought silence meant the tests were irrelevant. The fix was a six-line Slack message every Thursday morning: pass rate, blocker count, flaky count, explicit risk statement. Read rate went from 0% to 100% overnight. The lesson is not about tools. It is about meeting your audience where they are, not where you wish they were.

5 When to Use It — Audience Matrix

Report format	Audience	Trigger	Key information
Playwright HTML report	Developer	After every CI run	Which test failed, trace viewer, screenshot on failure
Allure report	Test Lead	Sprint test report	Suite hierarchy, JIRA links, severity, trend over sprints
Executive summary	Product Owner / Release Manager	Go/No-Go meeting	Pass rate, blockers, known flaky count, recommendation

Report flaky tests separately from genuine failures. A 95% pass rate with 40 flaky tests is a different risk profile to 95% with zero flaky tests. The executive summary should always call out the flaky test count as a distinct number.

6 Common Mistakes — Don't Do This

🚫 "More detail in a report means more value"

I used to think: a comprehensive Allure report with every test case documented shows professionalism. Actually: a 300-test Allure report with no summary is noise to a PO. They will skim it once, decide it takes too long to parse, and stop reading it entirely. Summarise first. Link to detail. Don't dump detail on an audience that needs a decision, not a dataset.

🚫 "Test names like test_login_01 are fine"

I used to think: sequential test IDs are standard practice. Actually: when test_login_01 fails at 2am, nobody knows what user scenario broke without reading the test code. Name tests as behaviour: login fails with expired password and shows correct error message. The Playwright HTML report surfaces test names directly — a good name is the fastest path to understanding what broke and why.

🚫 "The CI green/red indicator is enough"

I used to think: if CI is green, we're good. Actually: a pass rate of 95% with 40 flaky tests quarantined is not the same as 95% with zero flaky tests. CI can be "green" while hiding a growing pool of unstable tests that are silently excluded. Report flaky test count as a separate health metric. When that number grows, it is a warning sign the suite is degrading.

7 Now You Try — Prompt Lab

📋 Exercise — Sprint Test Summary

You are the mid-level SDET on a Chorus OSS system deployment. Your Playwright suite ran 240 tests: 231 passed, 4 failed (2 blocking P1, 2 known flaky), 5 skipped. Write the executive-facing sprint test summary (3–4 sentences) and the Go/No-Go recommendation for the release meeting.

Why teams fail here

Conflating CI status with test health — CI green means the gate passed, not that the suite is healthy. A growing flaky quarantine pool is invisible in the pipeline badge but a real signal of suite decay.
No explicit Go/No-Go recommendation — leaving stakeholders to infer a release decision from a pass percentage is abdicating the test lead's responsibility; 97% might be fine or catastrophic depending on which 3% failed.
Allure report with no story-tag discipline — if tests aren't tagged with JIRA IDs, the Allure trend view becomes useless for a test lead trying to trace which stories have fragile coverage after a major Revenue NZ API change or similar cross-team dependency shift.
Report artefacts that expire before anyone reads them — CI artefact retention set to 7 days, sprint reviews happening on day 9; the evidence is gone by the time accountability questions are asked.

8 Self-Check — Can You Actually Do This?

Q1. What information must a Go/No-Go test summary always contain?

Five elements: (1) total tests run and the execution scope (browsers, environments); (2) pass rate as a percentage with absolute numbers; (3) failure breakdown separating blocking failures from known flaky tests — these are different risk levels; (4) a bug reference for each blocking failure (JIRA ID, severity, owner); and (5) a clear Go/No-Go recommendation with any conditions attached (e.g. conditional on BUG-1234 resolution by EOD). Never make the stakeholder infer the recommendation from the numbers.

Q2. A developer asks "why does the Allure report show 94% when CI shows green?" What's a likely explanation?

CI shows green because the failing tests are tagged as flaky and quarantined — they are excluded from the pass/fail gate. Allure reports on all executed tests regardless of quarantine status, so it shows the true failure rate including flaky tests. This is actually the correct behaviour: CI should not be blocked by known-flaky tests, but the test report should surface them so the team knows the quarantine pool is growing. The difference between CI status and Allure pass rate is a signal worth monitoring.
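The gap between the two numbers can be made concrete with a small sketch. The `TestResult` shape and both functions below are hypothetical and only model the behaviour described in this answer: the CI gate ignores quarantined failures, while the report counts every executed test.

```typescript
// Hypothetical result record; real reporters use their own formats.
interface TestResult {
  name: string;
  status: "passed" | "failed";
  quarantined: boolean; // true for known-flaky tests excluded from the gate
}

// The CI gate is green when every failure is a quarantined one.
function ciGateIsGreen(results: TestResult[]): boolean {
  return results.every((r) => r.status === "passed" || r.quarantined);
}

// The report counts all executed tests, quarantined or not.
// Returns a percentage rounded to one decimal place.
function reportedPassRate(results: TestResult[]): number {
  const passed = results.filter((r) => r.status === "passed").length;
  return Math.round((passed / results.length) * 1000) / 10;
}

// 100 tests: the first 6 fail, and all 6 are quarantined as flaky.
const sampleRun: TestResult[] = Array.from(
  { length: 100 },
  (_, i): TestResult => ({
    name: `test ${i + 1}`,
    status: i < 6 ? "failed" : "passed",
    quarantined: i < 6,
  }),
);

const gateGreen = ciGateIsGreen(sampleRun);   // true
const reportRate = reportedPassRate(sampleRun); // 94
```

In this sample the pipeline badge is green and the report shows 94%, and both are correct; they simply answer different questions.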

Q3. How should flaky tests be reported differently from genuine failures?

Flaky tests should appear as a separate count in every executive summary: "X genuine failures (Y blocking), Z flaky tests quarantined." They should not block the CI gate, but they should not be invisible either. In Allure, tag them with a flaky label so they appear in their own filter. Track the flaky count as a trend over sprints — a growing flaky pool is a maintenance debt that will eventually contaminate genuine failure signals. When the count exceeds a team-agreed threshold (e.g. 5% of suite), schedule a flaky-test remediation spike.
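The threshold and trend checks described above are simple enough to automate. This is a sketch under stated assumptions: the function names are hypothetical, the 5% default mirrors the example threshold in this answer, and "growing" is simplified to mean the latest sprint's count is higher than the earliest one.

```typescript
// True when the flaky share of the suite exceeds the agreed threshold.
// The 0.05 default is the example figure from the text, not a standard.
function remediationDue(
  flakyCount: number,
  suiteSize: number,
  threshold: number = 0.05,
): boolean {
  return flakyCount / suiteSize > threshold;
}

// Simplified trend check: compares the most recent sprint to the first
// one in the window. Needs at least two data points.
function flakyPoolIsGrowing(countsBySprint: number[]): boolean {
  if (countsBySprint.length < 2) {
    return false;
  }
  const first = countsBySprint[0];
  const last = countsBySprint[countsBySprint.length - 1];
  return last > first;
}
```

For an 800-test suite, `remediationDue(41, 800)` is true while `remediationDue(40, 800)` is false, because 40 tests is exactly 5% and the rule fires only when the threshold is exceeded.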

Key takeaway

If your test report requires a reader to do work before understanding the risk, it is not a report — it is a liability waiting to be ignored at the worst possible moment.

9 ISTQB Mapping

CTAL-TA v3.1.2 — Section 6.2: Test progress and summary reports. The Test Analyst is responsible for designing test reports that communicate progress and results to defined stakeholder groups. This includes selecting appropriate metrics, format, and level of detail per audience.

CTFL v4.0 — Section 5.3: Test monitoring and control — metrics reporting. Test reports must include pass/fail rates, defect counts, coverage metrics, and clear risk indicators. The standard distinguishes between progress reports (ongoing) and summary reports (end of phase or release).

10 Next Steps

You now know how to surface test results to three different audiences without losing information or burying it. Apply these skills in the practice section.
