Mid-Level · Automation Engineer

Test Reporting & Artefacts

Test results nobody reads don't improve quality. The art of test reporting is making results immediately actionable — for developers, leads, and stakeholders who have five minutes and need to make a decision.

Mid-Level · ISTQB CTAL-TA v3.1.2, K3 Apply · ~14 min read + exercise

1 The Hook — Why This Matters

An Auckland SaaS team runs 800 Playwright tests in CI. The pass rate is 97.2%. The Test Lead is diligent — every sprint they assemble a 40-page Allure report and email it to the Product Owner. The PO skips it every single sprint. Too much detail, no clear action.

After three sprints, a P1 bug reaches production. The automated tests had caught it. The failure was right there in the Allure report, buried on page 28. Nobody triaged it. The PO didn't know failing tests could sit unremediated if nobody reviewed the dashboard.

The tests were doing their job. The reporting wasn't. A 40-page report with no executive summary is not a communication tool — it's a data dump that creates a false sense of oversight. The PO assumed a report landing in their inbox meant someone had reviewed it.

2 The Rule — The One-Sentence Version

A test report is a communication tool, not a data dump. Design it for the person who reads it, not for the person who generates it.

Three different audiences need three different formats. A developer debugging a 2am failure needs trace viewer access. A test lead needs suite-level trends. A Product Owner in a Go/No-Go meeting needs four sentences. Match the format to the audience.

3 The Analogy — Think Of It Like...


A test report is a weather forecast, not a meteorological dataset.

The pilot doesn't need raw atmospheric pressure readings. They need: "ceiling 800ft, visibility 2nm, conditions improving at 14:00." Your stakeholder needs: "97% pass rate, 3 new failures, 2 are blocking, here's what they mean."

The meteorological dataset exists and matters — it's what produced the forecast. But handing it to the pilot unprocessed is not helpful, it's noise. Your Allure report is the dataset. Your sprint summary is the forecast.

4 Watch Me Do It — Three Reports for Three Audiences

Audience: Developer

Playwright HTML Report — immediate debugging

Open it after any CI failure. The trace viewer gives step-by-step playback of exactly what the browser did.

CLI — open the HTML report
# After a test run, open the HTML report locally
npx playwright show-report

# CI uploads it as an artefact — download and run:
npx playwright show-report playwright-report/
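
The trace playback described above is only available if the run was configured to record traces, and the same goes for failure screenshots. A minimal sketch of a `playwright.config.ts` that captures both is shown here. The specific values (`retain-on-failure`, `only-on-failure`, the `playwright-report` output folder) are assumptions chosen for illustration, not settings taken from the team in the example.

```typescript
// playwright.config.ts (illustrative sketch; adjust the values to your project)
import { defineConfig } from '@playwright/test';

export default defineConfig({
  // Write the HTML report to a folder that CI can upload as an artefact.
  // open: 'never' stops the report from auto-opening after a local run.
  reporter: [['html', { outputFolder: 'playwright-report', open: 'never' }]],
  use: {
    // Keep a trace only for tests that fail, so passing runs stay small.
    trace: 'retain-on-failure',
    // Attach a screenshot to the report whenever a test fails.
    screenshot: 'only-on-failure',
  },
});
```

Recording traces only on failure is one possible trade-off between artefact size and debugging detail; a team that retries failed tests may prefer `on-first-retry` instead.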

Test names are the most important thing in this report. A generic name is useless at 2am:

TypeScript — behaviour-first test naming
// BAD: tells you nothing when it fails
test('test_login_01', async ({ page }) => { ... });

// GOOD: tells you exactly what scenario broke
test('KiwiSaver enrolment completes when IRD number is valid', async ({ page }) => {
  // Developer reading the failure report knows immediately:
  // the IRD number validation path broke, not the login form
});

Audience: Test Lead

Allure Report — sprint-level visibility

Allure adds suite hierarchy, JIRA story tags, and severity markers. Install it in three steps:

Shell — Allure setup
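# Step 1: install the Playwright adapter. The generate/open commands below
# also need the Allure CLI; the allure-commandline npm package is one way to get it.
npm install allure-playwright allure-commandline --save-dev

# Step 2: in playwright.config.ts (TypeScript, not shell), add the reporter:
#   reporter: [['allure-playwright']],

# Step 3: generate the report from the raw results, then open it
npx allure generate allure-results --clean
npx allure open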

Tag tests with story IDs so failures link directly to JIRA:

TypeScript — Allure labels in tests
import { test } from '@playwright/test';
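// This import matches the allure-playwright 2.x API. Newer releases document
// these helpers under allure-js-commons, so check the version you have installed.
import { allure } from 'allure-playwright';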

test('payment gateway processes Visa card correctly', async ({ page }) => {
  allure.story('PAY-1142');
  allure.severity('critical');
  allure.feature('Payment Gateway');
  // test steps...
});
Audience: Stakeholder / Product Owner

Executive summary — Go/No-Go meeting

Four parts: scope, results, failure breakdown, recommendation. Here is the template:

Sprint 24 — Test Summary (example)
Sprint 24 — Test Summary: 312 automated tests ran across 3 browsers.
308 passed (98.7%). 4 failures: 2 blocked on BUG-4421 (payment gateway
timeout — known, non-blocking for this release), 1 flaky network test
(quarantined, not counted against release quality), 1 new failure in
checkout (BUG-4438 raised, P2). No release blockers as of 09:00 Thursday.

Go recommendation: conditional on BUG-4438 resolution by EOD Thursday.
Assigned to Aaron. If not resolved, escalate to release manager by 16:00.

Never send Allure to a Product Owner. It's built for test leads who want to drill into suite trends. A PO in a sprint review needs the executive summary above, not 40 pages of test case detail.
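
The executive summary above can be assembled from run totals rather than typed by hand each sprint. The sketch below shows one way to do that. The `RunTotals` shape, its field names, and the wording of the recommendation are all assumptions made for this example; the numbers in the usage call are the Sprint 24 figures from the template.

```typescript
// Hypothetical input shape: these field names are assumptions for this sketch.
interface RunTotals {
  sprint: string;
  total: number;
  passed: number;
  blockingFailures: number;    // failures that should stop the release
  nonBlockingFailures: number; // genuine failures triaged as non-blocking
  flakyQuarantined: number;    // known-flaky tests, reported as their own number
}

function buildExecutiveSummary(run: RunTotals): string {
  // Guard against an empty run so the summary never prints "NaN%".
  const passRate = run.total === 0 ? '0.0' : ((run.passed / run.total) * 100).toFixed(1);
  const genuine = run.blockingFailures + run.nonBlockingFailures;
  const recommendation =
    run.blockingFailures > 0
      ? `No-Go until ${run.blockingFailures} blocking failure(s) are resolved.`
      : 'Go: no release blockers.';
  return [
    `${run.sprint} Test Summary: ${run.total} automated tests ran.`,
    `${run.passed} passed (${passRate}%).`,
    `${genuine} genuine failure(s), ${run.blockingFailures} blocking; ${run.flakyQuarantined} flaky test(s) quarantined.`,
    `Recommendation: ${recommendation}`,
  ].join(' ');
}

// Sprint 24 figures from the template: 3 non-blocking failures, 1 quarantined flaky test.
const summary = buildExecutiveSummary({
  sprint: 'Sprint 24',
  total: 312,
  passed: 308,
  blockingFailures: 0,
  nonBlockingFailures: 3,
  flakyQuarantined: 1,
});
console.log(summary);
```

A generated paragraph like this is a starting point. Conditions, owners, and deadlines (as in the template) still need a human to write them.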

5 When to Use It — Audience Matrix

| Report format | Audience | Trigger | Key information |
|---|---|---|---|
| Playwright HTML report | Developer | After every CI run | Which test failed, trace viewer, screenshot on failure |
| Allure report | Test Lead | Sprint test report | Suite hierarchy, JIRA links, severity, trend over sprints |
| Executive summary | Product Owner / Release Manager | Go/No-Go meeting | Pass rate, blockers, known flaky count, recommendation |

Report flaky tests separately from genuine failures. A 95% pass rate with 40 flaky tests is a different risk profile to 95% with zero flaky tests. The executive summary should always call out the flaky test count as a distinct number.
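
To make the point above concrete, the sketch below computes the pass rate and the flaky count as two separate numbers. The `TestResult` shape and the `quarantined` flag are assumptions for illustration; a real suite might mark flaky tests with a tag or annotation instead. The two sample suites are invented data.

```typescript
// Hypothetical result shape; 'quarantined' marks tests the team has flagged as known-flaky.
type Outcome = 'passed' | 'failed' | 'skipped';
interface TestResult { title: string; outcome: Outcome; quarantined: boolean; }

interface HealthSummary {
  executed: number;         // passed + failed (skipped tests excluded)
  passRatePct: number;      // rounded to one decimal place
  genuineFailures: number;  // failed and not quarantined
  flakyQuarantined: number; // every quarantined test, whatever its outcome this run
}

function summariseHealth(results: TestResult[]): HealthSummary {
  const executed = results.filter(r => r.outcome !== 'skipped');
  const passed = executed.filter(r => r.outcome === 'passed').length;
  const genuineFailures = executed.filter(r => r.outcome === 'failed' && !r.quarantined).length;
  const flakyQuarantined = results.filter(r => r.quarantined).length;
  const passRatePct =
    executed.length === 0 ? 0 : Math.round((passed / executed.length) * 1000) / 10;
  return { executed: executed.length, passRatePct, genuineFailures, flakyQuarantined };
}

// Helper that builds n identical sample results (invented data).
const make = (n: number, outcome: Outcome, quarantined = false): TestResult[] =>
  Array.from({ length: n }, (_, i) => ({ title: `${outcome}-${i}`, outcome, quarantined }));

// Two suites with the same pass rate but a different flaky pool.
const stable = summariseHealth([...make(19, 'passed'), ...make(1, 'failed')]);
const shaky = summariseHealth([
  ...make(11, 'passed'),
  ...make(8, 'passed', true), // flaky tests that happened to pass this run
  ...make(1, 'failed'),
]);
console.log(stable.passRatePct, stable.flakyQuarantined);
console.log(shaky.passRatePct, shaky.flakyQuarantined);
```

Both suites report the same pass rate, but only the second number shows that eight of the second suite's passes come from tests the team does not trust.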

6 Common Mistakes — Don't Do This

🚫 "More detail in a report means more value"

I used to think: a comprehensive Allure report with every test case documented shows professionalism. Actually: a 300-test Allure report with no summary is noise to a PO. They will skim it once, decide it takes too long to parse, and stop reading it entirely. Summarise first. Link to detail. Don't dump detail on an audience that needs a decision, not a dataset.

🚫 "Test names like test_login_01 are fine"

I used to think: sequential test IDs are standard practice. Actually: when "test_login_01" fails at 2am, nobody knows what user scenario broke without reading the test code. Name tests as behaviour, for example "login fails with expired password and shows correct error message". The Playwright HTML report surfaces test names directly, so a good name is the fastest path to understanding what broke and why.
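
A naming rule like this can be checked mechanically during review. The sketch below is a rough heuristic, not a standard: the patterns and the four-word minimum are assumptions about what a "generic" name looks like, and a team would tune them to its own conventions.

```typescript
// Heuristic only: these patterns are assumptions about what "generic" looks like.
const GENERIC_NAME_PATTERNS: RegExp[] = [
  /^test[_\s-]/i, // starts with "test_", "test " or "test-"
  /[_\s-]\d+$/,   // ends in a sequence number such as "_01"
];

function isGenericTestName(title: string): boolean {
  const trimmed = title.trim();
  const wordCount = trimmed.split(/\s+/).length;
  // Fewer than four words rarely describes both a scenario and its expected outcome.
  return wordCount < 4 || GENERIC_NAME_PATTERNS.some(p => p.test(trimmed));
}

const titles = [
  'test_login_01',
  'login fails with expired password and shows correct error message',
];
for (const t of titles) {
  console.log(`${isGenericTestName(t) ? 'RENAME' : 'OK'}: ${t}`);
}
```

The first title is flagged and the second passes. A check like this catches only the obvious cases; whether a name actually describes the behaviour still needs a reviewer.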

🚫 "The CI green/red indicator is enough"

I used to think: if CI is green, we're good. Actually: a pass rate of 95% with 40 flaky tests quarantined is not the same as 95% with zero flaky tests. CI can be "green" while hiding a growing pool of unstable tests that are silently excluded. Report flaky test count as a separate health metric. When that number grows, it is a warning sign the suite is degrading.

7 Now You Try — Prompt Lab


📋 Exercise — Sprint Test Summary

You are the mid-level SDET on a Chorus OSS system deployment. Your Playwright suite ran 240 tests: 231 passed, 4 failed (2 blocking P1, 2 known flaky), 5 skipped. Write the executive-facing sprint test summary (3–4 sentences) and the Go/No-Go recommendation for the release meeting.

8 Self-Check — Can You Actually Do This?


Q1. What information must a Go/No-Go test summary always contain?

Five elements: (1) total tests run and the execution scope (browsers, environments); (2) pass rate as a percentage with absolute numbers; (3) failure breakdown separating blocking failures from known flaky tests — these are different risk levels; (4) a bug reference for each blocking failure (JIRA ID, severity, owner); and (5) a clear Go/No-Go recommendation with any conditions attached (e.g. conditional on BUG-1234 resolution by EOD). Never make the stakeholder infer the recommendation from the numbers.

Q2. A developer asks "why does the Allure report show 94% when CI shows green?" What's a likely explanation?

CI shows green because the failing tests are tagged as flaky and quarantined — they are excluded from the pass/fail gate. Allure reports on all executed tests regardless of quarantine status, so it shows the true failure rate including flaky tests. This is actually the correct behaviour: CI should not be blocked by known-flaky tests, but the test report should surface them so the team knows the quarantine pool is growing. The difference between CI status and Allure pass rate is a signal worth monitoring.
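
The gap between the two numbers can be reproduced in a few lines. The sketch below assumes the quarantine is implemented as a filter in the CI gate, which is only one of several possible mechanisms; the `GateResult` shape and the sample run are invented for illustration.

```typescript
// Hypothetical result shape for this sketch.
interface GateResult { title: string; passed: boolean; quarantined: boolean; }

// CI gate: only non-quarantined tests can turn the build red.
function ciGateIsGreen(results: GateResult[]): boolean {
  return results.filter(r => !r.quarantined).every(r => r.passed);
}

// Report view: every executed test counts, quarantined or not.
function reportPassRatePct(results: GateResult[]): number {
  if (results.length === 0) return 0;
  const passed = results.filter(r => r.passed).length;
  return Math.round((passed / results.length) * 100);
}

// Invented run: 94 stable passes plus 6 quarantined tests that failed.
const run: GateResult[] = [
  ...Array.from({ length: 94 }, (_, i) => ({ title: `stable ${i}`, passed: true, quarantined: false })),
  ...Array.from({ length: 6 }, (_, i) => ({ title: `flaky ${i}`, passed: false, quarantined: true })),
];

console.log(ciGateIsGreen(run));     // true
console.log(reportPassRatePct(run)); // 94
```

The same set of results produces a green gate and a 94% report because the two views answer different questions.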

Q3. How should flaky tests be reported differently from genuine failures?

Flaky tests should appear as a separate count in every executive summary: "X genuine failures (Y blocking), Z flaky tests quarantined." They should not block the CI gate, but they should not be invisible either. In Allure, tag them with a flaky label so they appear in their own filter. Track the flaky count as a trend over sprints — a growing flaky pool is a maintenance debt that will eventually contaminate genuine failure signals. When the count exceeds a team-agreed threshold (e.g. 5% of suite), schedule a flaky-test remediation spike.
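
Tracking the flaky count as a trend can be as simple as comparing each sprint with the previous one and with the agreed threshold. The sketch below uses the 5% example threshold mentioned above as a default. The `SprintFlakyCount` shape and the three sprints of data are hypothetical.

```typescript
// Hypothetical per-sprint record; field names are assumptions for this sketch.
interface SprintFlakyCount { sprint: string; suiteSize: number; flaky: number; }

interface FlakyTrendPoint {
  sprint: string;
  flakyPct: number;       // flaky tests as a percentage of the suite, one decimal place
  growing: boolean;       // true if the count rose compared with the previous sprint
  overThreshold: boolean; // true if the percentage exceeds the agreed threshold
}

function flakyTrend(history: SprintFlakyCount[], thresholdPct = 5): FlakyTrendPoint[] {
  return history.map((h, i) => {
    const pct = h.suiteSize === 0 ? 0 : (h.flaky / h.suiteSize) * 100;
    // The first sprint has no predecessor, so it is compared with itself.
    const previous = history[i - 1]?.flaky ?? h.flaky;
    return {
      sprint: h.sprint,
      flakyPct: Math.round(pct * 10) / 10,
      growing: h.flaky > previous,
      overThreshold: pct > thresholdPct,
    };
  });
}

// Invented data for an 800-test suite.
const trend = flakyTrend([
  { sprint: 'Sprint 22', suiteSize: 800, flaky: 24 },
  { sprint: 'Sprint 23', suiteSize: 800, flaky: 36 },
  { sprint: 'Sprint 24', suiteSize: 800, flaky: 44 },
]);
for (const point of trend) {
  console.log(point.sprint, `${point.flakyPct}%`, point.growing, point.overThreshold);
}
```

In this invented data the pool grows every sprint and crosses the threshold in the third, which is the point at which the remediation spike would be scheduled.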

9 ISTQB Mapping

CTAL-TA v3.1.2, Section 6.2: Test progress and summary reports. The Test Analyst is responsible for designing test reports that communicate progress and results to defined stakeholder groups. This includes selecting appropriate metrics, format, and level of detail per audience.

CTFL v4.0, Section 5.3: Test monitoring and control, including metrics reporting. Test reports must include pass/fail rates, defect counts, coverage metrics, and clear risk indicators. The syllabus distinguishes between test progress reports (ongoing) and test completion reports (end of a phase or release).