Lesson 4 of 4 · Playwright Deep-Dive

Playwright CI & Pipelines

A Playwright suite that only runs locally is a hobby project. A properly configured CI pipeline — with sharding, smart retries, trace artefacts, and HTML report publishing — is a quality gate. This lesson builds the full pipeline.

CTAL-TAE · ~30 min read · ~60 min with exercises

1 The Hook

A Christchurch development team has 600 Playwright tests. Running them sequentially in CI takes 48 minutes. Pull requests sit in the queue. Developers stop waiting for CI results and merge anyway. Failed tests get noticed hours later, after the next PR has already landed on top.

The team implements sharding across 4 machines with parallel workers on each. Total CI time drops to 11 minutes. Pull requests now show test results before a second reviewer can finish reading the diff.

The team starts trusting CI again. They stop merging without it. The failure detection rate goes from “eventually” to “before merge”. That is what fast feedback does to developer behaviour.

2 The Rule

Shard your tests and run workers in parallel. A test suite that takes longer than the time between commits will be ignored. Fast CI feedback is not a nice-to-have — it changes developer behaviour.

3 The Analogy

Sharding is like splitting a courier delivery route between 4 drivers.

One driver can deliver 600 parcels in 8 hours. Four drivers, each with different routes and no overlap, complete the same 600 deliveries in 2 hours. The parcels are identical. The parallelism is the gain.

Playwright sharding divides your test files across N machines. (By default the split is per file; with fullyParallel enabled, Playwright can split at the level of individual tests.) Each machine runs its share in parallel using workers. At the end, you merge the blob reports from all shards into a single HTML report: the full picture, from four simultaneous runs.
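To make the "different routes, no overlap" idea concrete, here is a small sketch of how a list of test files could be cut into contiguous, non-overlapping shards. It is an illustration only and not Playwright's internal algorithm; the function name filesForShard and the file names are hypothetical.

```typescript
// Illustration only: NOT Playwright's internal sharding algorithm.
// Splits a list into `shardTotal` contiguous, non-overlapping slices and
// returns the slice for `shardIndex` (1-based, mirroring --shard=N/M).
function filesForShard<T>(items: T[], shardIndex: number, shardTotal: number): T[] {
  if (!Number.isInteger(shardTotal) || shardTotal < 1) {
    throw new RangeError("shardTotal must be a positive integer");
  }
  if (!Number.isInteger(shardIndex) || shardIndex < 1 || shardIndex > shardTotal) {
    throw new RangeError("shardIndex must be between 1 and shardTotal");
  }
  // Integer boundaries guarantee that every item lands in exactly one shard.
  const start = Math.floor(((shardIndex - 1) * items.length) / shardTotal);
  const end = Math.floor((shardIndex * items.length) / shardTotal);
  return items.slice(start, end);
}

// Hypothetical file names, used only to show the call shape.
const exampleFiles = ["cart.spec.ts", "checkout.spec.ts", "login.spec.ts", "search.spec.ts"];
const shardTwoOfFour = filesForShard(exampleFiles, 2, 4);
console.log(shardTwoOfFour);
```

Every file appears in exactly one shard, which is the property that lets the merged report cover the whole suite without duplicates.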

4 Watch Me Do It

Here is a production GitHub Actions workflow with 4-way sharding, blob report merging, and artefact publishing.

# .github/workflows/playwright.yml
name: Playwright Tests
on:
  push:
    branches: [main]
  pull_request:
    branches: [main]

jobs:
  test:
    timeout-minutes: 30
    runs-on: ubuntu-latest
    strategy:
      fail-fast: false
      matrix:
        shard: [1, 2, 3, 4]  # 4 parallel shards

    steps:
      - uses: actions/checkout@v4

      - uses: actions/setup-node@v4
        with:
          node-version: '20'
          cache: 'npm'

      - name: Install dependencies
        run: npm ci

      - name: Cache Playwright browsers
        uses: actions/cache@v4
        with:
          path: ~/.cache/ms-playwright
          key: playwright-${{ hashFiles('package-lock.json') }}

      - name: Install Playwright browsers
        run: npx playwright install --with-deps chromium webkit

      - name: Run Playwright tests (shard ${{ matrix.shard }}/4)
        run: |
          npx playwright test \
            --shard=${{ matrix.shard }}/4 \
            --reporter=blob
        env:
          BASE_URL: ${{ secrets.STAGING_URL }}
          TEST_USER_EMAIL: ${{ secrets.TEST_USER_EMAIL }}
          TEST_USER_PASSWORD: ${{ secrets.TEST_USER_PASSWORD }}

      - name: Upload blob report
        if: always()
        uses: actions/upload-artifact@v4
        with:
          name: blob-report-${{ matrix.shard }}
          path: blob-report/
          retention-days: 3

  merge-reports:
    needs: [test]
    runs-on: ubuntu-latest
    if: always()

    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-node@v4
        with: { node-version: '20', cache: 'npm' }
      - run: npm ci

      - name: Download blob reports
        uses: actions/download-artifact@v4
        with:
          path: all-blob-reports
          pattern: blob-report-*
          merge-multiple: true

      - name: Merge into HTML report
        run: npx playwright merge-reports --reporter html ./all-blob-reports

      - name: Upload HTML report
        uses: actions/upload-artifact@v4
        with:
          name: playwright-report
          path: playwright-report/
          retention-days: 14

Pro tip: fail-fast: false in the matrix strategy is critical. Without it, as soon as one shard fails, GitHub cancels the shards that are still running or queued, and you lose their results and blob reports. Keep it false so every shard completes and you can always merge a full report.

Accessing trace artefacts from a failed test: When trace: 'on-first-retry' is set in config, Playwright records a trace on the first retry of a failing test, and the trace .zip ends up inside the HTML report artefact. Download the artefact, open the report with npx playwright show-report (the trace viewer needs the report served over HTTP rather than opened straight from disk), click the failed test, and click “Traces”. The Playwright Trace Viewer shows every action, DOM snapshot, network request, and console log from the traced run.
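For reference, a minimal playwright.config.ts sketch that would pair with the workflow above. It is an assumption about how the project is configured, not the lesson's actual config: the CI environment variable is set automatically by GitHub Actions, and reading BASE_URL into baseURL is one plausible way to consume the secret the workflow passes in.

```typescript
// playwright.config.ts (sketch; adapt to your project)
import { defineConfig } from '@playwright/test';

export default defineConfig({
  // Retry only in CI, as a safety net against a single intermittent failure.
  retries: process.env.CI ? 2 : 0,

  // Blob output in CI so shards can be merged; plain HTML report locally.
  reporter: process.env.CI ? 'blob' : 'html',

  use: {
    // Assumption: the suite reads its target environment from BASE_URL.
    baseURL: process.env.BASE_URL,

    // Record a trace on the first retry of a failing test.
    trace: 'on-first-retry',
  },
});
```

With the reporter set here, the --reporter=blob flag in the workflow becomes redundant but harmless; keeping one of the two is enough.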

Caching browser binaries (saves 2–3 minutes per run):

- name: Cache Playwright browsers
  uses: actions/cache@v4
  with:
    path: ~/.cache/ms-playwright
    key: playwright-${{ hashFiles('package-lock.json') }}
    restore-keys: playwright-

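The cache key changes whenever package-lock.json changes, which includes every Playwright version bump (and any other dependency change). On every other run the key matches and the cached binaries are reused. When the exact key misses, restore-keys falls back to the most recent cache with the playwright- prefix, so the install step only has to fetch whatever is missing.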

5 When to Use It

  • Any project with more than 50 tests: configure workers and the blob reporter from the start.
  • More than 200 tests: add sharding. Calculate your target run time and work backwards to the number of shards needed.
  • Any project merging more than 5 PRs per day: fast CI directly affects how many defects escape to main.
  • Blob report merging is required whenever you shard. Without it, each shard produces a separate HTML report and you have no combined view.
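"Working backwards" from a target run time can be sketched as a small calculation. This is a rough estimate under stated assumptions: run time scales roughly linearly with the number of shards, and each shard pays a fixed setup overhead (checkout, npm ci, browser install). The function name shardsNeeded and the 3-minute overhead are hypothetical.

```typescript
// Rough estimate of the shard count needed to hit a target CI run time.
// Assumes near-linear scaling across shards, which real suites only approximate.
function shardsNeeded(
  singleMachineMinutes: number, // measured run time of the whole suite on one machine
  targetMinutes: number,        // the run time you want per pipeline
  overheadMinutes: number = 0   // fixed per-shard setup cost (hypothetical value)
): number {
  if (singleMachineMinutes <= 0 || targetMinutes <= 0 || overheadMinutes < 0) {
    throw new RangeError("durations must be positive (overhead may be zero)");
  }
  const testBudget = targetMinutes - overheadMinutes;
  if (testBudget <= 0) {
    throw new RangeError("target run time must exceed the per-shard overhead");
  }
  // Round up: a fractional shard still needs a whole machine.
  return Math.max(1, Math.ceil(singleMachineMinutes / testBudget));
}

// The 48-minute suite from the opening example, aiming for 15 minutes
// with an assumed 3 minutes of setup per shard.
const estimate = shardsNeeded(48, 15, 3);
console.log(`Estimated shards: ${estimate}`);
```

Treat the result as a starting point and measure: uneven test durations mean the slowest shard, not the average, decides the real run time.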

6 Common Mistakes

✗ I used to think: just run npx playwright test in CI with no extra config.

Actually: without --reporter=blob on each shard, you cannot merge reports at the end. Without --shard=N/M, the whole suite runs on a single machine, with only that machine's workers for parallelism. Neither takes more than 5 minutes to add, but both are commonly omitted by teams that set up CI quickly and never revisit it. The single-machine run becomes a bottleneck that is invisible until it is painful.

✗ I used to think: retries solve flakiness.

Actually: retries mask flakiness. Set retries: 2 in CI as a safety net to prevent a single intermittent failure from blocking a deploy. But track the retry rate separately. If more than 5% of tests are retrying regularly, that is a signal to investigate — not a reason to raise the retry count. A test that passes on the third attempt is still broken.
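Tracking the retry rate separately can be as simple as the sketch below. The TestOutcome shape is a hypothetical, simplified record (it is not the schema of any Playwright reporter), and the 5% threshold is the one suggested above.

```typescript
// Hypothetical, simplified per-test record for one CI run.
interface TestOutcome {
  title: string;
  attempts: number; // 1 = passed or failed first time, 2+ = at least one retry
  passed: boolean;  // final status after all attempts
}

// Fraction of tests in a run that needed at least one retry.
function retryRate(outcomes: TestOutcome[]): number {
  if (outcomes.length === 0) return 0;
  const retried = outcomes.filter((o) => o.attempts > 1).length;
  return retried / outcomes.length;
}

// True when the retry rate is above the investigation threshold (default 5%).
function needsInvestigation(outcomes: TestOutcome[], threshold: number = 0.05): boolean {
  return retryRate(outcomes) > threshold;
}

// Example data: three tests, one of which only passed on its second attempt.
const sampleRun: TestOutcome[] = [
  { title: "login works", attempts: 1, passed: true },
  { title: "cart updates total", attempts: 2, passed: true },
  { title: "search returns results", attempts: 1, passed: true },
];
console.log(retryRate(sampleRun), needsInvestigation(sampleRun));
```

Note that the flaky test here still counts as passed. That is exactly why the retry rate has to be tracked as its own number rather than read off the pass/fail summary.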

✗ I used to think: I should cache node_modules to speed up CI.

Actually: caching node_modules helps, but the bigger gain is usually caching the Playwright browser binaries at ~/.cache/ms-playwright. The browsers are a few hundred megabytes each (roughly 200–400MB) and often take longer to download and install than your entire node_modules. Cache both, but if you are only caching one, cache the browsers.

7 Now You Try

✎ Prompt Lab — AI Exercise

Design a GitHub Actions CI pipeline for a Playwright suite with 450 tests. The suite must: run in under 15 minutes, test against a staging environment URL from a secret, publish the HTML report as a GitHub Pages deployment on main, and send a Slack notification on failure. Write the key sections of the workflow file.

8 Self-Check

What is blob reporting and why is it needed for sharded runs?

Blob reporting writes a compact binary file containing the full test results from a single shard. When each shard runs with --reporter=blob, it produces one blob file. The merge-reports job then downloads all blob files and combines them into a single HTML report covering all shards. Without blob reporting, each shard produces its own disconnected HTML report and there is no combined view of the full suite run.

Why should browser binaries be cached separately from node_modules?

Playwright browser binaries (Chromium, WebKit, Firefox) are 200–400MB each and stored in ~/.cache/ms-playwright, not inside node_modules. Caching node_modules does not cache the browsers. A fresh CI runner that restores node_modules from cache but not the browsers still has to download and install them, which takes 2–3 minutes. Cache both paths separately using their own cache keys.

A CI run shows 8% of tests passing only after a retry. What does this tell you?

An 8% retry rate indicates a significant flakiness problem. It means roughly 1 in 12 tests is producing a wrong result on the first attempt. This is not acceptable background noise: it means the tests are not reliable indicators of product quality. Investigate the most frequently retrying tests first: check for missing waits, shared test data causing race conditions, or environment instability. Do not raise the retry count, because that hides the problem rather than solving it.
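Finding "the most frequently retrying tests" is a matter of adding up retries per test across several CI runs. A minimal sketch, assuming you have already collected one record per test per run; the RetryRecord shape and the test titles are hypothetical.

```typescript
// Hypothetical record: how many retries one test needed in one CI run.
interface RetryRecord {
  title: string;
  retries: number;
}

interface RetryTotal {
  title: string;
  totalRetries: number;
}

// Sums retries per test across runs and returns the worst offenders first.
function rankByRetries(records: RetryRecord[], topN: number = 5): RetryTotal[] {
  const totals: Record<string, number> = {};
  for (const record of records) {
    totals[record.title] = (totals[record.title] ?? 0) + record.retries;
  }
  return Object.keys(totals)
    .map((title) => ({ title, totalRetries: totals[title] ?? 0 }))
    .filter((entry) => entry.totalRetries > 0) // tests that never retried are not flaky
    .sort((a, b) => b.totalRetries - a.totalRetries || a.title.localeCompare(b.title))
    .slice(0, topN);
}

// Example: records gathered from two runs of the same three tests.
const history: RetryRecord[] = [
  { title: "checkout applies voucher", retries: 1 },
  { title: "login works", retries: 0 },
  { title: "search returns results", retries: 1 },
  { title: "checkout applies voucher", retries: 2 },
  { title: "login works", retries: 0 },
];
const worst = rankByRetries(history);
console.log(worst);
```

The top of this list is where the investigation time goes first: a handful of tests usually accounts for most of the retries.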

9 ISTQB Mapping

CTAL-TAE Section 6.3: CI/CD integration of automated tests. The standard specifically addresses parallelisation, reporting, and the role of automated tests as a pipeline quality gate. Sharding is the practical implementation of parallel test execution across machines.

CTAL-TAE Section 4.4: Test automation reporting and artefacts. The blob/merge pattern is the canonical approach to producing unified reports from distributed test runs, satisfying the requirement for complete and accessible test result artefacts.