CI/CD Testing Tools Compared

1

Why QA Engineers Need CI/CD Knowledge

CI/CD (Continuous Integration / Continuous Delivery) pipelines are the delivery mechanism for software — and therefore the delivery mechanism for your tests. A QA engineer who can only run tests locally is a bottleneck; one who can wire tests into the pipeline is a multiplier.

Tests run automatically

Every PR triggers your full test suite without anyone pressing a button. Bugs caught in minutes, not days.

You gate releases

Quality gates in the pipeline mean a failing test blocks deployment. You have real leverage, not just advisory influence.

Teams expect it

NZ employers increasingly list CI/CD literacy as a requirement, not a nice-to-have. It closes the gap between manual and automation roles.

You own flaky test triage

When the pipeline goes red, someone needs to investigate. QA engineers with pipeline access can fix it without waiting for a dev.

NZ context: Most NZ software companies run either GitHub Actions (SaaS/startups) or Azure Pipelines (government/enterprise). If you work in banking or telco, you’ll likely encounter Jenkins. Understanding all three covers 95% of NZ roles.

What a QA engineer does in CI/CD

Add and maintain test jobs in pipeline YAML files
Configure test result publishing so failures show up on PRs
Set up parallel execution to keep pipelines under 10 minutes
Write retry logic for flaky tests
Create coverage gates that block merges below a threshold
Store test artifacts (screenshots, videos, HTML reports) for debugging
Monitor pipeline health and escalate systematic failures

2

Feature Comparison Table

Quick-reference across the five tools QA engineers encounter in NZ workplaces.

Feature	GitHub Actions	Azure Pipelines	Jenkins	GitLab CI	Bitbucket Pipelines
Config format	YAML (`.github/workflows/`)	YAML (`azure-pipelines.yml`)	Groovy (Jenkinsfile)	YAML (`.gitlab-ci.yml`)	YAML (`bitbucket-pipelines.yml`)
Test result publishing	Via Actions (junit-reporter, Playwright HTML)	Native Azure Test Plans integration	Plugins (JUnit, Allure, Extent)	Native JUnit/HTML artifact	Auto-detects JUnit XML in test-results/ or test-reports/ directories
Parallel execution	Matrix strategy (built-in)	Parallel jobs / stages	Parallel stages (Declarative)	Parallel keyword (built-in)	Parallel step (built-in)
PR quality gates	Branch protection rules + required status checks	Branch policies + PR gates	Multibranch pipeline + webhooks	Merge request pipelines + approvals	Merge checks + Jira integration
Artifact retention	90 days (configurable)	30 days (configurable)	Manual or plugin-managed	30 days default (configurable)	14 days
Self-hosted runners	Yes (GitHub-hosted or self-hosted)	Yes (Microsoft-hosted or self-hosted agents)	Yes (primary model)	Yes (GitLab Runners)	Limited (Atlassian runners)
Cost model	Free for public repos; 2,000 min/month free on private	Free tier (1,800 min/month); pay-as-you-go	Free (open source); infrastructure costs only	400 CI min/month free; paid tiers	50 build min/month free; paid tiers
NZ adoption	Very high — SaaS, startups, open source	Very high — government, enterprise, Microsoft-stack teams	Medium — banks, telcos, legacy shops	Medium — government on-premise, DevSecOps	Low-medium — Jira-first Atlassian shops

Heads up: Bitbucket Pipelines is tightly coupled to Atlassian tooling. If your team already uses Jira + Confluence + Bitbucket, it integrates well. If not, the lock-in cost outweighs the convenience.

3

GitHub Actions

Dominant for SaaS & startups

GitHub Actions is the default choice for any team whose code lives on GitHub — which in NZ means most SaaS companies, open source projects, and tech startups. It has first-class Playwright integration, a massive marketplace of pre-built actions, and a YAML format that’s easy to read and modify without deep DevOps knowledge.

Why QA engineers love it

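Playwright setup: npx playwright install --with-deps installs browsers and their OS dependencies in one step. The older microsoft/playwright-github-action is deprecated in favour of this CLI command.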
Matrix strategy: Run the same test suite across Chromium, Firefox, and WebKit simultaneously without duplicating config.
PR annotations: With Playwright’s github reporter enabled, failed tests appear as annotations on the PR, so developers see exactly what broke and where.
Artifact uploads: Playwright videos and screenshots persist for the retention period you set (90 days by default) so you can replay failures.
Secrets management: API keys, test credentials, and environment variables stored securely in GitHub Secrets.

NZ context

Xero, Sharesies, Hnry, Timely, and most Wellington-based SaaS companies run on GitHub Actions. If you’re applying to a product company in NZ, GitHub Actions fluency is practically mandatory.

GitHub Actions — Playwright with matrix strategy + NZ timezone

name: Playwright Tests

on:
  push:
    branches: [main]
  pull_request:
    branches: [main]

env:
  TZ: Pacific/Auckland

jobs:
  test:
    runs-on: ubuntu-latest
    strategy:
      matrix:
        browser: [chromium, firefox, webkit]
      fail-fast: false    # don't cancel other browsers if one fails

    steps:
      - uses: actions/checkout@v4

      - uses: actions/setup-node@v4
        with:
          node-version: '20'
          cache: 'npm'

      - name: Install dependencies
        run: npm ci

      - name: Install Playwright browsers
        run: npx playwright install --with-deps ${{ matrix.browser }}

      - name: Run Playwright tests
        run: npx playwright test --project=${{ matrix.browser }}
        env:
          BASE_URL: ${{ secrets.BASE_URL }}

      - name: Upload test results
        uses: actions/upload-artifact@v4
        if: always()    # upload even on failure
        with:
          name: playwright-results-${{ matrix.browser }}
          path: playwright-report/
          retention-days: 30

Matrix tip: fail-fast: false is essential for cross-browser testing. Without it, a single Firefox failure cancels your WebKit run, hiding separate bugs.

4

Azure Pipelines

Dominant in NZ government & enterprise

Azure Pipelines is Microsoft’s CI/CD offering within Azure DevOps. It’s the standard for NZ government agencies, large enterprises, and any team already in the Microsoft ecosystem (Azure, .NET, SQL Server, Microsoft 365). The New Zealand government has procured Azure as its preferred cloud platform through the All-of-Government (AoG) agreement — which means pipeline skills here translate directly to roles at agencies like MBIE, MSD, ACC, and Inland Revenue.

Azure Test Plans integration

The unique advantage of Azure Pipelines for QA engineers is the native connection to Azure Test Plans. Test results published in JUnit or TRX format appear in the pipeline’s Tests tab with pass/fail history. Linking those results to test cases in Test Plans is not fully automatic: each automated test has to be associated with its test case first. Once that is done, you get traceability from requirement to test run without manual reporting. This matters in government projects where audit trails and traceability are contractual requirements.

PublishTestResults task: Built-in; supports JUnit, NUnit, xUnit, TRX formats.
Test run reports: Pass/fail trends visible in Azure DevOps without external tooling.
Branch policies: Require the test pipeline to pass (build validation) before PR completion, enforced at the server level, not by convention.
Self-hosted agents: Organisations with strict data sovereignty (common in NZ government) run agents on-premise, keeping test data within New Zealand borders.

Azure Pipelines — Playwright with test result publishing

trigger:
  - main

pr:
  branches:
    include:
      - main

variables:
  TZ: Pacific/Auckland

pool:
  vmImage: 'ubuntu-latest'

stages:
  - stage: Test
    jobs:
      - job: PlaywrightTests
        strategy:
          parallel: 3    # shard across 3 agents
        steps:
          - task: NodeTool@0
            inputs:
              versionSpec: '20.x'

          - script: npm ci
            displayName: 'Install dependencies'

          - script: npx playwright install --with-deps chromium
            displayName: 'Install Playwright browsers'

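          - script: |
              npx playwright test \
                --shard=$(System.JobPositionInPhase)/$(System.TotalJobsInPhase) \
                --reporter=junit,html
            displayName: 'Run Playwright tests (sharded)'
            env:
              CI: 'true'    # not set by Azure; enables CI-only settings such as retries
              BASE_URL: $(BASE_URL)
              # The JUnit reporter prints to stdout unless given a file name;
              # this writes results.xml so PublishTestResults can find it.
              PLAYWRIGHT_JUNIT_OUTPUT_NAME: results.xml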

          - task: PublishTestResults@2
            condition: always()
            inputs:
              testResultsFormat: 'JUnit'
              testResultsFiles: '**/results.xml'
              mergeTestResults: true
              testRunTitle: 'Playwright - Shard $(System.JobPositionInPhase)'

          - task: PublishBuildArtifacts@1
            condition: always()
            inputs:
              pathToPublish: 'playwright-report'
              artifactName: 'playwright-report-$(System.JobPositionInPhase)'

NZ government note: Data sovereignty matters. If test data must stay in New Zealand, run self-hosted agents in the newzealandnorth Azure region; australiaeast is the nearest alternative, but it keeps data offshore. Check your agency’s cloud policy before using Microsoft-hosted agents for sensitive test data.

5

Jenkins

Legacy in NZ banks & telcos

Jenkins is the original open-source CI/CD server, still powering a significant portion of NZ’s banking, insurance, and telecommunications pipelines. It was the dominant tool before cloud-native CI arrived, and large organisations with substantial Jenkins infrastructure have little incentive to migrate — migration risk is high and the existing pipelines work.

What you’ll encounter

Groovy DSL: Jenkins pipelines are written in a Groovy-based DSL (Declarative or Scripted). More verbose than YAML alternatives.
Plugin ecosystem: Almost every feature requires a plugin — JUnit reports, Allure reports, Slack notifications, SonarQube integration. This is powerful but creates maintenance overhead.
Multibranch pipelines: Jenkins scans your repo and creates a pipeline per branch automatically — useful for large teams.
Agent architecture: Jenkins controller + agent nodes. Self-hosted by definition — your ops team owns the infrastructure.

Maintenance overhead is real

The common criticism of Jenkins is accurate: plugin conflicts, security patches, Java version mismatches, and infrastructure costs add up. If you work somewhere running Jenkins, expect to spend time on pipeline maintenance, not just pipeline development. That’s worth understanding before you take a role.

Jenkinsfile (Declarative) — Playwright tests with JUnit reporting

pipeline {
    agent any

    environment {
        TZ = 'Pacific/Auckland'
        BASE_URL = credentials('base-url-secret')
    }

    stages {
        stage('Install') {
            steps {
                sh 'npm ci'
                sh 'npx playwright install --with-deps chromium'
            }
        }

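        stage('Test') {
            parallel {
                // Without an output file the JUnit reporter prints to stdout.
                // Each shard writes its own file so parallel runs in the same
                // workspace do not overwrite each other.
                stage('Shard 1') {
                    steps {
                        sh 'PLAYWRIGHT_JUNIT_OUTPUT_NAME=results/shard-1.xml npx playwright test --shard=1/3 --reporter=junit'
                    }
                }
                stage('Shard 2') {
                    steps {
                        sh 'PLAYWRIGHT_JUNIT_OUTPUT_NAME=results/shard-2.xml npx playwright test --shard=2/3 --reporter=junit'
                    }
                }
                stage('Shard 3') {
                    steps {
                        sh 'PLAYWRIGHT_JUNIT_OUTPUT_NAME=results/shard-3.xml npx playwright test --shard=3/3 --reporter=junit'
                    }
                }
            }
        }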
    }

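    post {
        always {
            junit 'results/**/*.xml'
            // The shards above run with the junit reporter only, so no HTML
            // report is written unless you also enable the html reporter.
            // allowMissing keeps this step from failing the build in that case.
            publishHTML([
                allowMissing: true,
                alwaysLinkToLastBuild: true,
                keepAll: true,
                reportDir: 'playwright-report',
                reportFiles: 'index.html',
                reportName: 'Playwright Report'
            ])
        }
    }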
}

Jenkins reality check: If a job posting lists Jenkins as a primary requirement in 2025, the codebase is likely mature/legacy. Clarify during interview whether you’d be maintaining existing pipelines or building new ones. These are very different roles.

6

GitLab CI

Government on-premise & DevSecOps

GitLab CI is deeply integrated into GitLab’s platform — source control, issue tracking, CI/CD, container registry, and security scanning all live in one tool. This all-in-one approach makes it attractive for organisations that need a self-hosted, air-gapped, or data-sovereign DevSecOps platform. In NZ, this typically means government agencies with on-premise requirements and organisations with strict security posture.

Full DevSecOps pipeline

GitLab CI has built-in stages for security scanning that matter for QA engineers working in regulated environments:

SAST (Static Application Security Testing) — built-in, no plugin required
DAST (Dynamic Application Security Testing) — runs against a live environment
Dependency scanning — flags vulnerable packages
Container scanning — scans Docker images
License compliance — flags incompatible licences

As a QA engineer, you don’t configure all of these — but you need to understand what they produce, because security findings feed into your test scope and release sign-off decisions.

GitLab CI — Playwright with parallel execution and artifact upload

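variables:
  TZ: "Pacific/Auckland"
  # The JUnit reporter prints to stdout unless given a file name;
  # this path matches the reports: junit: entry in the test template.
  PLAYWRIGHT_JUNIT_OUTPUT_NAME: "test-results/results.xml"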

stages:
  - install
  - test
  - report

install-deps:
  stage: install
  image: mcr.microsoft.com/playwright:v1.44.0-jammy
  script:
    - npm ci
  cache:
    key: "$CI_COMMIT_REF_SLUG"
    paths:
      - node_modules/

.test-template: &test-template
  stage: test
  image: mcr.microsoft.com/playwright:v1.44.0-jammy
  cache:
    key: "$CI_COMMIT_REF_SLUG"
    paths:
      - node_modules/
    policy: pull
  artifacts:
    when: always
    paths:
      - playwright-report/
      - test-results/
    reports:
      junit: test-results/results.xml
    expire_in: 30 days

test-shard-1:
  <<: *test-template
  script:
    - npx playwright test --shard=1/3 --reporter=junit,html
  # BASE_URL needs no mapping here: project CI/CD variables are injected into every job.

test-shard-2:
  <<: *test-template
  script:
    - npx playwright test --shard=2/3 --reporter=junit,html

test-shard-3:
  <<: *test-template
  script:
    - npx playwright test --shard=3/3 --reporter=junit,html

GitLab advantage: The reports: junit: key publishes test results directly into the merge request UI with no extra plugin. Failed tests show as widgets on the MR — identical developer experience to GitHub Actions, but entirely self-hosted.

7

Decision Guide

Use this guide when joining a new team, evaluating a role, or recommending tooling. The right answer is almost always “what does your team already use?” — but when you have real choice, these signals point the way.

Context

GitHub Actions

SaaS product, startup, open source, or any team already on GitHub. Fastest time-to-green for Playwright. Enormous action marketplace.

Context

Azure Pipelines

Microsoft stack (.NET, Azure, SQL Server), NZ government agency, or any team that needs native Azure Test Plans traceability.

Context

Jenkins

Existing Jenkins infrastructure in a bank, telco, or large enterprise. Migrating away is expensive — maintaining and improving it is pragmatic.

Context

GitLab CI

On-premise or air-gapped requirement, government data sovereignty, or a team wanting a single DevSecOps platform with built-in security scanning.

Context

Bitbucket Pipelines

Atlassian-first shop (Jira, Confluence, Bitbucket). The Jira integration is the main draw — test results link directly to Jira tickets.

NZ-specific reality

Sector	Most common tool	Why
SaaS / tech startups	GitHub Actions	GitHub is where the code lives; Actions is the obvious choice
Central government	Azure Pipelines	AoG Microsoft agreement; Azure is the preferred cloud
Local government	Azure Pipelines / Jenkins	Mix depending on council size and legacy investment
Banking / finance	Jenkins (legacy) / Azure Pipelines (new projects)	Heavy legacy investment; new projects migrating to cloud CI
Telcos (Spark, One, 2degrees)	Jenkins / GitLab CI	Large legacy codebases; some on-premise DevSecOps
Consulting / SI firms	Varies by client	Fluency in all tools is a differentiator

8

QA-Specific Pipeline Patterns

These patterns apply across all five tools. Master them and you can implement quality gates in any CI/CD environment.

Test sharding

Split your test suite across multiple parallel runners to reduce wall-clock time. Playwright has native sharding built in (--shard=N/TOTAL). The goal is keeping your pipeline under 10 minutes — anything longer and developers stop waiting for it and merge anyway.

npx playwright test --shard=1/4    # runner 1 gets 25% of tests
npx playwright test --shard=2/4    # runner 2 gets next 25%
# ... and so on across 4 parallel agents
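To make the split concrete, here is a minimal TypeScript sketch of contiguous sharding. It is a simplified illustration under stated assumptions, not Playwright's actual distribution algorithm; shardSlice and the spec file names are hypothetical.

```typescript
// Return the slice of `items` that shard `current` (1-based) of `total` would run.
// Contiguous chunks whose sizes differ by at most one item.
function shardSlice<T>(items: readonly T[], current: number, total: number): T[] {
  if (!Number.isInteger(total) || total < 1) {
    throw new RangeError("total must be a positive integer");
  }
  if (!Number.isInteger(current) || current < 1 || current > total) {
    throw new RangeError("current must be between 1 and total");
  }
  const start = Math.floor(((current - 1) * items.length) / total);
  const end = Math.floor((current * items.length) / total);
  return items.slice(start, end);
}

// Hypothetical spec files, split across 4 runners.
const specFiles = [
  "login.spec.ts", "signup.spec.ts", "cart.spec.ts", "checkout.spec.ts",
  "search.spec.ts", "profile.spec.ts", "invoice.spec.ts", "gst.spec.ts",
  "payroll.spec.ts", "reports.spec.ts",
];
for (let shard = 1; shard <= 4; shard++) {
  console.log(`shard ${shard}/4:`, shardSlice(specFiles, shard, 4).join(", "));
}
```

Every file lands in exactly one shard, so the shards together cover the whole suite. Real suites rarely split evenly by duration, which is why the slowest shard sets the wall-clock time.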

Flaky test retry

Retry transient failures automatically before marking a test as failed. This reduces noise from network timeouts, race conditions, and environment blips without hiding genuine bugs. The key is distinguishing retry-on-failure (acceptable) from consistently flaky (must fix).

// playwright.config.ts
import { defineConfig } from '@playwright/test';

export default defineConfig({
  retries: process.env.CI ? 2 : 0,  // 2 retries in CI only
  // local dev: no retries so failures are immediately visible
});

Track retry rate over time. If a test retries on more than 10% of runs, it’s broken — fix it rather than relying on the retry.
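One way to track that rate is sketched below in TypeScript. The TestRun shape and findFlakyTests are hypothetical names for illustration; in practice the run history would come from wherever your team stores test results.

```typescript
interface TestRun {
  testId: string;
  retries: number; // retries consumed before the test reached its final status
}

// Return the tests whose share of retried runs exceeds `threshold` (default 10%).
function findFlakyTests(runs: readonly TestRun[], threshold = 0.1): Map<string, number> {
  const totals = new Map<string, { runs: number; retried: number }>();
  for (const run of runs) {
    const entry = totals.get(run.testId) ?? { runs: 0, retried: 0 };
    entry.runs += 1;
    if (run.retries > 0) entry.retried += 1;
    totals.set(run.testId, entry);
  }

  const flaky = new Map<string, number>();
  totals.forEach((entry, testId) => {
    const rate = entry.retried / entry.runs;
    if (rate > threshold) flaky.set(testId, rate);
  });
  return flaky;
}

// Hypothetical history: "login" needed a retry in 2 of 4 runs, "checkout" never did.
const history: TestRun[] = [
  { testId: "login", retries: 0 },
  { testId: "login", retries: 1 },
  { testId: "login", retries: 0 },
  { testId: "login", retries: 2 },
  { testId: "checkout", retries: 0 },
  { testId: "checkout", retries: 0 },
];
findFlakyTests(history).forEach((rate, testId) => {
  console.log(`${testId}: retried on ${(rate * 100).toFixed(0)}% of runs`);
});
```

The comparison is strict, so a test that retries on exactly 10% of runs is not flagged; adjust the threshold to suit your team.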

Coverage gates

Block merges when code coverage drops below a threshold. This requires generating a coverage report (Istanbul/nyc for JS, coverage.py for Python) and configuring the CI tool to fail when coverage falls.

# GitHub Actions example
- name: Check coverage threshold
  run: |
    npx jest --coverage --coverageThreshold='{"global":{"lines":80}}'
  # Fails (non-zero exit) if line coverage < 80% → blocks merge

Practical advice: Start with a realistic threshold (current coverage − 5%) and ratchet it up. Setting an aspirational target on day one causes pipeline failures that teams learn to ignore.
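The ratchet can be sketched in a few lines of TypeScript. This is an illustration under stated assumptions: initialThreshold and ratchetThreshold are hypothetical helpers, and the 5-point margin comes from the advice above.

```typescript
// Starting gate: current coverage minus a safety margin, rounded down, never below 0.
function initialThreshold(currentCoverage: number, margin = 5): number {
  return Math.max(0, Math.floor(currentCoverage - margin));
}

// Ratchet: the gate may rise as coverage improves but never drops.
function ratchetThreshold(existing: number, currentCoverage: number, margin = 5): number {
  return Math.max(existing, initialThreshold(currentCoverage, margin));
}

// Hypothetical numbers: the suite starts at 72.4% line coverage, later reaches 80%.
let gate = initialThreshold(72.4);
console.log("initial gate:", gate);
gate = ratchetThreshold(gate, 80);
console.log("after improvement:", gate);
gate = ratchetThreshold(gate, 70);
console.log("after a dip:", gate);

// Shape accepted by Jest's coverageThreshold option.
const coverageThreshold = { global: { lines: gate } };
console.log(JSON.stringify(coverageThreshold));
```

The result would replace the hard-coded 80 in the Jest command once the team agrees on the starting value.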

PR result annotations

Post test results as comments or inline annotations on pull requests. This keeps developers in context — they see failures without leaving the PR. Most CI tools support this natively or via a step.

# GitHub Actions — post Playwright results as PR comment
- name: Report results to PR
  uses: daun/playwright-report-summary@v3
  if: always()
  with:
    # Assumes an earlier step ran Playwright with the JSON reporter writing to this path
    report-file: playwright-report/results.json
    comment-title: 'Playwright Test Results'
    # Posts a pass/fail table directly on the PR
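If your CI tool has no ready-made step, the comment body is easy to build yourself. The TypeScript sketch below formats a Markdown summary; the TestResult shape is a hypothetical simplification, not the schema of Playwright's JSON report, and posting the string to the PR is left to your tool's API.

```typescript
interface TestResult {
  title: string;
  status: "passed" | "failed" | "flaky" | "skipped";
  durationMs: number;
}

// Build a Markdown comment: a counts table, then a list of failed tests if any.
function formatPrComment(title: string, results: readonly TestResult[]): string {
  const count = (status: TestResult["status"]): number =>
    results.filter((r) => r.status === status).length;

  const lines = [
    `### ${title}`,
    "",
    "| Passed | Failed | Flaky | Skipped |",
    "| --- | --- | --- | --- |",
    `| ${count("passed")} | ${count("failed")} | ${count("flaky")} | ${count("skipped")} |`,
  ];

  const failed = results.filter((r) => r.status === "failed");
  if (failed.length > 0) {
    lines.push("", "Failed tests:");
    for (const r of failed) {
      lines.push(`- ${r.title} (${r.durationMs} ms)`);
    }
  }
  return lines.join("\n");
}

// Hypothetical results from one run.
console.log(
  formatPrComment("Playwright Test Results", [
    { title: "login succeeds", status: "passed", durationMs: 840 },
    { title: "checkout applies GST", status: "failed", durationMs: 1200 },
  ]),
);
```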

Artifact retention strategy

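Store test artifacts (screenshots, videos, trace files, HTML reports) on failure so you can replay and debug. Don’t store them on every run: it wastes storage and costs money. Use if: failure() (GitHub Actions) or when: on_failure (GitLab CI) to upload only when something failed; reserve if: always() or when: always for reports you want from every run, such as JUnit XML.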

# Upload only on failure (Playwright)
- name: Upload failure artifacts
  uses: actions/upload-artifact@v4
  if: failure()     # runs only when an earlier step in this job failed
  with:
    name: test-failures-${{ github.run_id }}
    path: |
      playwright-report/
      test-results/**/*.png
      test-results/**/*.webm   # videos
      test-results/**/*.zip    # trace files

9

NZ Timezone: `TZ: Pacific/Auckland` in All CI Configs

CI runners default to UTC. This causes subtle test failures that are painful to debug — date-dependent tests pass locally (where your machine is in NZST/NZDT) and fail in the pipeline (UTC). The fix is one line, but you have to know to add it.

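Common failure mode: A test checks that a date displayed on screen matches “today”. It is 9 AM Monday in Auckland, so the application shows Monday. On the CI runner the same instant is 9 PM Sunday UTC (8 PM during daylight saving), so the test computes “today” as Sunday and fails. It happens only in CI, and only for runs between midnight and noon Auckland time (1 PM during daylight saving), which is why it looks intermittent.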
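The mismatch is easy to reproduce. The TypeScript sketch below formats one instant as a calendar date in two time zones; calendarDateIn is a hypothetical helper, and it assumes a runtime with full time zone data (recent Node.js releases include it).

```typescript
// Format an instant as YYYY-MM-DD as seen from a given IANA time zone.
function calendarDateIn(instant: Date, timeZone: string): string {
  const parts = new Intl.DateTimeFormat("en-US", {
    timeZone,
    year: "numeric",
    month: "2-digit",
    day: "2-digit",
  }).formatToParts(instant);
  const get = (type: string): string =>
    parts.find((p) => p.type === type)?.value ?? "";
  return `${get("year")}-${get("month")}-${get("day")}`;
}

// 9 AM on Monday 10 June 2024 in Auckland (NZST) is 9 PM on Sunday 9 June in UTC.
const instant = new Date("2024-06-09T21:00:00Z");
console.log("runner (UTC):      ", calendarDateIn(instant, "UTC"));              // 2024-06-09
console.log("app (Pacific/Auckland):", calendarDateIn(instant, "Pacific/Auckland")); // 2024-06-10
```

A test that derives “today” the same way the application does, or a runner with TZ set to Pacific/Auckland, sees the same date on both sides.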

Set it in every pipeline

GitHub Actions

env:
  TZ: Pacific/Auckland

Set at workflow level (applies to all jobs) or job level (applies to all steps in that job).

Azure Pipelines

variables:
  TZ: Pacific/Auckland

Set in the variables: section at the pipeline root or within a specific job.

Jenkins

environment {
  TZ = 'Pacific/Auckland'
}

Set in the environment block inside pipeline {} or within a specific stage.

GitLab CI

variables:
  TZ: "Pacific/Auckland"

Set in the global variables: section at the top of .gitlab-ci.yml.

NZ DST note: New Zealand observes daylight saving time (NZDT, UTC+13) from late September to early April, and standard time (NZST, UTC+12) the rest of the year. Using Pacific/Auckland rather than a hard-coded offset (UTC+12 or UTC+13) means the OS handles DST automatically — you never need to update the config when clocks change.

Beyond dates: other timezone traps in CI

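Scheduled pipelines: Cron expressions in GitHub Actions and Azure Pipelines are evaluated in UTC; Jenkins uses the controller’s timezone unless the spec sets TZ=, and GitLab lets you choose a timezone per schedule. In UTC, a schedule of 0 9 * * 1-5 means 9 AM UTC, which is 9 PM (NZST) or 10 PM (NZDT) in Auckland, not 9 AM. Add a comment documenting the NZ equivalent.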
Log timestamps: CI logs are UTC. When correlating pipeline logs with application logs, account for the offset. Tools like Datadog and Splunk let you pin the display timezone per user.
Test data with dates: If your tests create records with a timestamp and then query by date, the query window may span midnight UTC incorrectly. Use your application’s timezone logic, not system time.
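For the scheduled-pipeline trap, the UTC hour to put in a cron expression can be derived rather than guessed. The TypeScript sketch below is one way to do it; utcOffsetHours and cronUtcHour are hypothetical helpers, and the calculation assumes a whole-hour offset, which holds for Pacific/Auckland.

```typescript
// Offset of a time zone from UTC, in hours, at a given instant.
function utcOffsetHours(instant: Date, timeZone: string): number {
  const parts = new Intl.DateTimeFormat("en-US", {
    timeZone,
    hourCycle: "h23",
    year: "numeric",
    month: "numeric",
    day: "numeric",
    hour: "numeric",
    minute: "numeric",
    second: "numeric",
  }).formatToParts(instant);
  const num = (type: string): number =>
    Number(parts.find((p) => p.type === type)?.value);
  // Re-read the local wall-clock fields as if they were UTC, then compare.
  const wallClockAsUtc = Date.UTC(
    num("year"), num("month") - 1, num("day"),
    num("hour"), num("minute"), num("second"),
  );
  const wholeSeconds = Math.floor(instant.getTime() / 1000) * 1000;
  return (wallClockAsUtc - wholeSeconds) / 3600000;
}

// UTC hour that corresponds to `localHour` in `timeZone` on the given date.
function cronUtcHour(localHour: number, onDate: Date, timeZone: string): number {
  const offset = utcOffsetHours(onDate, timeZone);
  return (((localHour - offset) % 24) + 24) % 24;
}

const winter = new Date("2024-07-15T00:00:00Z"); // NZST, UTC+12
const summer = new Date("2024-01-15T00:00:00Z"); // NZDT, UTC+13
console.log("9 AM Auckland in winter is UTC hour", cronUtcHour(9, winter, "Pacific/Auckland")); // 21
console.log("9 AM Auckland in summer is UTC hour", cronUtcHour(9, summer, "Pacific/Auckland")); // 20
```

Because the UTC hour falls on the previous day, a 9 AM Monday to Friday run in Auckland becomes 0 21 * * 0-4 in winter (Sunday to Thursday UTC), and the hour shifts to 20 during daylight saving.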
