CI/CD Testing Tools Compared
GitHub Actions, Azure Pipelines, Jenkins, GitLab CI, and Bitbucket Pipelines — evaluated through a QA lens. Which one you’ll encounter depends heavily on your employer; knowing all five makes you immediately useful in any NZ team.
Why QA Engineers Need CI/CD Knowledge
CI/CD (Continuous Integration / Continuous Delivery) pipelines are the delivery mechanism for software — and therefore the delivery mechanism for your tests. A QA engineer who can only run tests locally is a bottleneck; one who can wire tests into the pipeline is a multiplier.
Tests run automatically
Every PR triggers your full test suite without anyone pressing a button. Bugs caught in minutes, not days.
You gate releases
Quality gates in the pipeline mean a failing test blocks deployment. You have real leverage, not just advisory influence.
Teams expect it
NZ employers increasingly list CI/CD literacy as a requirement, not a nice-to-have. It closes the gap between manual and automation roles.
You own flaky test triage
When the pipeline goes red, someone needs to investigate. QA engineers with pipeline access can fix it without waiting for a dev.
What a QA engineer does in CI/CD
- Add and maintain test jobs in pipeline YAML files
- Configure test result publishing so failures show up on PRs
- Set up parallel execution to keep pipelines under 10 minutes
- Write retry logic for flaky tests
- Create coverage gates that block merges below a threshold
- Store test artifacts (screenshots, videos, HTML reports) for debugging
- Monitor pipeline health and escalate systematic failures
Feature Comparison Table
Quick-reference across the five tools QA engineers encounter in NZ workplaces.
| Feature | GitHub Actions | Azure Pipelines | Jenkins | GitLab CI | Bitbucket Pipelines |
|---|---|---|---|---|---|
| Config format | YAML (.github/workflows/) | YAML (azure-pipelines.yml) | Groovy (Jenkinsfile) | YAML (.gitlab-ci.yml) | YAML (bitbucket-pipelines.yml) |
| Test result publishing | Via Actions (junit-reporter, Playwright HTML) | Native Azure Test Plans integration | Plugins (JUnit, Allure, Extent) | Native JUnit/HTML artifact | Auto-detected JUnit XML in test report directories; other formats via manual upload |
| Parallel execution | Matrix strategy (built-in) | Parallel jobs / stages | Parallel stages (Declarative) | Parallel keyword (built-in) | Parallel step (built-in) |
| PR quality gates | Branch protection rules + required status checks | Branch policies + PR gates | Multibranch pipeline + webhooks | Merge request pipelines + approvals | Merge checks + Jira integration |
| Artifact retention | 90 days (configurable) | 30 days (configurable) | Manual or plugin-managed | 30 days default (configurable) | 14 days |
| Self-hosted runners | Yes (GitHub-hosted or self-hosted) | Yes (Microsoft-hosted or self-hosted agents) | Yes (primary model) | Yes (GitLab Runners) | Limited (Atlassian runners) |
| Cost model | Free for public repos; 2,000 min/month free on private | Free tier (1,800 min/month); pay-as-you-go | Free (open source); infrastructure costs only | 400 CI min/month free; paid tiers | 50 build min/month free; paid tiers |
| NZ adoption | Very high — SaaS, startups, open source | Very high — government, enterprise, Microsoft-stack teams | Medium — banks, telcos, legacy shops | Medium — government on-premise, DevSecOps | Low-medium — Jira-first Atlassian shops |
GitHub Actions
GitHub Actions is the default choice for any team whose code lives on GitHub — which in NZ means most SaaS companies, open source projects, and tech startups. It has first-class Playwright integration, a massive marketplace of pre-built actions, and a YAML format that’s easy to read and modify without deep DevOps knowledge.
Why QA engineers love it
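- Playwright setup: npx playwright install --with-deps installs browsers and their OS dependencies in one step (the older microsoft/playwright-github-action is deprecated in favour of the CLI).
- Matrix strategy: Run the same test suite across Chromium, Firefox, and WebKit simultaneously without duplicating config.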
- PR annotations: Failed tests appear inline on the PR diff — developers see exactly what broke and where.
- Artifact uploads: Playwright videos and screenshots persist for up to 90 days by default (the workflow below shortens this to 30 with retention-days), so you can replay failures.
- Secrets management: API keys, test credentials, and environment variables stored securely in GitHub Secrets.
NZ context
Xero, Sharesies, Hnry, Timely, and most Wellington-based SaaS companies run on GitHub Actions. If you’re applying to a product company in NZ, GitHub Actions fluency is practically mandatory.
name: Playwright Tests

on:
  push:
    branches: [main]
  pull_request:
    branches: [main]

env:
  TZ: Pacific/Auckland

jobs:
  test:
    runs-on: ubuntu-latest
    strategy:
      matrix:
        browser: [chromium, firefox, webkit]
      fail-fast: false # don't cancel other browsers if one fails
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-node@v4
        with:
          node-version: '20'
          cache: 'npm'
      - name: Install dependencies
        run: npm ci
      - name: Install Playwright browsers
        run: npx playwright install --with-deps ${{ matrix.browser }}
      - name: Run Playwright tests
        run: npx playwright test --project=${{ matrix.browser }}
        env:
          BASE_URL: ${{ secrets.BASE_URL }}
      - name: Upload test results
        uses: actions/upload-artifact@v4
        if: always() # upload even on failure
        with:
          name: playwright-results-${{ matrix.browser }}
          path: playwright-report/
          retention-days: 30
fail-fast: false is essential for cross-browser testing. Without it, a single Firefox failure cancels your WebKit run, hiding separate bugs.
Azure Pipelines
Azure Pipelines is Microsoft’s CI/CD offering within Azure DevOps. It’s the standard for NZ government agencies, large enterprises, and any team already in the Microsoft ecosystem (Azure, .NET, SQL Server, Microsoft 365). The New Zealand government has procured Azure as its preferred cloud platform through the All-of-Government (AoG) agreement — which means pipeline skills here translate directly to roles at agencies like MBIE, MSD, ACC, and Inland Revenue.
Azure Test Plans integration
The unique advantage of Azure Pipelines for QA engineers is the native connection to Azure Test Plans. When you publish test results in JUnit or TRX format, they automatically link to test cases in Test Plans — giving you traceability from requirement to test run without manual effort. This matters in government projects where audit trails and traceability are contractual requirements.
- PublishTestResults task: Built-in; supports JUnit, NUnit, xUnit, TRX formats.
- Test run reports: Pass/fail trends visible in Azure DevOps without external tooling.
- Branch policies: Require a minimum test pass rate before PR completion — enforced at the server level, not by convention.
- Self-hosted agents: Organisations with strict data sovereignty (common in NZ government) run agents on-premise, keeping test data within New Zealand borders.
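To make the JUnit format mentioned above more concrete, here is a minimal TypeScript sketch of the kind of XML a test runner emits and a CI tool ingests. The CaseResult type and the toJUnitXml helper are hypothetical names used only for illustration, and the output is a small subset of the de facto JUnit layout; real reporters emit more attributes than this.

```typescript
// Hypothetical shape for one test outcome; not part of any real reporter API.
type CaseResult = {
  name: string;
  classname: string;
  seconds: number;
  failureMessage?: string; // present only when the test failed
};

// Escape characters that cannot appear raw inside an XML attribute value.
function escapeXml(text: string): string {
  return text
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;");
}

// Build a single <testsuite> element with one <testcase> per result.
function toJUnitXml(suiteName: string, cases: CaseResult[]): string {
  const failures = cases.filter((c) => c.failureMessage !== undefined).length;
  const body = cases.map((c) => {
    const open =
      `  <testcase name="${escapeXml(c.name)}" ` +
      `classname="${escapeXml(c.classname)}" time="${c.seconds.toFixed(3)}"`;
    if (c.failureMessage === undefined) return `${open}/>`;
    return `${open}>\n    <failure message="${escapeXml(c.failureMessage)}"/>\n  </testcase>`;
  });
  return [
    `<?xml version="1.0" encoding="UTF-8"?>`,
    `<testsuite name="${escapeXml(suiteName)}" tests="${cases.length}" failures="${failures}">`,
    ...body,
    `</testsuite>`,
  ].join("\n");
}

const junitXml = toJUnitXml("checkout", [
  { name: "adds item to cart", classname: "checkout.spec.ts", seconds: 1.2 },
  {
    name: "shows order heading",
    classname: "checkout.spec.ts",
    seconds: 3.45,
    failureMessage: "Expected <h1> to be visible",
  },
]);
console.log(junitXml);
```

The tests and failures counts on the testsuite element are what drive pass/fail summaries in the CI tool, which is why a missing or empty results file shows up as "no tests found" rather than as a failure.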
trigger:
  - main

pr:
  branches:
    include:
      - main

variables:
  TZ: Pacific/Auckland

pool:
  vmImage: 'ubuntu-latest'

stages:
  - stage: Test
    jobs:
      - job: PlaywrightTests
        strategy:
          parallel: 3 # shard across 3 agents
        steps:
          - task: NodeTool@0
            inputs:
              versionSpec: '20.x'
          - script: npm ci
            displayName: 'Install dependencies'
          - script: npx playwright install --with-deps chromium
            displayName: 'Install Playwright browsers'
          - script: |
              npx playwright test \
                --shard=$(System.JobPositionInPhase)/$(System.TotalJobsInPhase) \
                --reporter=junit,html
            displayName: 'Run Playwright tests (sharded)'
            env:
              BASE_URL: $(BASE_URL)
              # Azure Pipelines does not set CI itself; Playwright checks it
              CI: 'true'
              # Without an output path the JUnit reporter prints to stdout
              PLAYWRIGHT_JUNIT_OUTPUT_NAME: results.xml
          - task: PublishTestResults@2
            condition: always()
            inputs:
              testResultsFormat: 'JUnit'
              testResultsFiles: '**/results.xml'
              mergeTestResults: true
              testRunTitle: 'Playwright - Shard $(System.JobPositionInPhase)'
          - task: PublishBuildArtifacts@1
            condition: always()
            inputs:
              pathToPublish: 'playwright-report'
              artifactName: 'playwright-report-$(System.JobPositionInPhase)'
Data sovereignty: Microsoft-hosted agents run in Microsoft’s Azure regions (australiaeast is the closest; newzealandnorth is the dedicated NZ region). Check your agency’s cloud policy before using Microsoft-hosted agents for sensitive test data.
Jenkins
Jenkins is the original open-source CI/CD server, still powering a significant portion of NZ’s banking, insurance, and telecommunications pipelines. It was the dominant tool before cloud-native CI arrived, and large organisations with substantial Jenkins infrastructure have little incentive to migrate — migration risk is high and the existing pipelines work.
What you’ll encounter
- Groovy DSL: Jenkins pipelines are written in a Groovy-based DSL (Declarative or Scripted). More verbose than YAML alternatives.
- Plugin ecosystem: Almost every feature requires a plugin — JUnit reports, Allure reports, Slack notifications, SonarQube integration. This is powerful but creates maintenance overhead.
- Multibranch pipelines: Jenkins scans your repo and creates a pipeline per branch automatically — useful for large teams.
- Agent architecture: Jenkins controller + agent nodes. Self-hosted by definition — your ops team owns the infrastructure.
Maintenance overhead is real
The common criticism of Jenkins is accurate: plugin conflicts, security patches, Java version mismatches, and infrastructure costs add up. If you work somewhere running Jenkins, expect to spend time on pipeline maintenance, not just pipeline development. That’s worth understanding before you take a role.
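pipeline {
    agent any
    environment {
        TZ = 'Pacific/Auckland'
        CI = 'true' // lets playwright.config.ts detect CI (e.g. for retries)
        BASE_URL = credentials('base-url-secret')
    }
    stages {
        stage('Install') {
            steps {
                sh 'npm ci'
                sh 'npx playwright install --with-deps chromium'
            }
        }
        stage('Test') {
            // The JUnit reporter prints to stdout unless given an output file,
            // so each shard writes its own file under results/
            parallel {
                stage('Shard 1') {
                    steps {
                        sh 'PLAYWRIGHT_JUNIT_OUTPUT_NAME=results/shard-1.xml npx playwright test --shard=1/3 --reporter=junit'
                    }
                }
                stage('Shard 2') {
                    steps {
                        sh 'PLAYWRIGHT_JUNIT_OUTPUT_NAME=results/shard-2.xml npx playwright test --shard=2/3 --reporter=junit'
                    }
                }
                stage('Shard 3') {
                    steps {
                        sh 'PLAYWRIGHT_JUNIT_OUTPUT_NAME=results/shard-3.xml npx playwright test --shard=3/3 --reporter=junit'
                    }
                }
            }
        }
    }
    post {
        always {
            junit 'results/**/*.xml'
            // playwright-report/ exists only if the html reporter is enabled,
            // so a missing report should not fail the build
            publishHTML([
                allowMissing: true,
                alwaysLinkToLastBuild: true,
                keepAll: true,
                reportDir: 'playwright-report',
                reportFiles: 'index.html',
                reportName: 'Playwright Report'
            ])
        }
    }
}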
GitLab CI
GitLab CI is deeply integrated into GitLab’s platform — source control, issue tracking, CI/CD, container registry, and security scanning all live in one tool. This all-in-one approach makes it attractive for organisations that need a self-hosted, air-gapped, or data-sovereign DevSecOps platform. In NZ, this typically means government agencies with on-premise requirements and organisations with strict security posture.
Full DevSecOps pipeline
GitLab CI has built-in stages for security scanning that matter for QA engineers working in regulated environments:
- SAST (Static Application Security Testing) — built-in, no plugin required
- DAST (Dynamic Application Security Testing) — runs against a live environment
- Dependency scanning — flags vulnerable packages
- Container scanning — scans Docker images
- License compliance — flags incompatible licences
As a QA engineer, you don’t configure all of these — but you need to understand what they produce, because security findings feed into your test scope and release sign-off decisions.
variables:
  TZ: "Pacific/Auckland"
  # Without an output path the JUnit reporter prints to stdout
  PLAYWRIGHT_JUNIT_OUTPUT_NAME: "test-results/results.xml"

stages:
  - install
  - test
  - report

install-deps:
  stage: install
  image: mcr.microsoft.com/playwright:v1.44.0-jammy
  script:
    - npm ci
  cache:
    key: "$CI_COMMIT_REF_SLUG"
    paths:
      - node_modules/

# Shared settings for every shard. BASE_URL is expected to be defined as a
# project CI/CD variable, which GitLab injects into every job.
.test-template: &test-template
  stage: test
  image: mcr.microsoft.com/playwright:v1.44.0-jammy
  cache:
    key: "$CI_COMMIT_REF_SLUG"
    paths:
      - node_modules/
    policy: pull
  artifacts:
    when: always
    paths:
      - playwright-report/
      - test-results/
    reports:
      junit: test-results/results.xml
    expire_in: 30 days

test-shard-1:
  <<: *test-template
  script:
    - npx playwright test --shard=1/3 --reporter=junit,html

test-shard-2:
  <<: *test-template
  script:
    - npx playwright test --shard=2/3 --reporter=junit,html

test-shard-3:
  <<: *test-template
  script:
    - npx playwright test --shard=3/3 --reporter=junit,html
The artifacts:reports:junit key publishes test results directly into the merge request UI with no extra plugin. Failed tests appear in the MR’s test summary widget, giving developers much the same experience as GitHub Actions, but entirely self-hosted.
Decision Guide
Use this guide when joining a new team, evaluating a role, or recommending tooling. The right answer is almost always “what does your team already use?” — but when you have real choice, these signals point the way.
- GitHub Actions: SaaS product, startup, open source, or any team already on GitHub. Fastest time-to-green for Playwright. Enormous action marketplace.
- Azure Pipelines: Microsoft stack (.NET, Azure, SQL Server), NZ government agency, or any team that needs native Azure Test Plans traceability.
- Jenkins: Existing Jenkins infrastructure in a bank, telco, or large enterprise. Migrating away is expensive; maintaining and improving it is pragmatic.
- GitLab CI: On-premise or air-gapped requirement, government data sovereignty, or a team wanting a single DevSecOps platform with built-in security scanning.
- Bitbucket Pipelines: Atlassian-first shop (Jira, Confluence, Bitbucket). The Jira integration is the main draw: test results link directly to Jira tickets.
NZ-specific reality
| Sector | Most common tool | Why |
|---|---|---|
| SaaS / tech startups | GitHub Actions | GitHub is where the code lives; Actions is the obvious choice |
| Central government | Azure Pipelines | AoG Microsoft agreement; Azure is the preferred cloud |
| Local government | Azure Pipelines / Jenkins | Mix depending on council size and legacy investment |
| Banking / finance | Jenkins (legacy) / Azure Pipelines (new projects) | Heavy legacy investment; new projects migrating to cloud CI |
| Telcos (Spark, One, 2degrees) | Jenkins / GitLab CI | Large legacy codebases; some on-premise DevSecOps |
| Consulting / SI firms | Varies by client | Fluency in all tools is a differentiator |
QA-Specific Pipeline Patterns
These patterns apply across all five tools. Master them and you can implement quality gates in any CI/CD environment.
Test sharding
Split your test suite across multiple parallel runners to reduce wall-clock time. Playwright has native sharding built in (--shard=N/TOTAL). The goal is keeping your pipeline under 10 minutes — anything longer and developers stop waiting for it and merge anyway.
npx playwright test --shard=1/4   # runner 1 gets 25% of tests
npx playwright test --shard=2/4   # runner 2 gets next 25%
# ... and so on across 4 parallel agents
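As a rough planning aid, the sketch below estimates how many shards a suite needs to fit the 10-minute budget and shows how a 1-based shard index maps onto a slice of the suite. Both helpers (shardsNeeded, pickShard) and the sample numbers are hypothetical, and the even, contiguous split is an assumption for illustration; Playwright’s own shard assignment may group tests differently.

```typescript
// Shards needed to fit a time budget, assuming tests split evenly and each
// runner pays a fixed setup cost (dependency install, browser download).
function shardsNeeded(serialMinutes: number, budgetMinutes: number, setupMinutes: number): number {
  const testBudget = budgetMinutes - setupMinutes;
  if (testBudget <= 0) throw new Error("setup alone exceeds the budget");
  return Math.max(1, Math.ceil(serialMinutes / testBudget));
}

// Contiguous split of an ordered list into `total` shards. `index` is 1-based,
// matching the --shard=N/TOTAL convention.
function pickShard<T>(items: T[], index: number, total: number): T[] {
  if (index < 1 || index > total) throw new Error("index must be in 1..total");
  const start = Math.floor(((index - 1) * items.length) / total);
  const end = Math.floor((index * items.length) / total);
  return items.slice(start, end);
}

// Illustrative numbers: 28 minutes serial, 10 minute budget, 2 minutes setup.
const shardCount = shardsNeeded(28, 10, 2);

const specFiles = [
  "login", "cart", "checkout", "search", "profile",
  "orders", "admin", "reports", "settings", "help",
];

for (let index = 1; index <= shardCount; index++) {
  console.log(`--shard=${index}/${shardCount}`, pickShard(specFiles, index, shardCount));
}
```

Because setup time is paid on every runner, doubling the shard count never halves the wall-clock time; past a certain point extra shards mostly add install overhead.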
Flaky test retry
Retry transient failures automatically before marking a test as failed. This reduces noise from network timeouts, race conditions, and environment blips without hiding genuine bugs. The key is distinguishing retry-on-failure (acceptable) from consistently flaky (must fix).
// playwright.config.ts
import { defineConfig } from '@playwright/test';

export default defineConfig({
  // 2 retries in CI only; locally, no retries so failures are immediately visible
  retries: process.env.CI ? 2 : 0,
});
Track retry rate over time. If a test retries on more than 10% of runs, it’s broken — fix it rather than relying on the retry.
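One way to track this is sketched below, assuming you can export per-run attempt results for a test (for example from a JSON report). The data shape, the helper names, and the sample history are hypothetical; the 10% limit is the rule of thumb stated above.

```typescript
type Outcome = "passed" | "flaky" | "failed";

// attempts[i] is true when attempt i passed. With retries: 2 there are at
// most 3 attempts, and the runner stops at the first pass.
function classify(attempts: boolean[]): Outcome {
  if (attempts.length === 0) throw new Error("no attempts recorded");
  if (attempts[0]) return "passed";
  // Passing only after a retry is what Playwright reports as "flaky".
  return attempts.some((ok) => ok) ? "flaky" : "failed";
}

// Share of runs in which the test needed at least one retry.
function retryRate(history: boolean[][]): number {
  if (history.length === 0) return 0;
  const retried = history.filter((attempts) => attempts.length > 1).length;
  return retried / history.length;
}

const RETRY_RATE_LIMIT = 0.1; // the 10% rule of thumb

// Hypothetical history for one test over its last 10 pipeline runs.
const loginHistory: boolean[][] = [
  [true], [true], [false, true], [true], [true],
  [true], [false, false, true], [true], [true], [true],
];

const loginRetryRate = retryRate(loginHistory);
const needsFix = loginRetryRate > RETRY_RATE_LIMIT;
console.log(`retry rate ${(loginRetryRate * 100).toFixed(0)}%, needs fix: ${needsFix}`);
```

The test above never fails a pipeline outright, yet it retries on 2 of 10 runs, which is exactly the case a retry setting hides and a retry-rate report exposes.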
Coverage gates
Block merges when code coverage drops below a threshold. This requires generating a coverage report (Istanbul/nyc for JS, coverage.py for Python) and configuring the CI tool to fail when coverage falls.
# GitHub Actions example
- name: Check coverage threshold
  run: |
    npx jest --coverage --coverageThreshold='{"global":{"lines":80}}'
    # Fails (non-zero exit) if line coverage < 80% → blocks merge
Practical advice: Start with a realistic threshold (current coverage − 5%) and ratchet it up. Setting an aspirational target on day one causes pipeline failures that teams learn to ignore.
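The ratchet can be expressed as a small calculation. The sketch below is one possible policy, not a feature of any CI tool: the function names, the 5-point margin default, and the sample coverage figures are all assumptions for illustration.

```typescript
// Starting gate: a little below today's coverage so the first run passes.
function initialThreshold(currentCoverage: number, margin = 5): number {
  return Math.max(0, Math.floor(currentCoverage - margin));
}

// Ratchet: never lowers the gate; raises it as coverage improves, and stops
// raising it once the long-term target is reached.
function ratchet(threshold: number, measuredCoverage: number, target: number, margin = 5): number {
  const candidate = Math.floor(measuredCoverage - margin);
  return Math.max(threshold, Math.min(target, candidate));
}

const TARGET = 80;
const startGate = initialThreshold(67.4);            // coverage today: 67.4%
const afterGain = ratchet(startGate, 72.9, TARGET);  // coverage improved
const afterDip = ratchet(afterGain, 70.0, TARGET);   // small dip: gate holds
const atTarget = ratchet(afterDip, 91.0, TARGET);    // capped at the target

console.log({ startGate, afterGain, afterDip, atTarget });
```

The resulting number is what you would feed into the tool’s own gate (for example the lines value in Jest’s coverageThreshold), updated in a reviewed commit rather than automatically.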
PR result annotations
Post test results as comments or inline annotations on pull requests. This keeps developers in context — they see failures without leaving the PR. Most CI tools support this natively or via a step.
# GitHub Actions: post Playwright results as PR comment
- name: Report results to PR
  uses: daun/playwright-report-summary@v3
  if: always()
  with:
    report-file: playwright-report/results.json
    comment-title: 'Playwright Test Results'
    # Posts a pass/fail table directly on the PR
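On tools without a ready-made step, the same idea can be built by rendering a summary and posting it through the platform’s comment API. The sketch below covers only the rendering half; the SuiteSummary shape and renderSummary helper are hypothetical, and the layout is not the output format of the action shown above.

```typescript
// Hypothetical per-project totals, e.g. aggregated from a JSON test report.
type SuiteSummary = { project: string; passed: number; failed: number; flaky: number };

// Render a markdown comment body: a headline verdict plus one row per project.
function renderSummary(title: string, rows: SuiteSummary[]): string {
  const anyFailed = rows.some((r) => r.failed > 0);
  const lines = [
    `### ${title}: ${anyFailed ? "failed" : "passed"}`,
    "",
    "| Project | Passed | Failed | Flaky |",
    "|---|---|---|---|",
    ...rows.map((r) => `| ${r.project} | ${r.passed} | ${r.failed} | ${r.flaky} |`),
  ];
  return lines.join("\n");
}

const commentBody = renderSummary("Playwright Test Results", [
  { project: "chromium", passed: 120, failed: 0, flaky: 1 },
  { project: "firefox", passed: 118, failed: 2, flaky: 0 },
  { project: "webkit", passed: 120, failed: 0, flaky: 0 },
]);
console.log(commentBody);
```

Listing flaky counts next to failures keeps retried tests visible to reviewers even when the overall verdict is green.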
Artifact retention strategy
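Store test artifacts (screenshots, videos, trace files, HTML reports) on failure so you can replay and debug. Storing them on every run wastes storage and costs money. Use if: failure() (GitHub Actions) or when: on_failure (GitLab CI) to upload only when something failed; reserve if: always() or when: always for reports you want from every run. Without any condition, the upload step is skipped as soon as an earlier step fails, which is exactly when you need it.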
# Upload only on failure (Playwright)
- name: Upload failure artifacts
  uses: actions/upload-artifact@v4
  if: failure() # runs only when an earlier step in the job failed
  with:
    name: test-failures-${{ github.run_id }}
    # HTML report, failure screenshots, and trace files (*.zip)
    path: |
      playwright-report/
      test-results/**/*.png
      test-results/**/*.zip
NZ Timezone: TZ: Pacific/Auckland in All CI Configs
CI runners default to UTC. This causes subtle test failures that are painful to debug — date-dependent tests pass locally (where your machine is in NZST/NZDT) and fail in the pipeline (UTC). The fix is one line, but you have to know to add it.
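The sketch below shows the mismatch for a single instant. The localDate helper and the sample timestamp are illustrative assumptions; it relies only on the standard Intl API, so it gives the same answer regardless of the machine’s own TZ setting.

```typescript
// Calendar date (YYYY-MM-DD) of an instant as seen in a given IANA time zone.
function localDate(instant: Date, timeZone: string): string {
  const part = (options: Intl.DateTimeFormatOptions): string =>
    new Intl.DateTimeFormat("en-US", { timeZone, ...options }).format(instant);
  const pad = (value: string): string => ("0" + value).slice(-2);
  const year = part({ year: "numeric" });
  const month = pad(part({ month: "numeric" }));
  const day = pad(part({ day: "numeric" }));
  return `${year}-${month}-${day}`;
}

// A record created at 08:30 on 2 July in Auckland (NZST, UTC+12).
const createdAt = new Date("2024-07-01T20:30:00Z");

const dateOnUtcRunner = localDate(createdAt, "UTC");
const dateOnNzMachine = localDate(createdAt, "Pacific/Auckland");

console.log({ dateOnUtcRunner, dateOnNzMachine });
```

A test asserting "created today" against this record passes on an NZ laptop and fails on a UTC runner for roughly half of every day, which is why the failure looks intermittent.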
Set it in every pipeline
GitHub Actions
env:
  TZ: Pacific/Auckland
Set at workflow level (applies to all jobs) or job level (applies to all steps in that job).
Azure Pipelines
variables:
  TZ: Pacific/Auckland
Set in the variables: section at the pipeline root or within a specific job.
Jenkins
environment {
    TZ = 'Pacific/Auckland'
}
Set in the environment block inside pipeline {} or within a specific stage.
GitLab CI
variables:
  TZ: "Pacific/Auckland"
Set in the global variables: section at the top of .gitlab-ci.yml.
Using the zone name Pacific/Auckland rather than a hard-coded offset (UTC+12 or UTC+13) means the OS handles DST automatically, so you never need to update the config when clocks change.
Beyond dates: other timezone traps in CI
- Scheduled pipelines: Cron expressions in most CI tools are evaluated in UTC by default. A schedule of 0 9 * * 1-5 means 9 AM UTC, which is 9 PM (NZST) or 10 PM (NZDT) Auckland time, not 9 AM. Add a comment documenting the NZ equivalent.
- Log timestamps: CI logs are UTC. When correlating pipeline logs with application logs, account for the offset. Tools like Datadog and Splunk let you pin the display timezone per user.
- Test data with dates: If your tests create records with a timestamp and then query by date, the query window may span midnight UTC incorrectly. Use your application’s timezone logic, not system time.
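For the scheduling trap, the conversion can be checked with a few lines of TypeScript. This is a sketch under stated assumptions: hourIn and utcHourForAucklandHour are hypothetical helpers, the two dates are arbitrary summer and winter samples, and the arithmetic assumes a whole-hour UTC offset (true for Auckland, not for every zone).

```typescript
// Hour of day (0-23) of an instant in a given IANA time zone.
function hourIn(instant: Date, timeZone: string): number {
  const text = new Intl.DateTimeFormat("en-US", {
    timeZone,
    hour: "numeric",
    hour12: false,
  }).format(instant);
  return Number(text) % 24; // some runtimes print midnight as "24"
}

// UTC hour at which a job must fire to run at `aucklandHour` local time,
// using the offset in force at the given instant.
function utcHourForAucklandHour(instant: Date, aucklandHour: number): number {
  const offset = (hourIn(instant, "Pacific/Auckland") - instant.getUTCHours() + 24) % 24;
  return (aucklandHour - offset + 24) % 24;
}

const CRON_HOUR_UTC = 9; // from "0 9 * * 1-5"
const summerRun = new Date(Date.UTC(2024, 0, 15, CRON_HOUR_UTC)); // January: NZDT, UTC+13
const winterRun = new Date(Date.UTC(2024, 6, 15, CRON_HOUR_UTC)); // July: NZST, UTC+12

const summerLocalHour = hourIn(summerRun, "Pacific/Auckland");
const winterLocalHour = hourIn(winterRun, "Pacific/Auckland");
const summerCronHour = utcHourForAucklandHour(summerRun, 9);
const winterCronHour = utcHourForAucklandHour(winterRun, 9);

console.log({ summerLocalHour, winterLocalHour, summerCronHour, winterCronHour });
```

Note that a 9 AM Auckland run lands on the previous UTC day, so the day-of-week field shifts as well: weekday mornings in NZ correspond to Sunday through Thursday evenings in UTC.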