Azure DevOps for Testers
A lot of NZ government and banking work runs on Azure DevOps, and the tester who can read a pipeline, publish a test result, and set a branch policy is worth far more than one who only writes the tests. This lesson teaches you to make the pipeline enforce quality, not just run the tests.
1 The Hook
A fictional NZ government department, Whenua Records (a land-registry agency), ran its case-management system on Azure DevOps — the standard platform across much of the public sector here, because it sits inside the Azure tenancy that already holds the agency’s identity, hosting, and data-residency controls. The QA team had a healthy test suite: unit tests, an API contract suite, and a Playwright end-to-end pack. The pipeline ran all of them on every build, and the build was green.
Then a change that broke title-search reached production. The post-incident review found something uncomfortable: the breaking commit’s pipeline run had a failing end-to-end test. The build still went green. Why? The Playwright step ran with a shell command that swallowed the exit code, so a failed test never failed the job. On top of that, nobody published the test results, so the failure was buried in 4,000 lines of console log that no reviewer read. And there was no branch policy requiring the pipeline to pass before a pull request could merge — so even a red build could be merged by anyone with a moment’s impatience.
The tests were good. The wiring was the defect. Running a test in a pipeline is worth almost nothing if a failure does not stop the pipeline, the result is not published where a human or a gate can see it, and the merge is not blocked when the result is red. The skill this lesson teaches is not “writing tests” — it is making Azure DevOps turn a test result into an enforced decision: this change does not merge, and does not ship, unless the tests say it is safe.
Whenua Records fixed three things: the test step now fails the job on any failing test, the results are published so a reviewer sees a pass/fail summary on the pull request, and a branch policy makes a passing build a hard requirement to merge. None of that was new testing. It was making the platform enforce the testing they already had.
2 The Rule
A test only protects you if the platform acts on its result. In Azure DevOps that means three things must all be true: a failing test fails the pipeline job, the result is published so it is visible to a person and to an automated gate, and a red result blocks both the merge (through a branch policy) and the release (through a quality gate). Running tests is not the goal; enforcing their verdict is. A green build with a hidden failure is more dangerous than no test at all, because it manufactures false confidence.
3 The Analogy
A warrant of fitness at the border of every change.
A WoF inspection is only useful because of what happens around the inspection. The mechanic runs the checks (the test suite), records a pass or fail on a certificate anyone can read (publishing results), and — this is the part that matters — you legally cannot register or drive the vehicle without a current pass (the branch policy and release gate). Take away the enforcement and a WoF is just a mechanic’s opinion nobody has to act on.
Azure DevOps is the testing-station regulator for your code. The pipeline runs the inspection, the test-results tab is the certificate, and the branch policy is the law that says no failing vehicle goes on the road. Whenua Records had a thorough mechanic and no law — so a vehicle that failed its inspection drove straight onto the motorway.
4 Azure Pipelines & YAML
Azure Pipelines is the engine that runs your build-and-test process. The modern way to define one is YAML — a file (azure-pipelines.yml) that lives in the repository alongside the code. That matters to a tester for one reason: the pipeline is versioned, reviewed, and tested like any other code. A change to how tests run goes through the same pull request as a change to the application, so you can review it and catch a defect in the pipeline itself — exactly the kind that bit Whenua Records.
The structure a tester needs to read
A pipeline is built from a small set of nested concepts. You do not need to author them from scratch on day one, but you must be able to read them and know where the testing lives.
Trigger: what starts the pipeline (a push to main, or a pull request).
Stage: a major phase, e.g. Build, Test, Deploy. Stages can have approvals between them.
Job: a unit that runs on one agent (one machine). Jobs in a stage can run in parallel.
Step / Task: a single command or a reusable action. Your test run is a step; publishing results is another.
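The nesting described above can be sketched as a skeleton. This is an illustrative outline rather than the Whenua Records pipeline: the stage and job names, the echo placeholders, and the ubuntu-latest agent image are all assumptions.

```yaml
# Skeleton only: names and commands are placeholders.
trigger:
  branches:
    include:
    - main              # Trigger: a push to main starts a run

pool:
  vmImage: 'ubuntu-latest'   # assumed Microsoft-hosted agent image

stages:
- stage: Build          # Stage: a major phase
  jobs:
  - job: Compile        # Job: runs on one agent
    steps:
    - script: echo "build the application here"   # Step: a single command
      displayName: 'Build'

- stage: Test
  dependsOn: Build      # starts only after the Build stage succeeds
  jobs:
  - job: UnitTests      # the two jobs below do not depend on each other,
    steps:              # so they can run in parallel if agents are available
    - script: echo "run unit tests here"
      displayName: 'Unit tests'
  - job: ApiTests
    steps:
    - script: echo "run API tests here"
      displayName: 'API tests'
```

When reading a real pipeline, the quickest way to find the testing is to scan for the steps whose script or task runs a test tool, then check which job and stage they sit in.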
A minimal test stage
Here is a pipeline fragment for the Whenua Records API suite. Read it for the testing, not the syntax.
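trigger:
  branches:
    include: [ main ]

stages:
- stage: Test
  jobs:
  - job: ApiTests
    steps:
    # Write a .trx result file; a failing test makes dotnet test exit non-zero, which fails the job
    - script: dotnet test --logger trx --results-directory $(Agent.TempDirectory)
      displayName: 'Run API tests'
    - task: PublishTestResults@2
      condition: succeededOrFailed()   # publish even when tests fail
      inputs:
        testResultsFormat: 'VSTest'
        testResultsFiles: '**/*.trx'
        searchFolder: '$(Agent.TempDirectory)'   # look where the test step wrote the .trx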
Two things in there are pure testing decisions. The test command produces a machine-readable result file (a .trx), and the publish task uses condition: succeededOrFailed() so the results are published even when the tests fail, which is precisely the run you most want to see. Forget that condition and a failed run publishes nothing, leaving the failure as buried in the console log as it was at Whenua Records.
5 Running Tests & Publishing Results
There is a difference between a test running and a test result being visible and acted on. Running is the easy part. The value for a tester is in the second half: turning thousands of log lines into a structured pass/fail summary that Azure DevOps, a reviewer, and a gate can all read.
Publish results, not logs
The question: can a person see at a glance which tests passed and which failed, without reading the raw output? The PublishTestResults task ingests a standard result file (JUnit, NUnit, VSTest, or xUnit XML) and surfaces it in the run’s Tests tab: counts, failures with their messages, duration, and history across runs. Almost every test framework can emit one of these formats. Playwright writes JUnit XML when its JUnit reporter is enabled; a Selenium suite run through TestNG produces JUnit-format reports; pytest emits JUnit XML through its --junitxml option. The job of the tester is to make sure the suite writes that file and the pipeline publishes it.
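As a rough illustration of the "write the file, then publish it" pairing, here is a sketch for a pytest suite. The tests/ folder, the test-results output path, and the run title are assumptions made for the example, not part of any pipeline described in this lesson.

```yaml
steps:
# pytest writes a JUnit-format XML file; a failing test gives a non-zero exit code
- script: python -m pytest tests/ --junitxml=$(Build.SourcesDirectory)/test-results/pytest.xml
  displayName: 'Run pytest suite'

# Publish that file so the Tests tab shows counts, failures, and history
- task: PublishTestResults@2
  condition: succeededOrFailed()      # still runs when the pytest step failed
  inputs:
    testResultsFormat: 'JUnit'
    testResultsFiles: '**/pytest.xml'
    searchFolder: '$(Build.SourcesDirectory)/test-results'
    testRunTitle: 'pytest'            # hypothetical label shown in the Tests tab
```

The same two-step shape applies to any framework; only the command that writes the file and the testResultsFormat value change.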
Publish coverage and other evidence
The question: what evidence does the release need, beyond pass/fail? Code-coverage results (Cobertura or JaCoCo format) are published in much the same way, through the separate PublishCodeCoverageResults task, and show in their own tab. For a government audit trail, the published results and coverage become the durable record that a given build was tested, which is far more useful than a console log that rolls off after a retention window.
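A minimal sketch of publishing coverage for a .NET suite follows. It assumes the test projects reference the coverlet.collector package, which is what makes the "XPlat Code Coverage" collector write a Cobertura file; check your own projects before relying on it.

```yaml
steps:
# Collect coverage while running the tests (assumes coverlet.collector is referenced)
- script: dotnet test --collect:"XPlat Code Coverage" --results-directory $(Agent.TempDirectory)
  displayName: 'Run tests with coverage'

# Publish the Cobertura summary so it appears in the run's coverage tab
- task: PublishCodeCoverageResults@2
  condition: succeededOrFailed()
  inputs:
    summaryFileLocation: '$(Agent.TempDirectory)/**/coverage.cobertura.xml'
```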
Set the publish step's condition to succeededOrFailed(). The default is to run only when prior steps succeed, so a failing test suite would skip the publish and leave you with no result to read. The run you most need the evidence from is the one that failed, so make sure that is the one that always publishes.
6 Branch Policies & PR Validation
This is where Azure DevOps stops being a test runner and becomes a quality gate. A branch policy is a rule on a protected branch (usually main) that says: changes can only arrive here through a pull request, and only if certain conditions are met. The old Azure DevOps term for the strictest version of this is a gated check-in — the change is validated before it is allowed into the branch, never after.
- Build validation: a designated pipeline must run on the pull request and pass before merge is allowed. This is the rule Whenua Records did not have. With it, a red build simply cannot be merged — the merge button is disabled.
- Required reviewers: a minimum number of approvals, and you can require a specific group — for a privacy-sensitive change to a land register, the policy can demand a security or privacy reviewer sign off.
- Comment resolution: all review comments must be resolved before merge, so a tester’s “this needs a test” comment cannot be silently ignored.
- Linked work items: the change must trace to a tracked work item, giving you the requirement-to-change traceability an auditor will ask for.
A PR validation build is the pipeline that build validation runs. It typically builds the change and runs the fast, reliable tests — unit and API — that you are willing to make every merge wait for. The point is feedback before the change lands, not after. A tester’s contribution here is deciding which tests are trustworthy and fast enough to gate every merge, and which belong to a slower pipeline that runs after merge.
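A PR validation pipeline along those lines might look like the sketch below. For a repository hosted in Azure Repos, the link between this pipeline and the branch is made in the branch policy settings (build validation), not in the YAML, so the file only needs to define what runs. The file name and the Category filter are assumptions; the filter only works if your tests are tagged that way.

```yaml
# pr-validation.yml (hypothetical file name)
trigger: none     # no runs on push; the build-validation policy queues it for each pull request

pool:
  vmImage: 'ubuntu-latest'

steps:
# Fast, reliable tests only: assumes tests carry a Category trait of Unit or Api
- script: dotnet test --filter "Category=Unit|Category=Api" --logger trx --results-directory $(Agent.TempDirectory)
  displayName: 'Fast gate: unit and API tests'

- task: PublishTestResults@2
  condition: succeededOrFailed()
  inputs:
    testResultsFormat: 'VSTest'
    testResultsFiles: '**/*.trx'
    searchFolder: '$(Agent.TempDirectory)'
```

Keeping the slow end-to-end pack out of this file is the design choice: it belongs in a separate pipeline that runs after merge or on a schedule.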
7 Quality Gates, Artefacts & Environments
Beyond the merge, Azure DevOps controls how a tested build moves towards production. Three concepts matter to a tester here.
Artefacts — test once, promote that exact build
A pipeline artefact is the build output (a package, a container image, a set of binaries) published once and then promoted, unchanged, through each environment. The testing principle is the same one that applies to containers: you test an artefact, then ship that artefact, not a rebuild. If each stage rebuilds, you tested something different from what reaches production.
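The "build once, promote the same files" idea can be sketched as two stages. The artefact name app, the stage names, and the placeholder deploy command are assumptions for illustration.

```yaml
pool:
  vmImage: 'ubuntu-latest'

stages:
- stage: Build
  jobs:
  - job: Package
    steps:
    - script: dotnet publish --configuration Release --output $(Build.ArtifactStagingDirectory)
      displayName: 'Build once'
    # Publish the output as a pipeline artefact named "app"
    - publish: $(Build.ArtifactStagingDirectory)
      artifact: app

- stage: DeployTest
  dependsOn: Build
  jobs:
  - job: Deploy
    steps:
    - checkout: none            # no source code on this agent, so nothing can be rebuilt
    - download: current         # fetch the artefact published earlier in this same run
      artifact: app
    - script: ls $(Pipeline.Workspace)/app
      displayName: 'Deploy the downloaded files (placeholder)'
```

A quick review check: if a deploy stage contains a build or compile command, it is probably rebuilding rather than promoting.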
Environments — approvals and checks between stages
An Azure DevOps Environment (e.g. test, staging, production) is a deploy target you can attach checks to. A pre-deployment approval can require a named person to approve before a build reaches production — common for a bank or government release where a change advisory step is mandatory. A tester is often a required approver, signing that the evidence supports promotion.
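In YAML, a stage targets an Environment through a deployment job. The approvals and checks themselves are configured on the Environment in the Azure DevOps UI, not in the file. The sketch below assumes an Environment named production already exists; the stage name and echo command are placeholders.

```yaml
stages:
- stage: DeployProduction
  jobs:
  - deployment: DeployApp
    pool:
      vmImage: 'ubuntu-latest'
    environment: production       # checks and approvals on this Environment run before the job starts
    strategy:
      runOnce:
        deploy:
          steps:
          - script: echo "deploy the tested artefact here"
            displayName: 'Deploy (placeholder)'
```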
Quality gates — the automated verdict
A quality gate is an automated condition that must hold for the build to proceed: zero failed tests in the published results, coverage above a threshold, no new high-severity security findings. The gate reads the evidence you published and makes a yes/no decision with no human in the loop. This is the natural home for a tester’s rules: you encode “what good looks like” once, and the platform enforces it on every release.
Branch policy gates the merge: protects the branch from a change that fails PR validation.
Quality gate gates the release: protects production from a tested-but-failing build.
Environment approval gates with a person: the human sign-off a regulated release often needs.
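One simple way to express the "zero failed tests in the published results" condition is on the publish task itself, as sketched below. This is only one possible form of gate, and the results.xml file name is an assumption.

```yaml
steps:
- task: PublishTestResults@2
  condition: succeededOrFailed()
  inputs:
    testResultsFormat: 'JUnit'
    testResultsFiles: '**/results.xml'
    failTaskOnFailedTests: true     # any failed test in the published file fails this task, and so the job
```

This acts as a second line of defence: even if a test step's exit code was swallowed, a failure recorded in the results file still turns the job red.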
8 Playwright, Selenium & k6 in the Pipeline
Most NZ teams do not run a single test type — they run a layered set, and each integrates into Azure Pipelines the same way: run the tool, emit a standard result file, publish it, and let the exit code fail the job.
- Playwright (end-to-end): configure the JUnit reporter so the run writes a results XML, then publish it. Playwright returns a non-zero exit code on failure, so the job fails correctly — unless a wrapper script swallows the code, which is the Whenua Records bug. Run browsers headless on the agent and shard the suite across parallel jobs to keep it fast.
- Selenium (browser, often via TestNG or pytest): emits JUnit XML the same way and publishes identically. Selenium typically needs a browser and driver on the agent; many teams run the suite against a Selenium Grid or use a hosted agent image that already has the browser, so the pipeline does not depend on a hand-built machine.
- k6 (performance): a load test belongs in the pipeline too — but as a gate, not just a run. k6 can output a JUnit-style summary and, more importantly, fail the build when a threshold is breached (for example, the 95th-percentile response time exceeds the agreed budget). That turns a performance number into an enforced quality gate, so a build that regresses latency does not pass.
The pattern is identical across all three: the tool runs as a step, writes a standard results file, a publish step surfaces it on succeededOrFailed(), and a non-zero exit code (or a breached k6 threshold) fails the job so the branch policy and quality gate can act on it. If you can do that for one tool, you can do it for any of them.
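The shared pattern can be sketched for Playwright and k6 side by side. Several details here are assumptions: three shards, a results.xml output name, a load-test.js script that defines its own thresholds, and a k6 binary already installed on the agent.

```yaml
jobs:
- job: E2E
  pool:
    vmImage: 'ubuntu-latest'
  strategy:
    matrix:                       # one parallel job per shard
      shard1: { shardIndex: 1 }
      shard2: { shardIndex: 2 }
      shard3: { shardIndex: 3 }
  steps:
  - script: npm ci
    displayName: 'Install dependencies'
  - script: npx playwright install --with-deps
    displayName: 'Install browsers'
  # No wrapper and no "|| true": a failed test exits non-zero and fails the job
  - script: npx playwright test --reporter=junit --shard=$(shardIndex)/3
    displayName: 'Playwright shard $(shardIndex) of 3'
    env:
      PLAYWRIGHT_JUNIT_OUTPUT_NAME: results.xml   # where the JUnit reporter writes its file
  - task: PublishTestResults@2
    condition: succeededOrFailed()
    inputs:
      testResultsFormat: 'JUnit'
      testResultsFiles: '**/results.xml'
      testRunTitle: 'Playwright shard $(shardIndex)'

- job: Performance
  pool:
    vmImage: 'ubuntu-latest'
  steps:
  # Assumes k6 is installed on the agent and load-test.js declares thresholds;
  # k6 exits non-zero when a threshold is breached, which fails this step
  - script: k6 run load-test.js
    displayName: 'k6 load test'
```

A Selenium suite would follow the same shape: swap the run command, keep the publish step.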
9 Azure DevOps vs GitHub Actions vs GitLab CI
A tester moving between NZ shops will meet all three. They do the same job — run a pipeline, test a change, gate a merge — but the choice usually comes down to where the organisation already lives, not which engine is “best”.
Azure DevOps: pipeline in azure-pipelines.yml; first-class Test Plans and a rich Tests tab; branch policies and environment approvals are mature. Common in NZ government and banks because the work already sits in an Azure tenancy with the right data-residency and identity controls.
GitHub Actions: workflow in .github/workflows/; huge marketplace of reusable actions; great where the code already lives on GitHub. Test reporting relies more on third-party actions than a built-in test hub. Common in NZ startups, SaaS, and open-source-adjacent teams.
GitLab CI: pipeline in .gitlab-ci.yml; strong built-in test and coverage reports and merge-request pipelines; popular where a self-hosted, all-in-one platform is wanted. Some NZ agencies and security-conscious orgs self-host GitLab to keep everything inside their own boundary.
For a tester the transferable truth is that the concepts map across all three: the YAML pipeline, the test step, the published result, and the merge gate exist in each — only the names and file locations change. A branch policy in Azure DevOps is branch protection in GitHub and a protected branch with merge-request approvals in GitLab. Learn the pattern once and you can read any of them.
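To make the mapping concrete, here is a rough GitHub Actions counterpart to an Azure Pipelines API test job. It is a sketch under assumptions: the workflow file name, the .NET version, and the choice to upload the .trx files as a plain artefact (rather than through a third-party reporting action) are all illustrative.

```yaml
# .github/workflows/test.yml (hypothetical file name)
name: Test
on:
  pull_request:
    branches: [ main ]          # run for pull requests targeting main

jobs:
  api-tests:
    runs-on: ubuntu-latest
    steps:
    - uses: actions/checkout@v4
    - uses: actions/setup-dotnet@v4
      with:
        dotnet-version: '8.0.x'
    # Same test step: a failing test exits non-zero and fails the job
    - run: dotnet test --logger trx --results-directory TestResults
    # Roughly the counterpart of succeededOrFailed(): keep the evidence from failed runs too
    - uses: actions/upload-artifact@v4
      if: success() || failure()
      with:
        name: test-results
        path: TestResults
```

The merge gate is again set outside the file, through branch protection on main that requires this check to pass.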
10 Common Mistakes
🚫 Running tests in the pipeline but never failing the build on a failure
Why it happens: A wrapper script, a swallowed exit code, or a continueOnError left on a step means a failing test does not fail the job — and the build goes green.
The fix: Confirm a deliberately broken test turns the build red. The exit code of the test step must propagate to the job, exactly the Whenua Records defect. A test that cannot fail the build is decoration.
🚫 Not publishing test results, so failures hide in the log
Why it happens: The suite runs and prints to the console, and nobody adds a publish step — or adds one that skips when tests fail.
The fix: Publish a standard results file (JUnit / VSTest / NUnit) on succeededOrFailed() so the failed run is the one you can actually read, and so a quality gate has structured data to act on.
🚫 No build-validation branch policy on the protected branch
Why it happens: The pipeline exists and is green, so it feels protected — but nothing actually blocks a red build from merging.
The fix: Add a branch policy that requires the PR validation build to pass before merge. Without it, a passing pipeline is advice, not a gate — and advice gets ignored under deadline pressure.
🚫 A slow, flaky gate the team learns to route around
Why it happens: The PR validation suite includes slow end-to-end tests that flake, so people retry until green or push past the gate.
The fix: Gate every merge on fast, reliable tests only; run the slow suite after merge or on a schedule. A gate people distrust and route around is worse than none, because it manufactures false confidence.
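The first mistake above is easiest to recognise in the YAML itself. The sketch below shows two common ways a failing test step ends up not failing the job (left as comments) next to the plain form that does. The Playwright command is just an example test step.

```yaml
steps:
# Anti-pattern 1: "|| true" replaces a failing exit code with success
# - script: npx playwright test || true

# Anti-pattern 2: the step can fail, but the job carries on and
# finishes as "succeeded with issues" instead of failed
# - script: npx playwright test
#   continueOnError: true

# Corrected: no wrapper and no continueOnError, so a non-zero exit code fails the job
- script: npx playwright test
  displayName: 'E2E'
```

Whichever form you find, the check is the same: commit a deliberately broken test and confirm the build goes red.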
11 Now You Try
Three graded exercises: spot the pipeline risks, fix a broken test step, then design a branch policy and quality gate. Write your answer, then compare it to the model answer.
Read the description of an Azure DevOps setup at a fictional NZ Inland Revenue (IRD) tax-filing service below. Identify 3 risks in how testing is wired into the pipeline and merge process, and name what you would change for each.
The Test stage runs the suite with a shell wrapper that ends in exit 0 so “the stage never blocks the build”. There is no PublishTestResults step; testers read the raw console log to find failures. The main branch has no build-validation policy, so pull requests can be merged whether or not the pipeline passed. The deploy stage rebuilds the application from source rather than promoting the build artefact that was tested. A k6 load test runs but only prints its numbers; it never fails the build.
List 3 risks and what you would change for each:
Model answer
There are at least five real risks; any three, well explained, earn full marks.
1. Failing tests cannot fail the build: the wrapper ends in "exit 0", so the test step always reports success. Change: let the test command's real exit code propagate to the job; verify a deliberately broken test turns the build red.
2. Results are not published: there is no PublishTestResults step, so failures hide in the raw log. Change: add a PublishTestResults task that reads a standard format (VSTest/JUnit/NUnit) on condition succeededOrFailed(), so failures are visible and a gate can read them.
3. No build-validation branch policy: a red build can still be merged to main. Change: add a branch policy requiring the PR validation pipeline to pass before merge.
4. Deploy rebuilds instead of promoting the tested artefact: production runs a different build from the one tested. Change: publish the build once as a pipeline artefact and promote that exact artefact through each stage.
5. The k6 load test only prints numbers: a latency regression cannot stop the release. Change: set k6 thresholds and make a breach fail the build, turning the performance result into a quality gate.
The trap: the pipeline is "green" and runs every test type, yet none of its results are enforced. Running tests is worthless if a failure does not fail the build, the result is invisible, and the merge and release are not gated.
The Azure Pipelines YAML below is broken in the way that lets a failing test ship. Describe a corrected Test job for a fictional Westpac NZ payments service running a Playwright end-to-end suite. Specify: how the failing test now fails the job, how results become visible, and how you would keep the gate fast.
“The Playwright step is:
- script: npx playwright test || true
  displayName: 'E2E'
There is no publish step. The whole suite of 600 end-to-end tests runs in one job on every pull request, taking ~45 minutes.”
Describe the corrected Test job:
Model answer
How the failing test now fails the job: remove the "|| true" so Playwright's non-zero exit code on a failed test propagates to the job and turns the build red. Confirm with a deliberately broken test that the job fails.
How results become visible: configure Playwright's JUnit reporter to write a results XML, then add a PublishTestResults task (testResultsFormat JUnit) with condition: succeededOrFailed() so the results appear in the Tests tab even when tests fail: counts, failing tests, messages, and history.
How the gate stays fast: do not run all 600 E2E tests on every PR. Shard the suite across parallel jobs (Playwright sharding / a job matrix) so wall-clock time drops, and gate the PR on the fast, reliable subset (smoke + critical-path), running the full E2E pack after merge or on a schedule. Keep it flake-free so the team trusts the gate.
Why the original was broken: (1) "|| true" swallows the exit code, so a failing test reports success and a broken payments change can merge and ship. (2) No publish step means failures are invisible and no quality gate can read them. (3) A 45-minute single-job gate is slow enough that the team will route around it, and for a payments service a routed-around or false-green gate is exactly how a defect reaches money movement.
Design the branch-policy and release-gate rules for the main branch of a fictional Te Whatu Ora patient-portal service on Azure DevOps. Specify at least: the build-validation rule, two other branch-policy conditions, and two automated quality-gate conditions before a build reaches production — and say what each one protects against.
Model answer
Branch policy (build validation): the PR validation pipeline (build + fast unit/API tests) must pass before merge; merge is blocked while it is red. Protects against: a failing change landing on main at all, which was the Whenua Records gap.
Branch policy (condition 2): required reviewers, including a privacy/security reviewer for a health system holding personal information. Protects against: a privacy-impacting change merging without the right sign-off (Privacy Act 2020 / Health Information Privacy Code obligations).
Branch policy (condition 3): comment resolution required and a linked work item. Protects against: a tester's "needs a test" comment being ignored, and loss of the requirement-to-change traceability an auditor will ask for.
Quality gate (condition 1, before production): zero failed tests in the published results AND code coverage at or above the agreed threshold. Protects against: promoting a build whose tests failed or whose new code is untested.
Quality gate (condition 2, before production): no new high-severity security findings, and the k6 performance thresholds (e.g. p95 latency) were met. Protects against: shipping a known vulnerability or a latency regression into a patient-facing service.
Strong answers also note: a pre-deployment approval on the production Environment (a named human sign-off) for a regulated health release, and promoting the tested artefact by identity rather than rebuilding.
Weak answers stop at "the pipeline must be green" without separating the merge gate (branch policy) from the release gate (quality gate + approval).
12 Self-Check
Q1: Why is running a test suite in an Azure pipeline not enough on its own?
Because a test only protects you if the platform acts on its result. Three things must all be true: a failing test fails the pipeline job, the result is published so a person and a gate can see it, and a red result blocks the merge (branch policy) and the release (quality gate). At Whenua Records the tests were good but a swallowed exit code, no published results, and no branch policy meant a failing build shipped. The wiring was the defect.
Q2: Why publish test results on succeededOrFailed() rather than the default?
The default publishes only when prior steps succeed, so a failing test suite would skip the publish and leave you with no result to read — the run you most need the evidence from. Setting condition: succeededOrFailed() means the failed run still publishes its structured results, so failures are visible in the Tests tab and a quality gate has data to act on.
Q3: What is the difference between a branch policy and a quality gate?
A branch policy gates the merge — it protects the branch by requiring a passing PR validation build (and reviewers, comment resolution, linked work items) before a change can land. A quality gate gates the release — an automated condition (all tests passed, coverage threshold, no high-severity findings, k6 thresholds met) that a tested build must satisfy before it reaches production. One protects the branch; the other protects production.
Q4: How do Playwright, Selenium, and k6 integrate into Azure Pipelines, and what is common to all three?
Each runs as a step, emits a standard results file (Playwright and pytest produce JUnit XML, Selenium via TestNG produces JUnit), and a publish step surfaces it on succeededOrFailed(). The common pattern is that a non-zero exit code — or a breached k6 threshold — fails the job so the branch policy and quality gate can act on it. k6 is special in that you set thresholds (e.g. p95 latency) so a performance regression itself fails the build.
Q5: Why do so many NZ government departments and banks run on Azure DevOps rather than GitHub Actions or GitLab CI?
The deciding factor is usually data residency and identity, not features. When an agency’s data and its directory, Microsoft Entra ID (formerly Azure Active Directory), already sit inside an Azure tenancy with the right data-residency and identity controls, Azure DevOps is the path of least resistance. The three platforms share the same concepts (YAML pipeline, test step, published result, merge gate), so the choice comes down to where the organisation already lives. GitHub Actions is common in startups and SaaS; some security-conscious agencies self-host GitLab to keep everything inside their own boundary.
13 Interview Prep
Real questions asked in NZ QA interviews for DevOps-adjacent roles. Read the model answers, then practise your own version.
“Our Azure DevOps pipeline is green but a broken change still reached production. Where would you look?”
At the three things that turn a test result into an enforced decision, because that is where a green build ships a red release. First, does a failing test actually fail the job — or is there a wrapper, a swallowed exit code, or a continueOnError that lets the step pass regardless? I’d add a deliberately broken test and confirm the build goes red. Second, are results published — on succeededOrFailed() — so a reviewer and a gate can see the failure rather than hunting through the log? Third, is there a build-validation branch policy so a red build cannot be merged at all? Most “green build, bad release” stories live in one of those three gaps — the tests were fine; the platform was not enforcing them.
“How would you wire a Playwright suite into Azure Pipelines so it actually gates a merge?”
I’d run Playwright with the JUnit reporter so it writes a results XML, then add a PublishTestResults task on succeededOrFailed() so the run appears in the Tests tab even when it fails. I’d make sure no || true or wrapper swallows the exit code, so a failed test fails the job. Then I’d add a branch policy on main requiring that pipeline to pass before merge. To keep the gate usable I’d gate only the fast, reliable smoke and critical-path tests, shard them across parallel jobs to cut wall-clock time, and run the full end-to-end pack after merge — because a slow or flaky gate gets routed around, which is worse than no gate.
“A team is choosing between Azure DevOps, GitHub Actions, and GitLab CI. What would you tell them as the QA voice?”
For testing, the three are more alike than different — each has a YAML pipeline, a test step, published results, and a merge gate; only the names and file paths change. So I’d steer the decision towards where the organisation already lives and what it must comply with, not a feature checklist. In an NZ government or banking context, if the data and identity already sit in an Azure tenancy with the right data-residency controls, Azure DevOps is usually the least-friction choice, and its Test Plans and environment approvals suit a regulated release. GitHub Actions fits a team already on GitHub wanting the action marketplace; a security-conscious agency might self-host GitLab to keep everything inside its own boundary. Whichever they pick, my QA requirements are the same: a failing test fails the build, results are published, and a policy blocks merging or releasing a red build.