Lesson 3 of 3 · Shift-Left & Shift-Right

Continuous Testing

Continuous testing is not running tests continuously. It’s having the right test feedback at the right time, for the right audience, at every stage from commit to production. DORA metrics measure whether you’ve got it right.

Shift-Left & Shift-Right CTFL v4.0 — Section 2.1.5 ~35 min read · ~70 min with exercises

1 The Hook

Two NZ SaaS companies. Company A: the CI test run takes 45 minutes. Developers merge PRs without waiting for results. Failed tests are “investigated tomorrow.” Build failures accumulate. Once a month, someone spends a week fixing the backlog. Deploy frequency: once per month, emergency deploys only.

Company B: CI takes 8 minutes (fast unit and integration tests) with slower E2E running in parallel on a separate job. Test failures block the PR — you cannot merge without a green build. Fix before merge is the norm. Deploy frequency: multiple times per day.

The difference is not test suite size or engineering talent. It’s pipeline design. Company B staged their tests by speed and coverage so developers get fast feedback when they need it. Company A ran everything together and made fast feedback impossible.

DORA Deployment Frequency is a proxy for test pipeline quality. You cannot deploy frequently if your pipeline takes 45 minutes.

2 The Rule

Continuous testing means the right tests run at the right time. Unit tests run in seconds. Integration tests run in minutes. A small E2E smoke set runs with them on every PR, while the full E2E suite runs after merge, overnight, or on demand. Each layer gives different feedback at a different speed.

3 The Analogy

Analogy

Continuous testing is like a hospital triage system.

Critical emergencies get treated immediately. Moderate injuries get seen within 2 hours. Routine check-ups are scheduled. The system does not treat everything with the same urgency — it matches resources to risk level. Running 600 E2E tests on every commit is like treating every patient as a code blue regardless of their condition. It wastes resources, creates bottlenecks, and delays the feedback that actually matters.

4 Watch Me Do It

A continuous testing pipeline for a NZ SaaS product, staged by speed and coverage.

Stage 1 Pre-commit — developer’s machine, <30 seconds

Linting and type checking (ESLint, TypeScript)
Unit tests affected by the changed files only (Jest: --findRelatedTests or --onlyChanged)
Optional: mutation testing on changed functions (~2 min, run when time permits)

Audience: The developer, before they push. Feedback must be fast enough to not break their flow.
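
The "affected tests only" step above can be sketched as a small helper that a pre-commit hook might call. This is a minimal sketch, assuming Jest's `--findRelatedTests` flag (which runs only the tests that depend on the listed source files) and `--passWithNoTests`; the function name `buildPreCommitJestArgs` and the file paths are hypothetical.

```typescript
// Hypothetical helper: turn a list of changed files into Jest CLI arguments.
// Assumes Jest's --findRelatedTests flag, which runs only the tests that
// depend on the given source files.
function buildPreCommitJestArgs(changedFiles: string[]): string[] {
  // Only source files can have related tests; skip docs, configs, images.
  const sourceFiles = changedFiles.filter((file) => /\.(ts|tsx|js|jsx)$/.test(file));

  if (sourceFiles.length === 0) {
    return []; // Nothing to test: the hook can exit early.
  }

  // --passWithNoTests keeps the hook green when a changed file has no tests yet.
  return ["--findRelatedTests", ...sourceFiles, "--passWithNoTests"];
}

const args = buildPreCommitJestArgs(["src/booking/price.ts", "README.md"]);
console.log(["jest", ...args].join(" "));
// jest --findRelatedTests src/booking/price.ts --passWithNoTests
```

Filtering before calling Jest keeps the hook inside the 30-second budget when a commit touches only documentation.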

Stage 2 PR Pipeline — CI, <10 minutes (blocks merge)

All unit tests (~450 tests, avg 2ms each = ~1 second)
All API integration tests (~80 tests, avg 800ms each = ~65 seconds)
Chromium-only E2E smoke test — critical path only, ~15 tests
SAST security scan (Semgrep, Snyk)
Code coverage report (fails if coverage drops below threshold)

Audience: The developer and reviewers. Must be fast enough that people wait for it before merging.
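
The Stage 2 arithmetic above can be checked with a few lines of code. This is a rough sketch: the unit and API figures come from the list above, while the 20-second average for a smoke E2E test and all names (`GateGroup`, `fitsBudget`) are assumptions made for illustration. It counts test execution only, not checkout, dependency install, or the SAST scan.

```typescript
// A group of tests with a known count and average duration.
interface GateGroup {
  name: string;
  count: number;
  avgMs: number; // average duration per test in milliseconds
}

// Estimated serial run time for one group, in seconds.
function estimateSeconds(group: GateGroup): number {
  return (group.count * group.avgMs) / 1000;
}

// Unit and API figures are from the Stage 2 list; the smoke E2E average
// (20 s per test) is an assumption for illustration only.
const prGate: GateGroup[] = [
  { name: "unit", count: 450, avgMs: 2 },
  { name: "api-integration", count: 80, avgMs: 800 },
  { name: "smoke-e2e", count: 15, avgMs: 20000 },
];

const BUDGET_SECONDS = 10 * 60; // Stage 2 must finish in under 10 minutes

function totalSeconds(groups: GateGroup[]): number {
  return groups.reduce((sum, group) => sum + estimateSeconds(group), 0);
}

function fitsBudget(groups: GateGroup[], budgetSeconds: number): boolean {
  return totalSeconds(groups) < budgetSeconds;
}

for (const group of prGate) {
  console.log(`${group.name}: ${estimateSeconds(group).toFixed(1)} s`);
}
console.log(`total: ${totalSeconds(prGate).toFixed(1)} s, fits: ${fitsBudget(prGate, BUDGET_SECONDS)}`);
```

Under these assumptions the smoke E2E tests dominate the gate: about 300 of roughly 365 seconds. That is why the E2E set in the PR stage has to stay small.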

Stage 3 Main Branch Pipeline — <20 minutes (runs after merge)

Full E2E suite — all browsers (Chromium, Firefox, WebKit)
Performance tests (k6 — critical API endpoints, response time thresholds)
Accessibility tests (axe-core — full page scan, WCAG 2.2 AA ruleset)
Visual regression (Percy — screenshot comparison on key pages)

Audience: QA and the team. Failures here block deployment to production. Fix within hours, not tomorrow.

Stage 4 Production — continuous, every 5 minutes

Synthetic monitoring — critical user journeys (login, checkout, key API calls)
Alerting on error rate spikes and p95 response time degradation
Business metric monitoring (booking conversion rate, payment success rate)

Audience: On-call engineers and QA. Failures here mean production is broken right now.
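
The alerting rule described above (error rate spikes and p95 degradation) can be sketched as a pure function over a window of synthetic check results. The thresholds, type names, and function names here are assumptions for illustration, not recommended values or any monitoring vendor's API.

```typescript
// One synthetic check result, e.g. a scripted login or checkout journey.
interface ProbeResult {
  ok: boolean;
  durationMs: number;
}

// Nearest-rank percentile: the smallest sample with at least p% of all
// samples at or below it.
function percentile(values: number[], p: number): number {
  if (values.length === 0) return 0;
  const sorted = [...values].sort((a, b) => a - b);
  const rank = Math.ceil((p * sorted.length) / 100);
  return sorted[Math.max(rank, 1) - 1];
}

// Illustrative thresholds only (assumptions, not recommendations).
const MAX_ERROR_RATE = 0.05;
const MAX_P95_MS = 1500;

function shouldAlert(results: ProbeResult[]): boolean {
  // No data from the probe is itself a failure signal.
  if (results.length === 0) return true;
  const errorRate = results.filter((r) => !r.ok).length / results.length;
  const p95 = percentile(results.map((r) => r.durationMs), 95);
  return errorRate > MAX_ERROR_RATE || p95 > MAX_P95_MS;
}
```

Treating an empty window as an alert is a deliberate choice in this sketch: a monitor that silently stops reporting looks identical to a healthy system unless the absence of data is checked.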

DORA metrics and what they reveal about your pipeline:

Metric	What it measures	Pipeline signal
Deployment Frequency	How often you deploy to production	Low frequency = slow pipeline or no confidence in tests
Lead Time for Changes	Commit to production time	Long lead time = slow CI or manual gates
Change Failure Rate	% of deployments causing incidents	High rate = insufficient test coverage or wrong tests
MTTR	How fast you recover from failures	Long MTTR = no synthetic monitoring, slow root-cause
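
The four metrics in the table can be computed from a simple log of deployments. This is a minimal sketch: the record shape, field names, and function names are hypothetical, and restore time is measured from the moment of deployment as a stand-in for the incident start.

```typescript
// Hypothetical deployment record; field names are assumptions.
interface DeploymentRecord {
  commitAt: number;        // epoch ms of the first commit in the change
  deployedAt: number;      // epoch ms when the change reached production
  causedIncident: boolean; // did this deployment trigger an incident?
  restoredAt?: number;     // epoch ms when service was restored, if it did
}

interface DoraSummary {
  deploymentsPerDay: number;
  medianLeadTimeHours: number;
  changeFailureRate: number; // fraction between 0 and 1
  meanTimeToRestoreHours: number;
}

const MS_PER_HOUR = 60 * 60 * 1000;

function middleValue(values: number[]): number {
  if (values.length === 0) return 0;
  const sorted = [...values].sort((a, b) => a - b);
  const mid = Math.floor(sorted.length / 2);
  return sorted.length % 2 === 1 ? sorted[mid] : (sorted[mid - 1] + sorted[mid]) / 2;
}

function summariseDora(records: DeploymentRecord[], periodDays: number): DoraSummary {
  const leadTimes = records.map((r) => (r.deployedAt - r.commitAt) / MS_PER_HOUR);
  const restoreTimes: number[] = [];
  let failures = 0;
  for (const r of records) {
    if (!r.causedIncident) continue;
    failures += 1;
    if (r.restoredAt !== undefined) {
      restoreTimes.push((r.restoredAt - r.deployedAt) / MS_PER_HOUR);
    }
  }
  const meanRestore =
    restoreTimes.length === 0 ? 0 : restoreTimes.reduce((sum, h) => sum + h, 0) / restoreTimes.length;
  return {
    deploymentsPerDay: periodDays > 0 ? records.length / periodDays : 0,
    medianLeadTimeHours: middleValue(leadTimes),
    changeFailureRate: records.length === 0 ? 0 : failures / records.length,
    meanTimeToRestoreHours: meanRestore,
  };
}
```

The median is used for lead time so that one unusually slow change does not distort the picture; the mean is kept for restore time because that is what the metric name describes.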

Pro tip: If your CI takes more than 15 minutes, developers will not wait for it. They will merge, move on, and check the results an hour later. At that point, test failures have lost their urgency. Invest in pipeline speed — parallelise tests, use test impact analysis, cache aggressively — before adding more tests. A fast pipeline with fewer tests delivers more value than a slow one with comprehensive coverage.
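
Parallelising tests, as the tip suggests, works best when the slowest tests are spread across runners. The sketch below uses a greedy "longest first" assignment, one common heuristic chosen here for illustration rather than what any particular CI tool does. Test names and durations are made up.

```typescript
interface TimedTest {
  name: string;
  seconds: number;
}

// Greedy "longest first" assignment: sort by duration, then always give the
// next test to the runner with the least work so far.
function shardTests(tests: TimedTest[], runnerCount: number): TimedTest[][] {
  if (runnerCount < 1) throw new Error("runnerCount must be at least 1");
  const shards: TimedTest[][] = [];
  const load: number[] = [];
  for (let i = 0; i < runnerCount; i++) {
    shards.push([]);
    load.push(0);
  }
  const longestFirst = [...tests].sort((a, b) => b.seconds - a.seconds);
  for (const test of longestFirst) {
    const lightest = load.indexOf(Math.min(...load));
    shards[lightest].push(test);
    load[lightest] += test.seconds;
  }
  return shards;
}

// The pipeline finishes when the busiest runner finishes.
function wallClockSeconds(shards: TimedTest[][]): number {
  return Math.max(0, ...shards.map((shard) => shard.reduce((sum, t) => sum + t.seconds, 0)));
}

// Hypothetical durations, in seconds.
const sampleTests: TimedTest[] = [
  { name: "checkout", seconds: 90 },
  { name: "booking", seconds: 60 },
  { name: "search", seconds: 45 },
  { name: "login", seconds: 30 },
  { name: "profile", seconds: 30 },
  { name: "logout", seconds: 15 },
];

console.log(wallClockSeconds(shardTests(sampleTests, 1))); // 270
console.log(wallClockSeconds(shardTests(sampleTests, 2))); // 135
```

The approach depends on recorded durations from earlier runs; without timing data, splitting by file count is the usual fallback and tends to balance less evenly.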

5 When to Use It

Continuous testing is the right investment for any team shipping software more than once a week. If you deploy less than once a month:

You do not have continuous delivery — you have a waterfall or staged-gate release model
A continuous testing pipeline is the wrong starting point. Fix the delivery model first
Monthly deployments usually indicate risk aversion, not stability. The irony: low deployment frequency means each release is larger and riskier

For NZ SaaS companies deploying weekly or more: start with Stage 1 and Stage 2. Get those right before building Stage 3. Synthetic monitoring (Stage 4) should follow naturally once deployment confidence is established.

6 Common Mistakes

🚫 “I used to think: continuous testing means running the full E2E suite on every commit.”

Actually: Running 600 E2E tests on every commit would take hours. Stage your tests by speed and coverage. Fast tests (unit, API integration) gate PRs. Slow tests (full E2E, visual regression, performance) run after merge on the main branch. Each stage has a different audience and a different acceptable time budget.

🚫 “I used to think: DORA metrics are management vanity metrics.”

Actually: DORA metrics are leading indicators of delivery performance, and they correlate strongly with test pipeline quality. Teams in the DORA ‘Elite’ category (multiple deploys per day, MTTR under an hour) have fast, reliable pipelines that give developers confidence to deploy frequently. Teams with high Change Failure Rate almost always have either insufficient coverage or tests that do not match production conditions.

🚫 “I used to think: the CI pipeline is the DevOps team’s responsibility.”

Actually: DevOps maintains the CI infrastructure — the runners, caching, secrets management. QA is responsible for what runs in the pipeline: which tests, in which order, with what pass criteria. A CI pipeline that DevOps built without QA input will monitor infrastructure metrics but miss business-logic failures. QA must own the test pipeline design even if they don’t own the CI platform.

Senior engineer insight

The most valuable shift I made was treating pipeline speed as a first-class quality metric, not an afterthought. When our PR pipeline crept from 8 minutes to 22 minutes over 18 months of adding tests, developer behaviour changed invisibly — people stopped waiting, "green enough" became acceptable, and merge queues built up. We parallelised our integration tests across four runners and got back under 9 minutes. Deployment confidence returned within a week.

Most common mistake: teams keep adding tests to the PR gate without measuring pipeline duration. Every new test is 1–2 seconds of developer attention tax. After 200 additions, that is three to seven extra minutes on every run, and you have a pipeline nobody waits for.
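
Measuring pipeline duration can itself be automated. The sketch below flags drift before the gate becomes one nobody waits for. The 80% warning margin, the status names, and the function names are assumptions chosen for this example; the 10-minute budget is the Stage 2 target from this lesson.

```typescript
type GateStatus = "ok" | "warn" | "over-budget";

function medianOf(values: number[]): number {
  if (values.length === 0) return 0;
  const sorted = [...values].sort((a, b) => a - b);
  const mid = Math.floor(sorted.length / 2);
  return sorted.length % 2 === 1 ? sorted[mid] : (sorted[mid - 1] + sorted[mid]) / 2;
}

// Warn once the median reaches 80% of the budget (an arbitrary early-warning
// margin chosen for this sketch); report over-budget once it exceeds it.
function checkPipelineDuration(recentRunMinutes: number[], budgetMinutes: number): GateStatus {
  const typical = medianOf(recentRunMinutes);
  if (typical > budgetMinutes) return "over-budget";
  if (typical >= budgetMinutes * 0.8) return "warn";
  return "ok";
}

console.log(checkPipelineDuration([7, 8, 7.5, 8, 7], 10));   // ok
console.log(checkPipelineDuration([8, 9, 8.5, 9, 9.5], 10)); // warn
console.log(checkPipelineDuration([21, 22, 23], 10));        // over-budget
```

Using the median of recent runs rather than the latest run avoids raising the alarm over a single slow runner.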

From the field

A Wellington-based logistics SaaS inherited a GitHub Actions pipeline that ran all 340 E2E tests on every PR — a 48-minute gate that everyone had learned to ignore. The team assumed the pipeline was comprehensive because it was slow. What they discovered during a post-incident review was that the entire E2E suite ran against a seeded SQLite database, not their PostgreSQL staging environment, so transaction-isolation bugs routinely reached production undetected. They rebuilt the pipeline in three stages: a fast Playwright smoke suite (14 critical-path tests, 4 minutes) blocking PR merges; the full E2E suite (PostgreSQL, all browsers) running on main branch after merge; and synthetic monitoring in production covering their top booking and dispatch flows. Change failure rate dropped from 18% to 4% in the first quarter. The lesson: a slow pipeline is not the same as a thorough one — environment fidelity matters as much as test count.

7 Now You Try

📋 Prompt Lab — Design a Continuous Testing Pipeline

Design a continuous testing pipeline for a NZ fintech that deploys microservices 10 times per day. The system has: 450 unit tests (avg 2ms each), 80 API integration tests (avg 800ms each), and 120 E2E tests (avg 35s each). Allocate these tests to the right pipeline stage and explain your rationale for each decision.
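
One way to start the arithmetic for this exercise is to work out how long the E2E tests take in series and how many parallel workers a given time budget implies. The figures are from the prompt above; the 20-minute budget borrows the Stage 3 target from this lesson, and the function names are hypothetical. The result is a lower bound, since it assumes tests split evenly and ignores start-up overhead.

```typescript
// Serial run time in minutes for a group of tests.
function serialMinutes(count: number, avgSeconds: number): number {
  return (count * avgSeconds) / 60;
}

// Lower bound on parallel workers needed to fit a time budget, assuming
// tests split evenly and ignoring start-up overhead.
function minWorkers(count: number, avgSeconds: number, budgetMinutes: number): number {
  return Math.ceil(serialMinutes(count, avgSeconds) / budgetMinutes);
}

console.log(serialMinutes(120, 35));  // 70
console.log(minWorkers(120, 35, 20)); // 4
console.log(minWorkers(120, 35, 10)); // 7
```

Seventy minutes in series is the number that drives the design: the allocation question is which of those tests, if any, are worth the PR gate's budget.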

Why teams fail here

Treating CI duration as purely a DevOps problem rather than a test design problem: when the pipeline is slow, faster runners alone rarely fix it. Start with better test staging, then parallelise what remains.
Running tests against environments that do not match production (SQLite in CI, PostgreSQL in prod; mocked third-party APIs that behave differently from the real thing) — coverage numbers look healthy but production failures keep happening.
Having no quality gate on the PR stage at all, then compensating with a heavy pre-release manual regression — this is a waterfall in disguise and destroys deployment frequency.
Monitoring infrastructure health (CPU, memory, uptime) in production but not business-logic outcomes (booking conversion rate, payment success rate, search result accuracy) — the system appears healthy while users are silently failing.

Key takeaway

A pipeline that developers wait for is a quality gate; a pipeline they have learned to ignore is just a delay — speed and environment fidelity are not optional.

8 Self-Check


Interview Questions

What NZ hiring managers ask about continuous testing in CI/CD pipelines.

Q1. What is continuous testing and how does it differ from a traditional test phase?

Strong answer: Continuous testing is automated testing embedded in the CI/CD pipeline that runs on every code change — not as a separate phase before release. Every commit triggers: unit tests (seconds), integration tests (minutes), and acceptance tests (minutes to hours). Failures are caught within the development cycle, not weeks later. The shift from "test phase" to "continuous testing" means quality is maintained continuously rather than verified periodically. For NZ teams deploying multiple times per day, continuous testing is the only model that keeps pace with delivery.

Q2. What is a test pyramid in the context of continuous testing and why does the shape matter?

Strong answer: The test pyramid has unit tests at the base (many, fast, cheap), integration tests in the middle (fewer, slower), and E2E tests at the top (few, slowest, most expensive to maintain). The shape matters because a pyramid with a narrow base and wide top — an "ice cream cone" — means the pipeline depends on slow, brittle E2E tests for most of its coverage. This makes CI slow and unreliable. The ideal: fast unit tests provide 70-80% of coverage in seconds, integration tests cover key interactions in minutes, and E2E tests cover the most critical user journeys in 10-15 minutes.

Q3. What do you do when continuous testing reveals intermittent (flaky) test failures in CI?

Strong answer: Investigate immediately — do not add retries to mask the problem. Flaky tests indicate one of: a race condition in the test itself (test is not deterministic), a timing dependency (test is brittle to load or speed), a test isolation failure (tests share state), or a genuine intermittent application bug. Categorise each flaky test with a ticket and prioritise it. A pipeline where 10% of runs are red due to flakiness is a pipeline that developers learn to ignore — which means real failures also get ignored. The standard: a red build must always mean a real problem.
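
Finding flaky tests to ticket can be done from CI history rather than by memory. The sketch below uses one simple signal: the same test both passing and failing on the same commit. The record shape and function name are hypothetical, and this detects candidates only; it does not replace root-cause investigation.

```typescript
// Hypothetical CI history record.
interface TestRun {
  testName: string;
  commitSha: string;
  passed: boolean;
}

// A test is treated as flaky when the same commit produced both a pass and a
// fail: the code did not change, so the differing outcome came from elsewhere.
function findFlakyTests(runs: TestRun[]): string[] {
  const outcomes = new Map<string, { testName: string; pass: boolean; fail: boolean }>();
  for (const run of runs) {
    const key = run.testName + "@" + run.commitSha;
    const entry = outcomes.get(key) ?? { testName: run.testName, pass: false, fail: false };
    if (run.passed) {
      entry.pass = true;
    } else {
      entry.fail = true;
    }
    outcomes.set(key, entry);
  }
  const flaky: string[] = [];
  outcomes.forEach((entry) => {
    if (entry.pass && entry.fail && flaky.indexOf(entry.testName) === -1) {
      flaky.push(entry.testName);
    }
  });
  return flaky.sort();
}
```

A test that fails on one commit and passes on the next is not flagged, because a code change could explain the difference.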

Q: What are the four DORA metrics and what does each measure?

Deployment Frequency (how often you deploy to production; elite teams deploy on demand, multiple times per day); Lead Time for Changes (commit-to-production time; elite is under one day, and under one hour in some report years); Change Failure Rate (% of deployments that cause an incident; around 5% is elite in recent reports); Mean Time to Restore, MTTR (how fast you recover from production failures; under one hour is elite). The exact thresholds shift between annual DORA reports. All four correlate with test pipeline quality.

Q: Why should fast unit tests gate PRs but slow E2E tests not gate PRs?

PR gate tests must be fast enough that developers wait for results before merging. If a gate takes 45 minutes, developers merge without waiting and the gate becomes meaningless. Fast tests (unit, API integration, smoke E2E) can realistically run in 8–10 minutes and block merges. Full E2E suites belong after merge on the main branch where a slower pipeline is acceptable because the feedback is less time-sensitive.

Q: Who should own the test pipeline design — QA or DevOps?

QA owns what runs in the pipeline and the pass criteria. DevOps owns the infrastructure the pipeline runs on. This is a shared responsibility, but the test logic — which tests, at which stage, with which failure thresholds — is a QA decision. A pipeline designed only by DevOps will optimise for infrastructure health, not business-logic correctness.

9 ISTQB Mapping

CTFL v4.0 Section 2.1.4 (DevOps and Testing), with the shift-left approach covered in Section 2.1.5. Continuous testing is explicitly named as a DevOps testing practice. The syllabus covers the relationship between CI/CD pipelines and test automation.

CTAL Test Automation Engineer (CTAL-TAE) goes deeper on pipeline architecture. CTAL-TAE Section 5 covers CI/CD integration including test stage design, parallel execution, and test result reporting. DORA metrics are not ISTQB-mapped but are the industry-standard measurement framework for DevOps performance, which includes testing outcomes.

10 Next Steps

Shift-Left & Shift-Right Hub
Shift-Left Testing
Shift-Right Testing