Security Automation in CI

A vulnerability found in a yearly pen test is a vulnerability found too late. The senior QE’s job is to push security checks left into the pipeline, so that every commit is scanned, every risky change is caught, and the build is the gate. This lesson shows you what to run, where it fits, and how to keep it useful instead of noise.

Senior Automation DevSecOps — CI/CD Security Gates ~30 min read · ~70 min with exercises

1 The Hook

Kowhai Pay, a fictional NZ payments startup, took security seriously — once a year. Every February they paid a firm to run a penetration test, fixed what came back, and moved on. In between, the team shipped to production a dozen times a day, and no security check ran in any of those deploys.

One November, a developer added a new logging library to fix a tracing issue. It pulled in a transitive dependency with a known critical vulnerability — one that had a public advisory and a patched version available for months. The pipeline ran the unit tests, they passed, and the change went live. The flaw sat in production for eleven weeks until the February pen test found it. By then it had been deployed across every service.

Here is the lesson in that story: the pen test was not wrong, it was just too slow. A yearly check cannot keep up with daily deploys. The vulnerable library would have been caught the moment it was added if a dependency scan had been running in the pipeline — a check that takes seconds and would have failed the build on the spot.

Security automation in CI is the discipline of moving these checks out of the annual audit and into the pipeline, so the feedback is continuous and the build itself becomes the gate. That is what this lesson teaches: what to scan, where each scan belongs, when to fail the build, and how to keep the results trustworthy.

2 The Rule

Security checks that run yearly cannot protect software that ships daily. Push the checks into the pipeline so every change is scanned automatically, fail the build on real findings, and triage the rest. An untuned scanner that no one trusts is worse than no scanner at all.

3 The Analogy

Airport security versus a single annual customs raid.

Imagine an airport that only checked bags once a year. Every other day, anyone could carry anything onto a plane, and the one big raid in February would catch a year’s worth of problems all at once — long after they had already flown. That is a yearly pen test against daily deploys. Security automation in CI is the permanent screening lane: every passenger, every bag, every time, with a fast scan that stops the dangerous items at the gate instead of discovering them later.

And just like airport screening, the value depends on tuning. A scanner that beeps at every belt buckle gets ignored, and a real weapon walks through. A security gate that floods developers with false alarms gets switched off or waved past — which is why triage is not optional, it is half the job.

Senior engineer insight

The most dangerous moment in security automation is not when a scanner finds nothing — it is when a scanner finds too much and the team learns to ignore it. I have watched pipelines where SCA gated on every Medium severity turn into a daily ritual of clicking "override" before anyone had even read the finding. Within three months the button was auto-clicked by a build bot, and a real Critical slipped through unnoticed. The gate was technically on; it was also completely useless.

What changed my thinking: treating false-positive rate as a first-class metric alongside coverage. You track code coverage to stop tests being theatrical — do the same for security findings. If suppression rate climbs above 40%, that is a signal the rules need tuning, not that the team needs more discipline.

Most common mistake: gating the build on every severity from day one, watching the team immediately route around the gate, and then declaring that "security automation doesn't work in our environment."

From the field

A Wellington-based team building an API for a CoverNZ claims-processing integration assumed their dependency risk was low because they reviewed every npm package before adding it. What they hadn't modelled was transitive depth: their direct dependencies pulled in 847 indirect ones, none of which had been reviewed by anyone. When SCA was first run against the repo, it surfaced eleven High findings, three of which had public proof-of-concept exploits. The team's initial reaction was that the scanner was broken, because they had "never added a vulnerable library."

What changed was framing: SCA isn't telling you that you made a mistake, it's telling you what is in the codebase right now. They set a baseline, gated hard on Critical/High for new additions, and assigned ownership of the existing eleven to two developers with a 30-day burn-down. Six weeks later the count was zero and the team had genuine visibility for the first time. The lesson that generalises: you cannot review what you cannot see, and most modern apps are mostly other people's code.

4 The Four Scan Types

Security automation in CI is not one tool — it is a small family of scans, each answering a different question. A serious pipeline runs all four.

SAST — static application security testing

The question: does our own source code contain insecure patterns? SAST reads the code without running it, looking for known-dangerous constructs — SQL built by string concatenation, unescaped user input flowing into a page, weak cryptography, hard-coded logic that bypasses auth. It runs early because it only needs the source. Its weakness is false positives: it sees a risky pattern but cannot always tell whether the path is actually reachable.
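
To make the "SQL built by string concatenation" pattern concrete, here is a minimal sketch using Python's built-in sqlite3 module. The table, column, and function names are invented for illustration, and no particular SAST tool is implied; the point is only the contrast that such a rule keys on.

```python
import sqlite3


def find_user_unsafe(conn: sqlite3.Connection, name: str) -> list:
    # Pattern a SAST rule would typically flag: untrusted input is
    # concatenated straight into the SQL text.
    query = "SELECT id, name FROM users WHERE name = '" + name + "'"
    return conn.execute(query).fetchall()


def find_user_safe(conn: sqlite3.Connection, name: str) -> list:
    # Parameterised query: the value is passed separately from the SQL,
    # so it is never interpreted as SQL syntax.
    return conn.execute(
        "SELECT id, name FROM users WHERE name = ?", (name,)
    ).fetchall()


def demo_connection() -> sqlite3.Connection:
    # Hypothetical in-memory table used only for this demonstration.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
    conn.executemany("INSERT INTO users (name) VALUES (?)", [("aroha",), ("ben",)])
    return conn


conn = demo_connection()
payload = "x' OR '1'='1"
print(len(find_user_unsafe(conn, payload)))  # injection returns every row
print(len(find_user_safe(conn, payload)))    # no user has that literal name
```

If the same concatenation were fed a hard-coded constant instead of user input, the rule would still fire, which is the false-positive weakness described above.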

Dependency / SCA scanning

The question: are the third-party libraries we pull in known to be vulnerable? Software composition analysis (SCA) checks every direct and transitive dependency against public vulnerability advisories. This is the Kowhai Pay failure — the risk was not in code anyone wrote, it was in a library someone added. SCA is one of the highest-value, lowest-cost scans you can run, because most modern apps are mostly other people’s code.
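
As a rough sketch of what an SCA scan does under the hood, the snippet below walks a resolved dependency tree, transitive entries included, and compares each version with an advisory list. Every package name, version, and advisory here is made up, and the version parsing is deliberately simplistic; real tools read lockfiles and query published advisory databases.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Advisory:
    package: str
    fixed_in: tuple  # first patched version, e.g. (2, 4, 1)
    severity: str


def parse_version(text: str) -> tuple:
    # Simplified: handles plain dotted numeric versions only.
    return tuple(int(part) for part in text.split("."))


def scan_tree(root: str, tree: dict, advisories: list) -> list:
    """Walk direct and transitive dependencies, returning vulnerable ones."""
    findings, seen, stack = [], set(), [root]
    while stack:
        name = stack.pop()
        if name in seen:
            continue
        seen.add(name)
        version, children = tree[name]
        for adv in advisories:
            if adv.package == name and parse_version(version) < adv.fixed_in:
                findings.append((name, version, adv.severity))
        stack.extend(children)
    return findings


# Hypothetical data: a service that adds a logging library, which in
# turn pulls in a vulnerable transitive dependency.
TREE = {
    "payments-api": ("1.0.0", ["trace-logger"]),
    "trace-logger": ("3.2.0", ["text-format"]),
    "text-format": ("2.3.0", []),
}
ADVISORIES = [Advisory("text-format", (2, 4, 1), "CRITICAL")]

print(scan_tree("payments-api", TREE, ADVISORIES))
```

Nobody in this example chose `text-format` directly, yet it is the entry that gets flagged, which mirrors the Kowhai Pay story.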

Secret scanning

The question: has anyone committed a credential into the repository? Secret scanning hunts for API keys, tokens, passwords, and private keys accidentally hard-coded into source or config. A leaked AWS key or a database password in a commit is a direct path to a breach. This scan should run on every commit and, ideally, as a pre-commit hook as well — once a secret hits the git history it must be treated as compromised and rotated, even after it is removed.
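
A minimal sketch of the idea behind secret scanning, assuming a small hand-picked set of regular expressions. The rule names and sample text are illustrative only (the key shown is the placeholder used in AWS documentation), and real scanners ship far larger rule sets plus entropy checks.

```python
import re

# Illustrative patterns only; a production rule set is much broader.
PATTERNS = {
    "aws-access-key-id": re.compile(r"\bAKIA[0-9A-Z]{16}\b"),
    "private-key-block": re.compile(r"-----BEGIN [A-Z ]*PRIVATE KEY-----"),
    "password-assignment": re.compile(
        r"(?i)\bpassword\s*[:=]\s*['\"][^'\"]{6,}['\"]"
    ),
}


def scan_text(text: str) -> list:
    """Return (line_number, rule_name) for every suspected secret."""
    hits = []
    for number, line in enumerate(text.splitlines(), start=1):
        for rule, pattern in PATTERNS.items():
            if pattern.search(line):
                hits.append((number, rule))
    return hits


# Hypothetical config file content, joined line by line.
SAMPLE = "\n".join([
    'db_host = "localhost"',
    'password = "hunter2-not-real"',
    'aws_key = "AKIAIOSFODNN7EXAMPLE"',
])

for line_number, rule in scan_text(SAMPLE):
    print(f"line {line_number}: {rule}")
```

Used as a pre-commit hook or pipeline step, a wrapper would exit non-zero whenever `scan_text` returns any hits, so the commit or build stops.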

DAST — dynamic application security testing

The question: when the app is actually running, can it be attacked? DAST tests the deployed, running application from the outside — the way an attacker would. A common pipeline entry point is the ZAP baseline scan: it spiders a running app and runs passive checks for issues like missing security headers, cookies without secure flags, and exposed endpoints, fast enough to sit in a pipeline. DAST needs a deployed target, so it runs later than the others, against a test environment.
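
The passive checks mentioned above can be pictured with a small sketch that inspects one HTTP response without sending any attack traffic. This is not how ZAP is implemented; the function name, the chosen header list, and the sample response are assumptions made for illustration.

```python
# Headers this sketch treats as required; a team would agree its own list.
REQUIRED_HEADERS = (
    "Content-Security-Policy",
    "Strict-Transport-Security",
    "X-Content-Type-Options",
)


def passive_checks(headers: dict, set_cookies: list) -> list:
    """Inspect response headers and Set-Cookie values for common gaps."""
    present = {name.lower() for name in headers}
    findings = [
        f"missing header: {name}"
        for name in REQUIRED_HEADERS
        if name.lower() not in present
    ]
    for cookie in set_cookies:
        name = cookie.split("=", 1)[0].strip()
        # Attributes follow the first ';' in a Set-Cookie value.
        attributes = {part.strip().lower() for part in cookie.split(";")[1:]}
        if "secure" not in attributes:
            findings.append(f"cookie without Secure flag: {name}")
        if "httponly" not in attributes:
            findings.append(f"cookie without HttpOnly flag: {name}")
    return findings


# Hypothetical response captured from an app running in a test environment.
response_headers = {
    "content-security-policy": "default-src 'self'",
    "X-Content-Type-Options": "nosniff",
}
cookies = ["session=abc123; Path=/; HttpOnly"]

for finding in passive_checks(response_headers, cookies):
    print(finding)
```

The check only has something to inspect once an app is serving responses, which is why this family of scans cannot run before a deploy.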

Pro tip: A simple way to keep the four straight: SAST and secret scanning read the code, SCA reads the dependency list, and DAST pokes the running app. The first three need no deployment and run early and fast; DAST needs something running and sits further down the pipeline.

5 Where Each Scan Fits in the Pipeline

The order matters. Cheap, fast checks that need no deployment go first, so a bad change fails in seconds rather than after a ten-minute build and deploy. The principle is “fail fast, fail cheap” — the same one that governs ordering unit tests before end-to-end tests.

Stage        Scan                 When it runs               Why here
-----------  -------------------  -------------------------  -------------------------------------
Pre-commit   Secret scan          Before code leaves laptop  Stops a key reaching the repo at all
Commit / PR  Secret scan          On every push              Catches what slipped past pre-commit
Commit / PR  SAST                 On every push              Needs only source; fast feedback
Build        SCA / dependency     After dependency resolve   Now the full dependency tree is known
Post-deploy  DAST (ZAP baseline)  After deploy to test env   Needs a running target to attack

Two practical points. First, put the scans that need nothing but source — secret scanning and SAST — as early as possible, ideally on the pull request, so a reviewer sees the result before merge. Second, the DAST baseline scan runs against a deployed test environment, never against production, and never against an environment with real customer data in it.
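
The ordering principle can be sketched as a tiny runner that executes stages in sequence and stops at the first failure. The stage names follow the table above; the check functions are stand-ins (simple lambdas) rather than real scanner invocations.

```python
from typing import Callable, List, Tuple

Stage = Tuple[str, Callable[[], bool]]


def run_pipeline(stages: List[Stage]) -> List[str]:
    """Run stages in order and stop at the first failure."""
    log = []
    for name, check in stages:
        passed = check()
        log.append(f"{name}: {'pass' if passed else 'FAIL'}")
        if not passed:
            break  # later, more expensive stages never start
    return log


# Hypothetical run in which the dependency scan finds a problem.
stages = [
    ("secret scan (PR)", lambda: True),
    ("SAST (PR)", lambda: True),
    ("SCA (build)", lambda: False),
    ("deploy to test env", lambda: True),
    ("DAST baseline (post-deploy)", lambda: True),
]

for entry in run_pipeline(stages):
    print(entry)
```

Because the SCA stage fails, the deploy and the DAST scan never run, so the team pays only for the cheap checks before getting feedback.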

Pro tip: Run secret scanning in two places — a pre-commit hook on the developer’s machine and again in the pipeline. The hook gives instant feedback and stops most leaks; the pipeline check is the backstop for developers who skipped the hook. Defence in two layers, because a secret in git history is expensive to clean up.

6 Failing the Build on Findings

A scanner that only writes a report no one reads is theatre. The point of putting security in CI is that the pipeline can stop — a real finding fails the build and the change does not ship until it is dealt with. But fail on everything and the team drowns; fail on nothing and you are back to a report no one reads. The art is the threshold.

The usual approach is severity-gated. Agree as a team, and write down, which severities break the build:

Critical and High: fail the build. A known-exploitable dependency or a hard-coded secret should never reach production silently.
Medium: warn and track, often allowed to pass but raised as a ticket with an owner and a due date, so it does not accumulate forever.
Low / informational: logged for visibility, not a gate.
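
One way to express these thresholds is a small gate function whose return value becomes the build's exit code. This is a sketch under assumptions: the finding dictionaries, the `type` field used to mark secrets, and the severity labels are hypothetical, not the output format of any particular scanner.

```python
# Policy mirrors the thresholds above; adjust to whatever the team agrees.
FAIL_ON = {"CRITICAL", "HIGH"}
WARN_ON = {"MEDIUM"}


def is_blocking(finding: dict) -> bool:
    # Any detected secret blocks, regardless of the severity label.
    return finding["severity"] in FAIL_ON or finding.get("type") == "secret"


def gate(findings: list) -> int:
    """Return the exit code the pipeline step should finish with."""
    blocking = [f for f in findings if is_blocking(f)]
    warnings = [
        f for f in findings
        if f["severity"] in WARN_ON and not is_blocking(f)
    ]
    for f in warnings:
        print(f"WARN  {f['id']} ({f['severity']}): raise a ticket with an owner")
    for f in blocking:
        print(f"BLOCK {f['id']} ({f['severity']})")
    # Low and informational findings are logged elsewhere, never a gate.
    return 1 if blocking else 0


# Hypothetical findings from one build.
findings = [
    {"id": "DEP-1", "severity": "CRITICAL"},
    {"id": "SAST-7", "severity": "MEDIUM"},
    {"id": "INFO-3", "severity": "LOW"},
]
print("exit code:", gate(findings))
```

Keeping `FAIL_ON` and `WARN_ON` in version-controlled config means the policy itself can be reviewed and audited like any other change.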

Two refinements keep this workable. A baseline lets you adopt scanning on an existing app without the first run failing on hundreds of pre-existing findings — you record the current state as accepted, and the gate only fails on new findings introduced by this change. And a documented, time-boxed exception process (often called a suppression or waiver, with an expiry date and an approver) handles the rare case where a High finding is genuinely accepted as a risk — with a name against it, not a silent skip.
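
The baseline and waiver ideas can be sketched together as a filter applied before the gate. The assumptions here are loud ones: findings are taken to be already limited to gate-relevant severities, a finding is identified by rule, file, and code snippet, and the waiver fields (`approver`, `expires`) are invented names.

```python
from datetime import date


def fingerprint(finding: dict) -> tuple:
    # Assumption: rule + file + snippet identifies a finding stably,
    # even when line numbers shift.
    return (finding["rule"], finding["file"], finding["snippet"])


def new_blocking_findings(findings, baseline, waivers, today):
    """Findings that should still fail the build after baseline and waivers."""
    blocking = []
    for f in findings:
        key = fingerprint(f)
        if key in baseline:
            continue  # pre-existing: handled by burn-down work, not the gate
        waiver = waivers.get(key)
        if waiver and waiver["expires"] >= today:
            continue  # risk-accepted by a named approver until expiry
        blocking.append(f)
    return blocking


# Hypothetical findings: one legacy, one new, one under waiver.
legacy = {"rule": "sqli", "file": "legacy.py", "snippet": "q = a + b"}
fresh = {"rule": "sqli", "file": "new.py", "snippet": "q = c + d"}
waived = {"rule": "weak-hash", "file": "auth.py", "snippet": "md5(x)"}

baseline = {fingerprint(legacy)}
waivers = {
    fingerprint(waived): {"approver": "security-lead", "expires": date(2025, 6, 30)}
}

print(new_blocking_findings([legacy, fresh, waived], baseline, waivers, date(2025, 6, 1)))
```

Run the same call with a date after the expiry and the waived finding blocks the build again, which is what forces the review at expiry.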

Pro tip: The fastest way to get a security gate switched off is to turn it on at full strictness against a legacy codebase on day one. Start with a baseline so only new problems fail the build, gate hard on Critical/High and secrets, and tighten over time. A gate the team trusts and keeps is worth more than a strict one they bypass.

7 Triaging False Positives

Every security scanner produces false positives — findings that are technically flagged but not a real risk in your context. SAST is the worst offender: it sees a dangerous pattern but cannot always tell whether user input ever actually reaches it. If you treat every finding as a real bug, developers learn the gate cries wolf and stop trusting it. Triage is how you keep the signal high.

A simple, repeatable triage for each finding:

1. Is it real? Trace the finding. For a SAST SQL-injection alert, does untrusted input actually reach that query, or is the value a hard-coded constant? → If not reachable, it is a false positive.
2. Is it exploitable here? A vulnerable function in a dependency you import but never call may not be exploitable in your usage. → Still patch when you can, but it may not warrant breaking the build today.
3. If false, suppress it — visibly. Record the suppression in config with a reason and an author, so the next person sees why it was dismissed. Never silence a whole rule to kill one false positive.
4. If real, fix or risk-accept. Patch the dependency, fix the code, or raise a documented, time-boxed exception with an owner.

The discipline that separates a senior QE here: a false positive is suppressed with a recorded reason, never by disabling the rule wholesale. Turning off the rule to clear one noisy alert blinds you to every real instance of that rule forever — the single most common way a security gate quietly stops protecting anything.
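
A finding-level suppression might look like the following sketch, in which each suppression targets one exact location and is rejected if it lacks a reason or an author. The `Suppression` fields and the matching on rule, file, and line are assumptions for illustration; real tools each have their own suppression formats.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Suppression:
    rule: str
    file: str
    line: int
    reason: str
    author: str


def apply_suppressions(findings: list, suppressions: list) -> tuple:
    """Split findings into (active, suppressed) by exact rule, file, and line."""
    for s in suppressions:
        if not s.reason.strip() or not s.author.strip():
            raise ValueError(f"suppression for {s.rule} needs a reason and an author")
    index = {(s.rule, s.file, s.line) for s in suppressions}
    active, suppressed = [], []
    for f in findings:
        key = (f["rule"], f["file"], f["line"])
        (suppressed if key in index else active).append(f)
    return active, suppressed


# Hypothetical data: the same rule fires in two places, only one is dismissed.
findings = [
    {"rule": "sql-injection", "file": "query_builder.py", "line": 42},
    {"rule": "sql-injection", "file": "reports.py", "line": 17},
]
suppressions = [
    Suppression(
        rule="sql-injection",
        file="query_builder.py",
        line=42,
        reason="value is a hard-coded constant, not user input",
        author="a.ngata",
    )
]

active, suppressed = apply_suppressions(findings, suppressions)
print("still active:", active)
```

The rule stays switched on, so the second `sql-injection` finding in `reports.py` is still reported.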

8 Audit-Ready Evidence

Security in CI produces something a yearly pen test cannot: a continuous, dated record that every change was scanned. For NZ teams handling payments, health, or government data, that record is exactly what an auditor or a customer’s security questionnaire asks for. Make it deliberate:

Scan results retained per build: the SAST, SCA, secret, and DAST reports stored against the build that produced them, not just the latest run.
The gate decision recorded: what severities fail the build, captured in pipeline config under version control, so the policy itself is auditable.
Suppressions and exceptions traceable: every dismissed finding has a reason, an author, and (for risk-accepted Highs) an expiry and approver.
Trend over time: findings opened versus closed, so you can show the security debt is shrinking, not quietly growing.

This is the bridge between security and the CI/CD work you already do. The pipeline is not just where tests run — it is the system of record that proves, build by build, that security was checked. “We scan on every deploy and here are the reports” is a far stronger answer than “we had a pen test in February.”
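
As a sketch of what "retained per build" could mean in practice, the function below assembles one evidence document per build: the gate policy in force, a severity summary per scan, a hash tying the summary to the full stored report, and the suppressions applied. All field names are hypothetical, and a real record would normally also carry a timestamp, left out here to keep the output deterministic.

```python
import hashlib
import json
from collections import Counter


def evidence_record(build_id: str, commit: str, policy: dict,
                    reports: dict, suppressions: list) -> str:
    """Assemble one per-build evidence document as canonical JSON."""
    summary = {}
    for scan, findings in reports.items():
        raw = json.dumps(findings, sort_keys=True)
        summary[scan] = {
            "counts": dict(Counter(f["severity"] for f in findings)),
            # Hash links this summary to the full report stored alongside it.
            "report_sha256": hashlib.sha256(raw.encode("utf-8")).hexdigest(),
        }
    record = {
        "build_id": build_id,
        "commit": commit,
        "gate_policy": policy,
        "scans": summary,
        "suppressions": suppressions,
    }
    return json.dumps(record, sort_keys=True, indent=2)


# Hypothetical build with one High dependency finding and a clean SAST run.
print(evidence_record(
    build_id="build-101",
    commit="abc123",
    policy={"fail_on": ["CRITICAL", "HIGH"], "warn_on": ["MEDIUM"]},
    reports={"sca": [{"id": "DEP-1", "severity": "HIGH"}], "sast": []},
    suppressions=[],
))
```

Comparing the `counts` across successive records gives the opened-versus-closed trend mentioned in the list above.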

Pro tip: The single highest-value security automation for most NZ teams is dependency (SCA) scanning gated on Critical/High, plus secret scanning on every commit. They are cheap to add, catch two of the most common and damaging classes of real-world breach, and produce exactly the evidence a security questionnaire asks for.

9 Common Mistakes

🚫 Running the scan but never failing the build

Why it happens: Teams add a scanner that writes a report and feel covered, without wiring it to the exit code.
The fix: A report no one reads protects nothing. Gate the build on Critical/High and secrets so a real finding actually stops the deploy. The pipeline must be able to say no.

🚫 Turning on full strictness against a legacy codebase on day one

Why it happens: Strict feels responsible, and the first scan looks alarming, so the instinct is to block everything.
The fix: Hundreds of pre-existing findings on the first run get the gate switched off within a week. Record a baseline so only new findings fail the build, then tighten over time.

🚫 Killing a false positive by disabling the whole rule

Why it happens: One noisy alert is annoying, and turning off the rule makes it go away instantly.
The fix: Disabling the rule blinds you to every real instance of it from then on. Suppress the single finding with a recorded reason and author, and leave the rule active.

🚫 Running DAST against production or real data

Why it happens: Production is the “real” environment, so it seems like the most honest place to scan.
The fix: An active scan can corrupt data, trip alerts, and breach customer privacy. Run the DAST baseline against a deployed test environment with no real customer data, never against production.

10 Now You Try

Three graded exercises across the scan types and the pipeline. Write your answer, then compare it to the model answer.

🔍 Exercise 1 of 3 — Spot the Gaps

Read the description of a fictional NZ insurance API team’s pipeline below. Identify 3 security automation gaps and name the scan type that would close each.

Pipeline: Tui Cover claims API
The team deploys to production several times a day. The pipeline runs unit tests and integration tests, and that is all that gates a deploy. Dependencies are pulled fresh on every build and updated whenever a developer feels like it; no one checks them against advisories. A developer once committed a database password to fix something quickly and removed it the next day. Security is covered by a penetration test booked once a year. The running app has never been scanned for missing security headers or exposed endpoints.

List 3 gaps and the scan type for each:

Model answer

There are at least four real gaps; any three, well explained, earn full marks.

1. SCA / dependency scanning — Dependencies are pulled fresh and never checked against advisories. A known-vulnerable library could ship to production unnoticed (the classic transitive-dependency breach). Add SCA gated on Critical/High at the build stage.

2. Secret scanning — A password was committed and only removed the next day; nothing scans for credentials. Once a secret is in git history it must be rotated, not just deleted. Add secret scanning on every commit and ideally a pre-commit hook.

3. DAST (ZAP baseline) — The running app has never been scanned for missing security headers or exposed endpoints. Add a baseline DAST scan post-deploy against a test environment.

Bonus gap: SAST — the team's own source is never scanned for insecure patterns (e.g. SQL built by concatenation). Add SAST on the pull request.

The theme: a yearly pen test cannot keep up with several deploys a day. Every gap above is a check that belongs in the pipeline, running on each change.

🔧 Exercise 2 of 3 — Fix the Gate Policy

The build-failure policy below is unworkable. Rewrite it into a sensible severity-gated policy for a fictional NZ government services portal that is adopting scanning on an existing, legacy codebase. Cover: which severities fail the build, how to handle existing findings, the false-positive process, and the exception process.

Original (unworkable):
“Fail the build on any finding of any severity. If there are too many, disable the rules that fire most.”

Rewrite as a workable gate policy:

Model answer

Build-failure thresholds: Fail the build on Critical and High findings and on any detected secret. Medium = warn and raise a tracked ticket with an owner and due date (build passes). Low / informational = logged for visibility, not a gate.

Existing / pre-existing findings: Record a baseline of the current state as accepted on adoption, so the first run does not fail on hundreds of legacy findings. The gate fails only on NEW findings introduced by a change. Burn down the baseline over time as a separate, tracked piece of work.

False-positive handling: Triage each finding — is it reachable / exploitable here? If genuinely false, suppress that single finding in config with a recorded reason and author. Never disable the whole rule to silence one alert, because that hides every real future instance.

Exception (risk-accept) process: For a real High that the business chooses to accept, raise a documented, time-boxed waiver with an approver and an expiry date — never a silent skip. It reappears at expiry for review.

Why this works: it gates hard on what actually matters (Critical/High + secrets), it lets a legacy codebase adopt scanning without the gate being switched off, and it keeps every suppression and exception named and auditable. The original failed on all three: gating on everything is unworkable, and disabling noisy rules blinds you to real issues.

🏗️ Exercise 3 of 3 — Order the Pipeline

Design the security stages of a CI/CD pipeline for a fictional NZ fintech mobile-banking backend that deploys several times a day. For each of the 5 stages, give: the stage, the scan(s) that run there, and a one-line reason for the placement. Cover secret scanning, SAST, SCA, and a ZAP baseline DAST scan.

Model answer

Stage 1 | Scan(s): Secret scan (pre-commit hook) | Reason: Stop a credential before it ever leaves the developer's machine and enters git history.

Stage 2 | Scan(s): Secret scan + SAST (on the pull request) | Reason: Both need only source, so they run fast and a reviewer sees results before merge; secret scan here is the backstop for anyone who skipped the hook.

Stage 3 | Scan(s): SCA / dependency scan (build stage) | Reason: After dependencies resolve, the full direct-and-transitive tree is known and can be checked against advisories; gate on Critical/High.

Stage 4 | Scan(s): None (deploy to test environment) | Reason: DAST needs a running target, so the app must be deployed to a non-production test env (no real customer data) first.

Stage 5 | Scan(s): ZAP baseline DAST scan (post-deploy) | Reason: With the app running, scan it from the outside for missing security headers, insecure cookies, and exposed endpoints.

Strong plans follow "fail fast, fail cheap": the checks that need only source (secret, SAST) run earliest, SCA runs once dependencies are resolved, and DAST runs last because it needs a deployed target — and never against production. Weak plans put DAST early (impossible, nothing is running) or never fail the build.

Why teams fail here

The scanner runs but the exit code is never checked. Tools like Trivy and Semgrep can fail a build by returning a non-zero exit code on findings, but only if you wire them that way. Teams add the scan step, see the report in the logs, and assume the build will fail. It doesn't, until someone explicitly sets --exit-code 1 (Trivy) or the equivalent option for their tool. A pipeline that scans but never fails is a compliance prop, not a security control.
DAST pointed at production data. Revenue NZ and CoverNZ integrations in particular tempt teams to run the "real" scan against the environment with real tax or claims records. A ZAP active scan will probe endpoints aggressively, can corrupt records, and is likely to breach the Privacy Act 2020 obligations under which that data was collected. DAST goes against a seeded test environment, always.
Secret scanning added after a secret is already in history. Teams often add secret scanning the day after a credential leak is discovered. The scanner now shows green on every new commit — but the leaked key is still in git history on every branch that branched before the removal commit. The credential must be rotated, not just deleted. Green scan results on new commits do not retroactively clean history.
Suppressing findings at the rule level instead of the finding level. One noisy SAST rule fires on a pattern the team uses deliberately — a parameterised query builder that looks like string concatenation to static analysis. The fastest fix is disable-rule: sql-injection. That rule now never fires on anyone's code, ever, including the developer who joins six months later and actually does build a query by concatenation. Suppress the specific finding with a comment explaining why; leave the rule active.
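
The first failure above, a scan whose exit code is never checked, can be illustrated with a short wrapper that runs a scanner command and passes its exit code on unchanged. The "scanner" here is a stand-in: a one-line Python process that pretends to find something, not a real tool invocation.

```python
import subprocess
import sys


def run_scanner(command: list) -> int:
    """Run a scanner command, echo its report, and return its exit code."""
    completed = subprocess.run(command, capture_output=True, text=True)
    print(completed.stdout, end="")
    return completed.returncode


def main() -> int:
    # Stand-in for a real scanner: prints a finding and exits with code 1.
    fake_scanner = [
        sys.executable, "-c",
        "print('1 HIGH finding'); raise SystemExit(1)",
    ]
    code = run_scanner(fake_scanner)
    if code != 0:
        print("scan failed the gate; the build should stop here")
    return code


# In a pipeline step the final line would be: sys.exit(main())
print("exit code:", main())
```

The report appears in the logs either way; only returning the non-zero code to the CI runner actually stops the deploy.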

Key takeaway

A security gate the team has learned to click past is not a gate — it is a ritual, and the senior engineer's real job is building one that stays trusted long enough to actually stop something.

11 Self-Check

Q1: Why is a yearly penetration test not enough for a team that deploys daily?

Because a yearly check cannot keep up with daily change — a vulnerability introduced the day after the pen test can sit in production for months before the next one finds it. Security automation in CI runs the checks on every change, so feedback is continuous and the build itself is the gate.

Q2: Name the four scan types and the question each answers.

SAST — does our own source contain insecure patterns? SCA / dependency — are our third-party libraries known to be vulnerable? Secret scanning — has a credential been committed? DAST — can the running app be attacked from the outside?

Q3: Why does DAST run later in the pipeline than SAST, SCA, and secret scanning?

Because DAST tests a running application from the outside, so it needs a deployed target — it cannot run until the app is deployed to a test environment. SAST, SCA, and secret scanning need only the source or dependency list, so they run early and fast under “fail fast, fail cheap.”

Q4: What is a baseline, and why does it matter when adopting scanning on a legacy codebase?

A baseline records the current set of findings as accepted, so the gate fails only on new findings introduced by a change — not on the hundreds of pre-existing ones. Without it, the first run on a legacy app fails on everything and the team switches the gate off within a week.

Q5: What is the right way to handle a false positive, and what should you never do?

Suppress the single finding in config with a recorded reason and author, leaving the rule active. Never disable the whole rule to silence one noisy alert — that blinds you to every real future instance of it, which is the most common way a security gate quietly stops protecting anything.

12 Interview Prep

Real questions asked in NZ QA interviews for senior automation and DevSecOps roles. Read the model answers, then practise your own version.

“We have a yearly pen test. Why would you add security scanning to the pipeline as well?”

Because a yearly pen test cannot keep up with daily deploys — a vulnerable dependency or a committed secret can ship the day after the test and sit in production until the next one. I’d add the checks that run on every change: secret scanning on every commit, SAST and SCA on the pull request and build, and a ZAP baseline DAST scan after deploy to a test environment. I’d gate the build on Critical/High and any secret, warn on Medium, and keep the results as per-build evidence. The pen test still adds value as a deeper, human-led check — the pipeline scanning is the continuous backstop between them.

“A developer says the security gate is too noisy and wants it turned off. What do you do?”

I’d treat noise as a tuning problem, not a reason to switch off protection. First, check we’re gating only on what matters — Critical/High and secrets — and warning rather than failing on Medium. Second, if we’re adopting on a legacy codebase, set a baseline so only new findings fail the build. Third, triage the noisy findings: real ones get fixed or a time-boxed waiver, false ones get suppressed individually with a recorded reason — never by disabling the whole rule. A gate the team trusts and keeps is worth far more than a strict one they bypass.

“Where would you place a DAST scan in the pipeline, and what would you scan against?”

DAST tests a running application from the outside, so it has to run after a deploy — I’d place a ZAP baseline scan post-deploy, against a dedicated test environment with no real customer data, never against production. A baseline scan spiders the app and runs passive checks for things like missing security headers and insecure cookies, fast enough to live in a pipeline. The scans that need only source — secret scanning, SAST — run much earlier, with SCA at the build stage once dependencies resolve. The ordering follows fail-fast: cheap, deployment-free checks first.
