Test Lead · Release Strategy & Deployment

Release Management & Deployment Testing


The big bang deployment is dead. Modern teams de-risk releases by testing in production, gradually, with the ability to roll back in seconds.

Test Lead ISTQB CTAL-TM — Deployment & Release ~15 min read + checklist

1 The Hook — Why This Matters

A fintech startup deployed a major backend refactor at 2 AM on a Friday, after testing it exhaustively in staging. At 2:47 AM, a bug in a migration script caused every customer account balance to display as zero. The balances themselves were intact; only the display was wrong. Customers panicked, and the app crashed under the spike in support calls. By the time the team rolled back at 3:15 AM, they had lost $100k worth of customer trust and faced a regulatory inquiry.

The problem was not the code. The problem was the release strategy: a big bang deployment with no way to gracefully roll back in seconds, and no gradual traffic shift that would have caught the bug before it affected all users. They released like it was 2010. Modern systems release differently.

Your job is to design deployments that allow safe, fast iteration in production.

Senior engineer insight

What changed how I think about release management was watching a senior engineer refuse to sign off on a go-live because the rollback had never been timed in staging. The code was perfect — every test green, stakeholders ready, PO breathing down her neck. She made the team do a dry-run rollback at 4pm on a Tuesday. It took 12 minutes. She said "we deploy when it takes under 3 minutes" and walked out. Two weeks later the rollback was sub-90 seconds and that discipline saved us when a bad migration hit production 6 months on.

The most common mistake: treating the go-live readiness checklist as a formality to tick rather than a genuine gate — Test Leads sign it off without verifying that the rollback command actually works in the current environment.

2 The Rule — The One-Sentence Version

Test deployments, not just code. De-risk production by releasing gradually with instant rollback.

From the field

A Wellington-based insurance platform was moving to blue-green deployments after years of Friday-night big-bang releases. The team assumed blue-green meant zero downtime was automatic — deploy to green, flip the load balancer, done. What they discovered was that their NZ regulatory reporting job ran every night at midnight and had a hard-coded database connection string pointing to the blue environment. When they flipped to green, the reporting job silently failed for three nights before anyone noticed. The Commerce Commission inquiry was not fun. The lesson: blue-green is a traffic strategy, not a config strategy — every background job, cron, and external integration must be environment-aware, and your go-live readiness check must explicitly verify them all before the cutover call.
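A minimal sketch of that readiness check, assuming job configuration can be exported as a mapping of job name to connection string and that colour-specific hosts contain the words "blue" or "green". Both are assumptions, and the job names and hostnames are hypothetical.

```python
import re

# Assumed naming scheme: colour-specific hosts contain "blue" or "green".
PINNED_ENV_PATTERN = re.compile(r"\b(blue|green)\b", re.IGNORECASE)


def find_pinned_jobs(job_configs: dict[str, str]) -> list[str]:
    """Return names of jobs whose connection string names a specific colour."""
    return sorted(
        name
        for name, connection_string in job_configs.items()
        if PINNED_ENV_PATTERN.search(connection_string)
    )


if __name__ == "__main__":
    # Hypothetical job inventory; replace with an export of your real scheduler config.
    jobs = {
        "nightly-regulatory-report": "postgres://db.blue.internal:5432/claims",
        "invoice-export": "postgres://db.live.internal:5432/claims",
    }
    print(find_pinned_jobs(jobs))
```

Any job the function returns is pinned to one side of the blue-green pair and needs fixing before the cutover call. A pattern match like this only catches obvious cases, so it supplements a manual review of crons and integrations rather than replacing it.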

Staging tests tell you whether the code works. Deployment tests tell you whether the code works in production at scale. The difference is traffic, state, and real data. Your deployment strategy must answer: How do we get from staging to 100% of production users in a way that catches bugs before they affect everyone?

3 The Analogy — Think Of It Like...

Analogy

Merging traffic lanes.

A highway is being refurbished. Instead of closing all lanes (big bang), they open the new lane to 5% of traffic first. They watch: Are merge accidents happening? Are there bottlenecks? If problems emerge, they close the new lane and reroute. If metrics look good at 5%, they expand to 20%. Then 50%. Then 100%. At no point is the entire highway blocked. De-risked releases follow the same pattern: canary at 5%, monitor closely, gradual expansion, instant rollback if metrics diverge.

4 Watch Me Do It — Deployment Strategies & Testing

Deployment Strategies and Their Testing Implications

Big Bang (All-at-once): Deploy to 100% of production in a single operation. Testing is: "Does it work in staging?" Rollback is: "Revert the deployment and redeploy the previous version." Risk is HIGH. Use only when: No other option (legacy infrastructure, database-only changes) or the change is so isolated that canary isn't needed (minor UI tweak, doc update). Avoid for backend logic, data migrations, or service changes.

Blue-Green: Two identical production environments. Deploy to the "green" environment while "blue" is live. When green is validated, switch traffic instantly. Rollback is: Switch traffic back to blue. Testing is: Full regression suite against green before cutover. Risk is MEDIUM. Works well for monolithic architectures or when you need zero downtime. Gotcha: Database migrations must be backwards-compatible (code in blue must run against the new schema).

Canary Release: Deploy to 5% of production, monitor metrics, gradually increase (5% → 10% → 25% → 50% → 100%). Testing is: Pre-deployment smoke tests in staging, then automated metrics monitoring during canary. Rollback is: Instant (kill the canary, 100% reverts to old version). Risk is LOW. Works for any architecture. Best practice for high-risk changes or microservices. Requires good observability (dashboards, alerts).

Feature Flags: Deploy code to 100% of production, but the feature is disabled for all users. Gradually enable for internal testers (0.1%), beta users (1%), then roll out by geography or user cohort. Testing is: Unit + integration tests in CI/CD, then staged enablement with close monitoring. Rollback is: Flip the flag. Risk is VERY LOW. Allows you to ship code on your schedule and enable on the business's schedule. Requires discipline: flags must be cleaned up once fully rolled out.

Rolling Deployment: Gradually replace old instances with new ones (5 old instances → 4 old + 1 new → 3 old + 2 new → fully new). Testing is: Health checks on each new instance before it receives traffic. Rollback is: Kill new instances, scale up old ones. Risk is MEDIUM. Requires good load balancer configuration and health check design. Common in Kubernetes environments.
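Canary releases and feature flags both depend on putting a stable percentage of users on the new path. One common way to do that is hash-based bucketing, sketched below. The function names and the flag name are hypothetical, and real flag platforms usually provide this for you.

```python
import hashlib


def bucket_for(user_id: str, flag_name: str) -> float:
    """Map a user to a stable position in [0, 100) for a given flag."""
    digest = hashlib.sha256(f"{flag_name}:{user_id}".encode("utf-8")).digest()
    # Reduce the first 8 bytes to basis points (0..9999), then scale to a percentage.
    basis_points = int.from_bytes(digest[:8], "big") % 10_000
    return basis_points / 100


def is_enabled(user_id: str, flag_name: str, rollout_percent: float) -> bool:
    """True if the user falls inside the current rollout percentage."""
    return bucket_for(user_id, flag_name) < rollout_percent


if __name__ == "__main__":
    for percent in (5, 25, 50, 100):
        enabled = sum(is_enabled(f"user-{i}", "new-checkout", percent) for i in range(10_000))
        print(f"{percent}% rollout -> {enabled} of 10000 users enabled")
```

Because a user's bucket never changes, anyone enabled at 5% stays enabled at 25% and beyond, so users do not flip between old and new behaviour as the rollout widens. Setting the percentage back to 0 is the rollback.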

Pre-deployment testing: In staging, run the full regression suite, smoke tests, a performance baseline (compare response times before and after), and security scanning (SAST, dependency checks). Acceptance criteria: no critical or high bugs, performance delta < 5%, no new vulnerabilities. Document the test results and baseline metrics.
Canary setup and monitoring: Deploy to 5% of production. Watch error rate (vs baseline), response time (p50, p95, p99), and custom business metrics (e.g., checkout success rate). Set alert thresholds: if the error rate goes from 0.1% to 0.5%, alert and prepare to roll back. Duration: minimum 15 minutes at each traffic level, or until you've seen enough requests to make a statistical call.
Gradual traffic shift: 5% → wait 15 min → check metrics → 25% → wait 15 min → check metrics → 50% → wait 15 min → check metrics → 100%. At each step, ask: "Are metrics healthy?" If yes, proceed. If no, roll back. Do not skip steps to "speed things up." Faster == riskier.
Rollback testing and procedures: Before release day, test the rollback. Actually do it. Deploy the old version, verify it works, then deploy the new version again. Document the rollback command and who can execute it. In a real incident, aim to roll back in under 2 minutes. If it takes more than 5 minutes, your rollback is too slow.
Post-deployment validation: Run smoke tests against production at each canary stage. Verify that API endpoints respond correctly, critical user flows work (sign up, login, purchase), and data integrity checks pass (no orphaned records, no corrupted data). Automate all of this. Manual checks are too slow.
Communication and incident response: Before release, brief the team: What are we releasing? What is the rollback trigger? Who is on call? Have a Slack/Teams channel dedicated to the release. Post updates every 15 minutes (even if it is "all quiet"). If a rollback is needed, post an incident notification, execute the rollback, then investigate the root cause. Use a blameless postmortem to capture learnings.
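The gradual traffic shift in the steps above can be sketched as a small driver loop. This is an illustration under stated assumptions, not a pipeline implementation: `set_traffic`, `metrics_healthy`, and `wait` are hypothetical callbacks standing in for your load balancer API, your monitoring query, and the 15-minute soak.

```python
from typing import Callable

STAGES = (5, 25, 50, 100)  # traffic percentages used in the text


def run_rollout(
    set_traffic: Callable[[int], None],
    metrics_healthy: Callable[[int], bool],
    wait: Callable[[], None],
    stages: tuple[int, ...] = STAGES,
) -> bool:
    """Walk through the stages; send all traffic back to the old version on the first unhealthy check."""
    for percent in stages:
        set_traffic(percent)
        wait()  # in a real pipeline: soak for at least 15 minutes
        if not metrics_healthy(percent):
            set_traffic(0)  # rollback: the new version receives no traffic
            return False
    return True


if __name__ == "__main__":
    history: list[int] = []
    # Simulated run in which metrics turn unhealthy at the 50% stage.
    succeeded = run_rollout(history.append, lambda percent: percent < 50, lambda: None)
    print(succeeded, history)
```

The loop has no way to skip a stage, which encodes the "do not skip steps" rule directly in the tooling instead of relying on discipline during a release window.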

Canary Deployment Monitoring Dashboard (What to Watch)

Metric	Baseline (Old Version)	Canary Threshold (Alert)	Decision
Error Rate (5xx + 4xx)	0.1%	> 0.5% triggers alert	Roll back
Response Time (p95)	150ms	> 250ms triggers alert	Roll back
Response Time (p99)	500ms	> 750ms triggers alert	Roll back
Checkout Success Rate	98%	< 96% triggers alert	Roll back
Database Connection Pool	50/100 used	> 80 used triggers alert	Roll back
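The table's alert thresholds can be turned into an automated gate. The sketch below uses the threshold values from the table; the metric key names are hypothetical, and treating a missing metric as a breach is an assumption chosen so that a broken dashboard fails safe.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Threshold:
    metric: str
    limit: float
    higher_is_worse: bool = True

    def breached(self, value: float) -> bool:
        """True when the observed value is on the wrong side of the limit."""
        return value > self.limit if self.higher_is_worse else value < self.limit


# Alert thresholds copied from the dashboard table; key names are illustrative.
THRESHOLDS = (
    Threshold("error_rate_pct", 0.5),
    Threshold("p95_ms", 250),
    Threshold("p99_ms", 750),
    Threshold("checkout_success_pct", 96, higher_is_worse=False),
    Threshold("db_connections_used", 80),
)


def breached_metrics(observed: dict[str, float]) -> list[str]:
    """Names of metrics past their threshold; a missing metric counts as breached."""
    return [
        t.metric
        for t in THRESHOLDS
        if t.metric not in observed or t.breached(observed[t.metric])
    ]


def decision(observed: dict[str, float]) -> str:
    return "roll back" if breached_metrics(observed) else "proceed"


if __name__ == "__main__":
    canary = {
        "error_rate_pct": 0.7,
        "p95_ms": 180,
        "p99_ms": 600,
        "checkout_success_pct": 97.5,
        "db_connections_used": 55,
    }
    print(decision(canary), breached_metrics(canary))
```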

Pro tip for NZ teams: Deploy during NZ business hours (9 AM - 3 PM local time, NZST or NZDT depending on the season), not at midnight or on a Friday. Have support staff online. If something goes wrong, your team is awake. If you must deploy off-hours, ensure an on-call senior engineer is available to roll back in < 2 minutes. Use chaos engineering (intentional failures) to validate your rollback procedures every quarter.

5 When to Use It / Scope & Limits

✅ Use canary/progressive deployment when...

Deploying to production (any change to backend, database, or core logic)
The change touches customer-facing features or revenue-critical paths
You have good observability (metrics, dashboards, alerts)
Rollback is possible in < 5 minutes
You can monitor real traffic and user behavior

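❌ Canary may not be necessary, or will not help, when...

The change is documentation-only (no code changes)
The change is UI-only with no backend logic change
The change is a dependency security patch with zero functional change
You have no monitoring in place and can't observe production behavior (a canary without metrics tells you nothing; fix observability first)
Rollback is impossible (e.g., database migration that can't be reversed); a canary limits who is exposed but cannot undo the change, so make the migration backwards-compatible instead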

Before planning a deployment, ask:

Does this change affect production data or user-facing functionality?
Can we monitor success/failure in real time (dashboards, alerts)?
Can we rollback in < 5 minutes if something goes wrong?
Do we have the infrastructure to support canary/blue-green (load balancer, traffic routing)?
Is there a database migration? If yes, must be backwards-compatible.

6 Common Mistakes — Don't Do This

🚫 Skipping canary in favor of "faster" big bang deployment

I used to think: Canary adds 2 hours to the release. We'll just deploy to everyone and monitor closely.
Actually: That 2 hours of canary prevents the 10-hour incident and rollback. A bug that affects 1% of users is a learning. A bug that affects 100% is a disaster. Canary forces you to find bugs early. Build the infrastructure for canary once; use it every time.

🚫 Not testing the rollback procedure before release day

I used to think: Rollback is just "revert the commit and redeploy." It will work when we need it.
Actually: Rollback has failure modes: database state is incompatible, old code crashes on new schema, cache is in an inconsistent state. Test rollback monthly. Document the exact command. Time it. If rollback takes > 5 minutes, you need to improve your process.
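Timing the drill is easy to automate. The sketch below wraps whatever performs the rollback in a timer and compares the result with the 5-minute limit from the text. `rollback` is a hypothetical callable standing in for your real rollback command, and the injectable clock exists only so the example can be run without waiting.

```python
import time
from typing import Callable

ROLLBACK_BUDGET_SECONDS = 5 * 60  # the 5-minute limit from the text


def time_rollback(
    rollback: Callable[[], None],
    clock: Callable[[], float] = time.monotonic,
) -> float:
    """Run the rollback and return how long it took, in seconds."""
    start = clock()
    rollback()
    return clock() - start


def within_budget(elapsed_seconds: float, budget: float = ROLLBACK_BUDGET_SECONDS) -> bool:
    return elapsed_seconds <= budget


if __name__ == "__main__":
    # Simulated drill: a fake clock reports that the rollback took 12 minutes.
    ticks = iter([0.0, 720.0])
    elapsed = time_rollback(lambda: None, clock=lambda: next(ticks))
    print(f"rollback took {elapsed:.0f}s, within budget: {within_budget(elapsed)}")
```

Recording the measured time after every drill gives you a trend, which makes it obvious when a change to the environment has quietly slowed the rollback down.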

🚫 Not monitoring the right metrics during canary

I used to think: If CPU and memory are normal, the deployment is fine.
Actually: Watch service-level and business metrics: error rate, response time, and revenue-critical flows (checkout success, login, payment processing). CPU and memory can look normal while requests fail, so infrastructure metrics tend to be lagging indicators of a bad release; error rates and business metrics surface the problem first. A spike in 500 errors tells you to roll back immediately; low CPU does not.

When deployment testing fails

Deployment testing fails when you skip canary and go straight to big bang, only to discover a bug affects all users. It also fails when metrics are not monitored during the canary; you can't detect issues early. Finally, failure occurs when rollback is not tested in advance and takes > 10 minutes to execute; by then, customer trust is gone.

Why teams fail here

Canary thresholds are set but nobody is watching — alerts fire into a Slack channel that has been muted, so the gradual rollout completes to 100% while an elevated error rate goes undetected for hours
The rollback procedure exists in a Confluence page nobody has read since it was written — when an incident hits at 11pm, the on-call engineer spends 8 minutes finding the doc and discovers the kubectl context name changed two months ago
Database migrations are not backwards-compatible — the team deploys the new schema, something goes wrong, and rolling back the code is now impossible because the old code cannot read the new column names
Go-live authority is assumed rather than explicit — in NZ regulated industries (finance, health, government), no single engineer has the authority to approve production deployments alone, and teams skip the formal sign-off in the pressure of a release window, creating audit exposure months later

Key takeaway

A deployment strategy is only as good as the rollback you rehearsed last Tuesday — everything else is hope dressed up as process.

7 Self-Check — Can You Actually Do This?


Q1. What is the difference between canary and blue-green deployment?

Canary: Deploy to 5% of prod, monitor, gradually increase to 100%. Metrics guide the decision to proceed. Rollback is: disable canary. Blue-Green: Two full prod environments. Deploy to green, run full tests, switch all traffic instantly. Rollback is: switch back to blue. Canary is lower risk (catches bugs earlier). Blue-green is faster (instant cutover) but requires two full environments. Choose canary for microservices; blue-green for monoliths.

Q2. What should you monitor during a canary deployment?

Monitor: (1) Error rate (HTTP 5xx, API errors), (2) Response time (p50, p95, p99), (3) Business metrics (checkout success, login conversion, active users), (4) Resource metrics (CPU, memory, database connections). Set alert thresholds for each. If any alert fires, investigate and rollback if needed. Duration: Keep canary at each traffic level for at least 15 minutes or until you've seen >1000 requests. More data == more confident decision.
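One way to judge whether you have seen enough requests is a pooled two-proportion z-test comparing the canary's error rate with the baseline. The sketch below is a swapped-in statistical technique, not something the checklist prescribes; the function name is hypothetical, and the normal approximation is unreliable when error counts are very small.

```python
import math


def canary_error_rate_p_value(
    base_errors: int, base_total: int, canary_errors: int, canary_total: int
) -> float:
    """One-sided p-value that the canary error rate exceeds the baseline rate."""
    p_base = base_errors / base_total
    p_canary = canary_errors / canary_total
    pooled = (base_errors + canary_errors) / (base_total + canary_total)
    std_err = math.sqrt(pooled * (1 - pooled) * (1 / base_total + 1 / canary_total))
    if std_err == 0:
        return 1.0  # identical all-success or all-failure samples: nothing to distinguish
    z = (p_canary - p_base) / std_err
    # Upper-tail probability of the standard normal distribution.
    return 0.5 * math.erfc(z / math.sqrt(2))


if __name__ == "__main__":
    # Baseline: 100 errors in 100,000 requests (0.1%). Canary: 5 errors in 1,000 (0.5%).
    p = canary_error_rate_p_value(100, 100_000, 5, 1_000)
    print(f"p-value: {p:.6f}")
```

A small p-value suggests the canary's higher error rate is unlikely to be noise. A large one means either the canary is fine or you have not yet seen enough traffic, which is a reason to hold at the current traffic level and not to advance.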

Q3. How do you test a rollback before release day?

Do a full dry-run: deploy the old version (rolling back in time), verify it works and processes traffic correctly, then deploy the new version again. Time the entire rollback. Document the exact command (e.g., "kubectl set image deployment/api api=api:v1.9" or "terraform apply -var 'app_version=1.9'", where app_version is a placeholder for whichever input variable your configuration defines; Terraform reserves the variable name "version"). Run a monthly "Rollback Drill" where the on-call engineer executes a rehearsal rollback to stay sharp. If rollback takes > 5 minutes, it is too slow.

8 Interview Prep — What They'll Ask

Real Test Lead interview questions on release management.

Q1. Tell me about a release that went wrong and how you handled it.

Good answer: Describe a real incident. Example: "We deployed a feature flag to 100% of users. Within 10 minutes, error rate spiked to 5%. Our monitoring caught it immediately. We rolled back the feature flag in 30 seconds. Root cause: The flag code had a null pointer exception that QA didn't catch because we hadn't tested with the flag enabled. We learned to treat feature flags as code and test flag=on and flag=off. We now test flags in CI/CD before rollout." Show ownership and learnings.
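Testing both flag states can be as simple as running the same assertions with the flag on and off. A minimal sketch with the standard library's unittest follows; `checkout_total` and its discount rule are hypothetical stand-ins for a flag-guarded code path.

```python
import unittest


def checkout_total(prices: list[float], new_pricing_enabled: bool) -> float:
    """Hypothetical function under test with a flag-guarded code path."""
    total = sum(prices)
    if new_pricing_enabled:
        # New path (illustrative rule): 10% discount on orders over 100.
        return round(total * 0.9, 2) if total > 100 else total
    return total


class CheckoutFlagTests(unittest.TestCase):
    def test_empty_basket_in_both_flag_states(self):
        for flag_state in (True, False):
            with self.subTest(new_pricing_enabled=flag_state):
                # Edge cases must hold regardless of the flag.
                self.assertEqual(checkout_total([], flag_state), 0)

    def test_flag_on_applies_discount(self):
        self.assertEqual(checkout_total([80, 40], True), 108.0)

    def test_flag_off_keeps_old_behaviour(self):
        self.assertEqual(checkout_total([80, 40], False), 120)


if __name__ == "__main__":
    unittest.main()
```

The subTest loop is the important part: it makes the flag state an explicit test dimension, so a crash that only occurs with the flag enabled shows up in CI and not in production.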

Q2. How do you ensure a canary doesn't cause a cascade failure?

Use circuit breakers and graceful degradation: If the canary version is failing, calls should fall through to the old version, not crash the entire system. Monitor carefully for: dependency failures (database, external APIs), resource exhaustion (connections, memory), cascading timeouts. Set canary traffic low enough that even a 100% failure of the canary doesn't overwhelm the old version. Where some failure cannot be ruled out, rehearse the rollback so it is fast, and communicate clearly with customers.
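The fall-through behaviour can be illustrated with a deliberately simplified circuit breaker. The class and its parameters are hypothetical; a production breaker would also need a timed reset (half-open state), thread safety, and metrics, all of which are omitted here.

```python
from typing import Callable, TypeVar

T = TypeVar("T")


class CanaryBreaker:
    """Sends calls to the canary until it fails too often, then to the stable version only."""

    def __init__(self, failure_limit: int = 3) -> None:
        self.failure_limit = failure_limit
        self.consecutive_failures = 0

    @property
    def is_open(self) -> bool:
        return self.consecutive_failures >= self.failure_limit

    def call(self, canary: Callable[[], T], stable: Callable[[], T]) -> T:
        if self.is_open:
            return stable()  # breaker open: skip the canary entirely
        try:
            result = canary()
        except Exception:
            self.consecutive_failures += 1
            return stable()  # fall through to the old version for this request
        self.consecutive_failures = 0  # a success resets the failure streak
        return result


if __name__ == "__main__":
    def failing_canary() -> str:
        raise RuntimeError("canary is unhealthy")

    breaker = CanaryBreaker(failure_limit=3)
    responses = [breaker.call(failing_canary, lambda: "served by stable") for _ in range(5)]
    print(responses, "breaker open:", breaker.is_open)
```

Every caller still gets a response while the canary is failing, and once the breaker opens the canary stops receiving calls at all, which keeps a bad version from consuming shared resources such as database connections.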

Q3. What would you do if the database migration required during release can't be rolled back?

If the migration is irreversible (e.g., dropping a column), I would: (1) Deploy code that is backwards-compatible with both old and new schema, (2) Run the migration in staging and validate extensively, (3) Run the migration on a backup of production first and validate, (4) Use a feature flag to enable new code only after schema is confirmed, (5) Keep the code backwards-compatible for at least 2 releases so rollback is possible by deploying old code. Never assume you can rollback if data is permanently changed.
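Keeping code compatible with both schemas is often done with an expand-then-contract migration: add the new column, write both, and drop the old one only once rollback is no longer needed. A minimal sketch using an in-memory SQLite database follows. The table and column names are hypothetical, and a real migration would run through your migration tool, not ad hoc SQL.

```python
import sqlite3


def expand_schema() -> sqlite3.Connection:
    """Create the old schema, then expand it without dropping anything."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE accounts (id INTEGER PRIMARY KEY, balance REAL)")
    conn.execute("INSERT INTO accounts (id, balance) VALUES (1, 12.50)")
    # Expand step: the new column lives alongside the old one.
    conn.execute("ALTER TABLE accounts ADD COLUMN balance_cents INTEGER")
    conn.execute("UPDATE accounts SET balance_cents = CAST(ROUND(balance * 100) AS INTEGER)")
    return conn


def old_code_read(conn: sqlite3.Connection, account_id: int) -> float:
    """Old release: only knows about the `balance` column."""
    row = conn.execute("SELECT balance FROM accounts WHERE id = ?", (account_id,)).fetchone()
    return row[0]


def new_code_write(conn: sqlite3.Connection, account_id: int, cents: int) -> None:
    """New release: writes both columns so a rollback to old code stays safe."""
    conn.execute(
        "UPDATE accounts SET balance_cents = ?, balance = ? / 100.0 WHERE id = ?",
        (cents, cents, account_id),
    )


if __name__ == "__main__":
    connection = expand_schema()
    new_code_write(connection, 1, 2000)
    # The old release still reads a correct value after the new release has written.
    print(old_code_read(connection, 1))
```

The irreversible step (dropping `balance`) is postponed to a later release, so during the risky window a rollback means redeploying old code against a schema it still understands.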

Q4. How do you coordinate a deployment across multiple services?

I use a deployment checklist and communication plan: (1) Identify service dependencies (which service must deploy first?), (2) Plan the order and timing, (3) Brief all teams on the release window, (4) Deploy in dependency order: database schema changes first (backwards-compatible), then typically provider services, so that new behaviour exists before anything calls it, then the consumer services that rely on it, (5) Monitor at each step, (6) Use feature flags to decouple deployment from enablement: deploy everything, then enable features in order, (7) Have a rollback plan that can reverse each service independently. Synchronize with Slack/Teams at every step.
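Once dependencies are written down, the deployment order can be derived automatically. The sketch below uses Python's standard-library graphlib (3.9+). The service names are hypothetical, and each entry lists whatever the team has decided must be live before that service deploys.

```python
from graphlib import TopologicalSorter

# Hypothetical release plan: each key maps to the items that must be deployed before it.
DEPLOY_AFTER = {
    "schema-migration": set(),
    "orders-service": {"schema-migration"},
    "billing-service": {"schema-migration", "orders-service"},
    "gateway": {"orders-service", "billing-service"},
}


def deployment_order(deploy_after: dict[str, set[str]]) -> list[str]:
    """Return an order in which every item comes after its prerequisites."""
    return list(TopologicalSorter(deploy_after).static_order())


def rollback_order(deploy_after: dict[str, set[str]]) -> list[str]:
    """Roll back in the reverse of the deployment order."""
    return list(reversed(deployment_order(deploy_after)))


if __name__ == "__main__":
    print("deploy:  ", deployment_order(DEPLOY_AFTER))
    print("rollback:", rollback_order(DEPLOY_AFTER))
```

TopologicalSorter raises graphlib.CycleError if two services each require the other to go first, which is a useful planning-time signal that the release needs a feature flag or an intermediate compatible version to break the cycle.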

Deployment Testing Checklist

Pre-Deployment (Staging)

☐ Full regression suite passes
☐ Smoke tests pass
☐ Performance tests show no degradation (< 5% delta)
☐ Security scanning completes (no new CVEs)
☐ Database migration tested and reversible (if applicable)
☐ Rollback procedure tested in staging environment
☐ Feature flags tested: flag=on and flag=off scenarios
☐ Monitoring dashboards ready (metrics defined, alerts configured)

During Canary Release

☐ Start at 5% traffic, monitor for 15+ minutes
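☐ Error rate remains below the agreed alert threshold (the dashboard example alerts at 0.5% against a 0.1% baseline)
☐ Response times remain below the agreed alert thresholds (the dashboard example alerts at p95 > 250ms against a 150ms baseline)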
☐ Business metrics healthy (checkout success, login conversion)
☐ No spike in support tickets or customer complaints
☐ Increase to 25%, repeat monitoring
☐ Increase to 50%, repeat monitoring
☐ Increase to 100%, continue monitoring for 1 hour
☐ Post-deployment smoke tests pass

Post-Deployment (Production)

☐ Data integrity checks pass (no orphaned records, no corruption)
☐ Critical user journeys work end-to-end
☐ Cache is warm (no performance cliff from cold cache)
☐ No infrastructure alerts firing (CPU, memory, database connections)
☐ Conduct blameless postmortem if any incident occurred
☐ Document what went well and what to improve
☐ Schedule cleanup: remove feature flags, old code, temporary monitoring
