Deployment & Staging · CTAL-TA

Canary & Progressive Deployment Testing

Instead of deploying to 100% of users at once, roll out changes to a small percentage (5-10%), monitor for issues, and gradually increase traffic. If error rates spike, roll back automatically. This is risk-aware testing: you measure the production impact on a small cohort before exposing everyone.


What it is

Canary deployment is a release strategy where you send new code to a small fraction of production traffic first. You watch error rates, latency, and business metrics such as conversion or payment success. If everything looks good after an observation window (10 minutes, for example), you gradually roll out to more users. If errors spike or latency increases, you roll back immediately.

It’s called “canary” because historically, canaries were sent into coal mines to detect poisonous gas — the canary would die if the air was toxic, warning miners to evacuate. A canary deployment serves a similar role: a small cohort of users detects problems before they affect the whole user base.

Canary deployments reduce risk, but they don’t eliminate testing requirements. You still need thorough testing before the canary: acceptance tests, load testing, etc. Canary testing is production monitoring, not a substitute for pre-release testing.

Canary mechanics: traffic splitting and rollback

A typical canary deployment looks like this:

  • T=0 minutes: Deploy new version to production alongside the stable version. Route 5% of traffic to the new version; 95% to the stable version.
  • T=5-10 minutes: Monitor error rates, latency, and business metrics. If all look good, increase to 10%.
  • T=20 minutes: All metrics still green. Increase to 25%.
  • T=30 minutes: All metrics still green. Increase to 50%.
  • T=45 minutes: All metrics still green. 100% rollout complete.

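If at any point error rates spike or latency rises beyond its threshold, the system automatically routes all traffic back to the stable version. Because the stable version is still running alongside the canary, rollback is a routing change rather than a redeployment, so it can be automated and take effect within seconds.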
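
The staged rollout above can be sketched as a small control loop. This is a minimal illustration, not a real deployment tool: `set_traffic` and `is_healthy` are hypothetical hooks that would wrap your load balancer and your monitoring queries, and the stage percentages are the ones from the timeline.

```python
from typing import Callable, Sequence

# Traffic percentages taken from the timeline above.
STAGES: Sequence[int] = (5, 10, 25, 50, 100)


def run_canary(
    set_traffic: Callable[[int], None],
    is_healthy: Callable[[], bool],
    stages: Sequence[int] = STAGES,
) -> bool:
    """Walk through the stages; return True on full rollout, False on rollback.

    set_traffic and is_healthy are hypothetical hooks: the first would call
    your load balancer or service mesh, the second would query monitoring
    after the observation window for the current stage.
    """
    for percent in stages:
        set_traffic(percent)
        if not is_healthy():
            set_traffic(0)  # send all traffic back to the stable version
            return False
    return True


# Simulated run: the health check fails on its third call (the 25% stage).
history: list[int] = []
checks = iter([True, True, False])
completed = run_canary(history.append, lambda: next(checks))
print(completed, history)  # False [5, 10, 25, 0]
```

In a real pipeline the health check would block for the observation window of each stage before returning, which is what spaces the steps 10 to 15 minutes apart.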

Testing before canary: the prerequisites

Full acceptance testing

Before the canary even starts, the new version must pass all acceptance tests in a staging environment. Every user journey that will be affected by the change must work. This is standard pre-release testing.

Load testing with the new version

Test the new version under expected production load. A change that works fine in a quiet staging environment might degrade under production scale. Load test before canary to catch performance regressions early.

Baseline metrics

Document the current error rate, latency, and business metrics in production. This is your baseline. During canary, you’ll compare the canary metrics against this baseline to detect regressions.

Example baseline: Current error rate is 0.05%, P99 latency is 200ms, conversion rate is 3.2%. Canary thresholds: if error rate goes above 0.15% (3x baseline) or P99 latency exceeds 300ms (50% increase), trigger rollback.
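
The threshold arithmetic in this example can be written down so that limits are derived from the baseline rather than typed in by hand. The sketch below assumes the same relative rules as the example (3x the baseline error rate, a 50% increase in P99 latency); the `Baseline` class and `rollback_thresholds` function are illustrative names, not part of any tool.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Baseline:
    """Production metrics recorded before the canary starts."""
    error_rate_pct: float
    p99_latency_ms: float
    conversion_pct: float


def rollback_thresholds(
    baseline: Baseline,
    error_multiplier: float = 3.0,  # assumption: 3x baseline, as in the example
    latency_increase: float = 0.5,  # assumption: 50% increase, as in the example
) -> dict[str, float]:
    """Derive absolute rollback limits from relative rules."""
    return {
        "max_error_rate_pct": round(baseline.error_rate_pct * error_multiplier, 4),
        "max_p99_latency_ms": round(baseline.p99_latency_ms * (1 + latency_increase), 1),
    }


limits = rollback_thresholds(
    Baseline(error_rate_pct=0.05, p99_latency_ms=200.0, conversion_pct=3.2)
)
print(limits)
```

With the example baseline this reproduces the 0.15% error-rate limit and the 300ms P99 limit quoted above.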

Rollback plan

Before deploying, document the rollback procedure. How long does rollback take? Can it be automated? Is there a manual approval step? Do you need to coordinate with the database (if there’s a schema migration, is it backwards-compatible)?

Testing during canary: monitoring and metrics

Once the canary is live, testing becomes continuous monitoring. You’re watching for:

Technical metrics

  • Error rate: HTTP 5xx responses, unhandled exceptions, timeouts. Must stay within baseline + threshold (e.g. baseline 0.05%, threshold +0.10%).
  • Latency: Response time for key endpoints. P50, P95, P99 must not degrade significantly.
  • Throughput: Requests per second. Should remain stable. A drop might indicate a bottleneck in the new code.
  • Resource usage: CPU, memory, database connections. A spike might indicate a memory leak or inefficient query in the new code.
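
To make the error-rate and percentile definitions concrete, here is one way they could be computed from a window of raw request samples. It is a sketch only: monitoring platforms usually compute percentiles from histograms or sketches rather than sorted lists, and the nearest-rank method used here is one of several percentile definitions.

```python
import math
from typing import Sequence


def percentile(values: Sequence[float], pct: float) -> float:
    """Nearest-rank percentile: the smallest sample with at least pct% of
    all samples at or below it."""
    if not values:
        raise ValueError("no samples in window")
    ordered = sorted(values)
    rank = math.ceil(pct * len(ordered) / 100)
    return ordered[max(rank, 1) - 1]


def error_rate_pct(status_codes: Sequence[int]) -> float:
    """Share of responses with an HTTP 5xx status, as a percentage."""
    if not status_codes:
        raise ValueError("no samples in window")
    errors = sum(1 for code in status_codes if 500 <= code <= 599)
    return 100 * errors / len(status_codes)


# Illustrative window: 10,000 requests, 5 of them server errors.
codes = [200] * 9995 + [503] * 5
latencies_ms = list(range(1, 101))  # 1ms .. 100ms, one sample each

print(error_rate_pct(codes))  # 0.05
print(percentile(latencies_ms, 50), percentile(latencies_ms, 99))  # 50 99
```

Note that at 5% traffic the canary sees far fewer requests than the stable version, so a short window may contain too few samples for P99 to be meaningful.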

Business metrics

  • Conversion rate: Percentage of users completing a purchase or signup. A drop might indicate the new feature broke the checkout flow.
  • User engagement: Page views, session duration, repeat visits. A sudden drop might mean the new UI is confusing.
  • Revenue: Total payment volume. A decline suggests the new code broke something critical.

User-reported issues

Monitor support tickets and error reports during canary. If users are reporting bugs, don’t wait for automated metrics to detect them: roll back immediately.

Canary metrics and success criteria

Canary testing: metric thresholds and rollback triggers
| Metric | Baseline (stable) | Canary threshold | Action |
|---|---|---|---|
| Error rate | 0.05% (5 errors per 10k requests) | 0.15% (3x baseline) | Roll back if error rate exceeds 0.15% |
| P99 latency | 200ms | 300ms (50% increase) | Roll back if P99 > 300ms for 2+ minutes |
| CPU utilisation | 45% average | 70% | Investigate if CPU spikes; roll back if it persists |
| Database connections | 50 active connections | 100 active connections | Roll back if connection pool exhausted |
| Conversion rate | 3.2% | 2.8% (12.5% drop) | Roll back if conversion drops below 2.8% |
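
The P99 rule in the table only fires when the breach lasts two minutes or more, which avoids rolling back on a single noisy sample. A minimal sketch of such a sustained-breach check follows; the function name and the sample format (timestamp in seconds, value) are assumptions made for illustration.

```python
from typing import Sequence, Tuple


def breached_for(
    samples: Sequence[Tuple[float, float]],
    limit: float,
    min_seconds: float,
) -> bool:
    """Return True if the metric stayed above `limit` continuously for at
    least `min_seconds`. `samples` are (timestamp_seconds, value) pairs in
    chronological order."""
    breach_start = None
    for timestamp, value in samples:
        if value > limit:
            if breach_start is None:
                breach_start = timestamp  # first sample of a new breach
            if timestamp - breach_start >= min_seconds:
                return True
        else:
            breach_start = None  # breach ended; reset the clock
    return False


# P99 latency sampled every 30 seconds (values in ms, invented for the demo).
sustained = [(0, 280), (30, 310), (60, 320), (90, 305), (120, 330), (150, 340)]
brief_spike = [(0, 310), (30, 290), (60, 320), (90, 280)]

print(breached_for(sustained, limit=300, min_seconds=120))    # True
print(breached_for(brief_spike, limit=300, min_seconds=120))  # False
```

Alerting systems typically offer this as a built-in "condition must hold for N minutes" setting, so in practice you would configure it rather than code it.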

Worked example: payment system canary

Your company updated the payment processing backend to reduce latency. Before canary, you load-tested it in staging: 1000 concurrent users, payment success rate 99.9%, average latency 150ms (down from 200ms currently). Good news. You baseline production metrics:

  • Current error rate: 0.03%
  • Current P99 latency: 250ms
  • Current conversion rate: 4.1%

Canary thresholds:

  • Error rate: roll back if > 0.10% (roughly 3x baseline)
  • P99 latency: roll back if > 350ms (40% increase)
  • Conversion rate: roll back if < 3.7% (drop of about 10%)

At 10:00 AM, you deploy the new version. 5% of payment requests route to the new code. You watch for 10 minutes:

  • Error rate: 0.035% (slight increase, but within threshold)
  • P99 latency: 180ms (improvement!)
  • Conversion rate: 4.05% (stable)

Metrics look good. Increase to 10% traffic at 10:10 AM. Monitor for 10 minutes:

  • Error rate: 0.038%
  • P99 latency: 175ms
  • Conversion rate: 4.08%

Still good. Continue increasing: 25% at 10:20, 50% at 10:30, 100% at 10:40. Within the hour, the entire production cluster is running the new code. Rollback was never triggered because every metric stayed within its threshold at each step.
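
The checks in this worked example can be replayed in a few lines. The numbers below are the thresholds and observations from the text; the dictionary keys and the `violations` helper are illustrative names only.

```python
# Rollback thresholds from the worked example.
THRESHOLDS = {
    "max_error_rate_pct": 0.10,
    "max_p99_latency_ms": 350.0,
    "min_conversion_pct": 3.7,
}

# Metrics observed on the canary at the first two steps.
OBSERVATIONS = {
    "5% at 10:00": {"error_rate_pct": 0.035, "p99_latency_ms": 180.0, "conversion_pct": 4.05},
    "10% at 10:10": {"error_rate_pct": 0.038, "p99_latency_ms": 175.0, "conversion_pct": 4.08},
}


def violations(observed: dict, limits: dict = THRESHOLDS) -> list:
    """Return the names of all metrics that breach their rollback threshold."""
    failed = []
    if observed["error_rate_pct"] > limits["max_error_rate_pct"]:
        failed.append("error_rate")
    if observed["p99_latency_ms"] > limits["max_p99_latency_ms"]:
        failed.append("p99_latency")
    if observed["conversion_pct"] < limits["min_conversion_pct"]:
        failed.append("conversion_rate")
    return failed


for step, observed in OBSERVATIONS.items():
    failed = violations(observed)
    print(step, "-> promote" if not failed else f"-> roll back ({', '.join(failed)})")
```

Both steps come back with no violations, which is why the rollout proceeds. Returning the list of failed metrics, rather than a bare boolean, makes the rollback reason easy to log.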

Tools and platforms

  • Spinnaker (open source) — CD platform with native canary deployment support; integrates with AWS, Google Cloud, Azure
  • GitLab (built-in feature) — GitLab CI/CD supports canary deployments; configure in .gitlab-ci.yml
  • Flagger (open source, Kubernetes) — automated canary and blue-green deployments on Kubernetes
  • Datadog / New Relic / Prometheus — monitoring and alerting; define rollback triggers based on metrics
  • LaunchDarkly / Unleash — feature flags enable canary deployments at the application layer (in addition to infrastructure-level canaries)

Rollback testing

Rollback must be tested before you need it in an emergency. Here’s how:

Test rollback in staging

Deploy the new version to a staging canary (5% of staging traffic). Then trigger a manual rollback. Verify that traffic instantly routes back to the stable version. Measure rollback time.

Test rollback during a scheduled maintenance window

Deploy a dummy “new version” (no actual code changes) to production canary. Trigger rollback. Measure the time and verify all metrics return to baseline.

Define rollback time SLA

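Set an explicit target for rollback time, because every second of rollback is a second in which canary users keep hitting the faulty version: if rollback takes 5 minutes, they see errors for 5 minutes. Rollback should complete within a few seconds at most; where it is a pure routing change, aim for sub-second switching.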
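
One way to measure rollback time during a staging or maintenance-window drill is to time the interval between triggering rollback and the canary receiving no more traffic. The sketch below assumes two hypothetical hooks, `trigger_rollback` and `canary_traffic_percent`, that you would implement against your own deployment tooling.

```python
import time
from typing import Callable


def measure_rollback_seconds(
    trigger_rollback: Callable[[], None],
    canary_traffic_percent: Callable[[], float],
    timeout_s: float = 30.0,
    poll_interval_s: float = 0.05,
) -> float:
    """Trigger a rollback and return how long it took for canary traffic
    to reach 0%. Raises TimeoutError if that does not happen in time."""
    started = time.monotonic()
    trigger_rollback()
    while canary_traffic_percent() > 0:
        if time.monotonic() - started > timeout_s:
            raise TimeoutError(f"rollback not complete after {timeout_s}s")
        time.sleep(poll_interval_s)
    return time.monotonic() - started


# Simulated router: reports 5% canary traffic for two polls, then 0%.
state = {"polls": 0}


def fake_traffic_percent() -> float:
    state["polls"] += 1
    return 5.0 if state["polls"] < 3 else 0.0


elapsed = measure_rollback_seconds(lambda: None, fake_traffic_percent, poll_interval_s=0.01)
SLA_SECONDS = 5.0  # assumption: "a few seconds", pick your own target
print(f"rollback took {elapsed:.3f}s, within SLA: {elapsed <= SLA_SECONDS}")
```

The measurement is only as fine-grained as the polling interval, so a sub-second target needs a polling interval well below one second.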

Tips

Feature flags and canary deployments complement each other. A canary deployment gets the code to production. A feature flag controls whether the new code actually runs. You can deploy the new code with the feature off (0% traffic), verify stability, then gradually increase the flag percentage. This gives you two layers of control.
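
A percentage-based flag is commonly implemented by hashing a stable user identifier into a bucket, so that the same user gets the same answer on every request and raising the percentage only ever adds users. The sketch below shows that generic technique; it is not how LaunchDarkly or Unleash are implemented, and `in_rollout` is a hypothetical helper.

```python
import hashlib


def in_rollout(user_id: str, flag_name: str, percent: float) -> bool:
    """Return True if this user falls inside the first `percent` of buckets.

    The flag name is mixed into the hash so that different flags select
    different cohorts of users.
    """
    digest = hashlib.sha256(f"{flag_name}:{user_id}".encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:8], "big") % 10_000  # 0..9999
    return bucket < percent * 100  # percent=5 -> buckets 0..499


users = [f"user-{n}" for n in range(10_000)]
at_5 = {u for u in users if in_rollout(u, "new-checkout", 5)}
at_25 = {u for u in users if in_rollout(u, "new-checkout", 25)}

print(len(at_5), len(at_25))  # roughly 500 and 2,500
print(at_5 <= at_25)          # True: raising the percentage keeps earlier users in
```

Sticky assignment matters for testing: a user who flips between old and new code on successive requests can produce errors that neither version would cause on its own.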

  • Define rollback triggers before deploying. Don’t decide on the threshold for error rate while the canary is live and error rates are rising. Set clear thresholds upfront (e.g. "rollback if error rate > 0.10%").
  • Monitor for at least 5-10 minutes at each canary step. Real-world issues often take time to manifest. A database query that only breaks on a customer’s multi-year data might take 5 minutes to trigger.
  • Alert on business metrics, not just technical ones. An error rate that looks okay technically might still tank conversion. Monitor both.
  • Test schema migrations carefully. If the new version requires a database schema change, ensure the change is backwards-compatible: old code and new code must both work during the migration. This is why canary deployments are challenging for schema changes.
  • Communicate canary status to the team. Canary in progress? Have a clear communication channel (Slack alert, dashboard) so the team is aware and can respond if issues arise.

Related: See Performance Testing for load testing before canary, and Feature Flags for application-level traffic control during deployment.