Progressive Delivery in 2023, Argo Rollouts and Flagger Side by Side
June 16, 2023 · 7 min read · by Muhammad Amal

TL;DR:

  • “Deploy to prod” should never mean “send all traffic to v2 instantly.”
  • Argo Rollouts and Flagger both give you canary + analysis + automated rollback on Kubernetes 1.27.
  • The hard part isn’t the rollout config; it’s the metrics that decide promotion. Spend the time there.

I have a rule for any team I work with: if a production deploy can take 100% of traffic to a new version inside 30 seconds, you don’t have continuous delivery, you have continuous Russian roulette. Progressive delivery — canaries, weighted traffic, automated analysis — is the difference between “we deploy 20 times a day” and “we deploy 20 times a day and sleep at night.”

Both Argo Rollouts and Flagger have matured a lot. Both work. Both have sharp edges. This post walks through what each looks like in practice and where I’d choose one over the other.

For the GitOps substrate underneath, see FluxCD vs ArgoCD. The patterns here work with either.

The model: what progressive delivery actually does

A progressive rollout has four moving parts:

  1. A traffic-shaping mechanism: a service mesh (Istio, Linkerd), an ingress controller (NGINX, Gloo), or, with no router at all, the ratio of canary to stable pods behind a plain Kubernetes Service.
  2. A controller that gradually shifts weights from stable to canary.
  3. An analysis step that queries some signal (Prometheus, Datadog, CloudWatch) at each weight increment and decides “pass” or “fail.”
  4. A rollback path that undoes the rollout if analysis fails — without paging a human.

Argo Rollouts and Flagger both implement this loop. They differ in how they integrate with the rest of the world.
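The loop formed by those four parts can be sketched in a few lines of Python. This is a toy model, not how either controller is implemented: run_canary, set_weight, and analyze are hypothetical names, and the analysis here is a stub that fails once the canary reaches 25%.

```python
from typing import Callable, Sequence


def run_canary(
    weights: Sequence[int],
    set_weight: Callable[[int], None],
    analyze: Callable[[int], bool],
) -> bool:
    """Walk the canary through each weight; roll back on the first failed analysis.

    Returns True if the canary was promoted, False if it was rolled back.
    """
    for weight in weights:
        set_weight(weight)        # parts 1 and 2: the controller drives the traffic shaper
        if not analyze(weight):   # part 3: query a signal, decide pass or fail
            set_weight(0)         # part 4: automatic rollback, no human involved
            return False
    return True


# Toy wiring: record every weight change, and fail analysis at 25% and above.
history = []
promoted = run_canary(
    weights=[5, 25, 50, 100],
    set_weight=history.append,
    analyze=lambda weight: weight < 25,
)
print(promoted, history)
```

In this run the canary reaches 25%, fails analysis, and is sent back to 0% without ever seeing 50% of traffic.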

Argo Rollouts: rollout-as-resource

Argo Rollouts (1.5 in mid-2023) replaces the standard Deployment resource with a Rollout. You give up Deployment’s simplicity in exchange for declarative rollout semantics.

apiVersion: argoproj.io/v1alpha1
kind: Rollout
metadata:
  name: checkout
  namespace: checkout
spec:
  replicas: 10
  strategy:
    canary:
      canaryService: checkout-canary
      stableService: checkout-stable
      trafficRouting:
        istio:
          virtualService:
            name: checkout
            routes: [primary]
      steps:
        - setWeight: 5
        - pause: { duration: 2m }
        - analysis:
            templates:
              - templateName: success-rate
            args:
              - name: service
                value: checkout
        - setWeight: 25
        - pause: { duration: 5m }
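        - analysis:
            templates: [{ templateName: success-rate }, { templateName: latency-p99 }]
            # success-rate declares a required `service` arg, so pass it here too
            args: [{ name: service, value: checkout }]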
        - setWeight: 50
        - pause: { duration: 10m }
        - setWeight: 100
  selector:
    matchLabels: { app: checkout }
  template:
    metadata:
      labels: { app: checkout }
    spec:
      containers:
        - name: checkout
          image: registry.acme.io/checkout:1.4.7
          ports: [{ containerPort: 8080 }]
          readinessProbe:
            httpGet: { path: /health, port: 8080 }

The AnalysisTemplate is where the work lives. This is a Prometheus-backed pass/fail gate:

apiVersion: argoproj.io/v1alpha1
kind: AnalysisTemplate
metadata:
  name: success-rate
  namespace: checkout
spec:
  args:
    - name: service
  metrics:
    - name: success-rate
      interval: 30s
      count: 6
      successCondition: result[0] >= 0.99
      failureLimit: 2
      provider:
        prometheus:
          address: http://prometheus.platform-observability:9090
          query: |
            sum(rate(
              istio_requests_total{
                reporter="destination",
                destination_service=~"{{args.service}}.*",
                response_code!~"5.."
              }[1m]
            ))
            /
            sum(rate(
              istio_requests_total{
                reporter="destination",
                destination_service=~"{{args.service}}.*"
              }[1m]
            ))

A few details that matter:

  • failureLimit: 2 tolerates up to two failed measurements over the run (they need not be consecutive); a third failure aborts. One transient blip won’t kill the rollout.
  • count: 6 with interval: 30s means a 3-minute analysis window. Tune this to your traffic volume. Low-traffic services need longer windows.
  • The query filters out 5xx; if you also care about client errors during canary, separate that into a different AnalysisTemplate with its own threshold.
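To make the traffic-volume point concrete, here is a small back-of-the-envelope calculation. It assumes traffic is spread evenly over the window; the function names and the request rates (50 rps and 0.5 rps) are made up for illustration.

```python
def analysis_window_seconds(count: int, interval_seconds: int) -> int:
    """Nominal window covered by `count` measurements taken every `interval_seconds`."""
    return count * interval_seconds


def expected_canary_requests(
    total_rps: float, canary_weight_pct: float, window_seconds: int
) -> float:
    """Requests the canary should see in the window, assuming evenly spread traffic."""
    return total_rps * (canary_weight_pct / 100.0) * window_seconds


window = analysis_window_seconds(count=6, interval_seconds=30)

# A busy service and a quiet one, both with the canary at 5% weight.
busy = expected_canary_requests(total_rps=50, canary_weight_pct=5, window_seconds=window)
quiet = expected_canary_requests(total_rps=0.5, canary_weight_pct=5, window_seconds=window)
print(window, busy, quiet)
```

With these numbers the busy service puts a few hundred requests through the canary in one window, while the quiet one sees fewer than five. A 99% threshold means very little on a sample that small.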

The Argo Rollouts dashboard and kubectl-argo-rollouts plugin make manual intervention easy. kubectl argo rollouts promote checkout skips to the next step; kubectl argo rollouts abort checkout rolls back.

Flagger: in-place mutation, declarative analysis

Flagger (1.31) takes a different approach. You keep your Deployment, and Flagger generates a checkout-primary Deployment to serve stable traffic, plus the Services and VirtualService it needs for traffic shaping. The Canary resource is the policy:

apiVersion: flagger.app/v1beta1
kind: Canary
metadata:
  name: checkout
  namespace: checkout
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: checkout
  service:
    port: 8080
    targetPort: 8080
    gateways: [public]
    hosts: [checkout.acme.io]
  analysis:
    interval: 30s
    threshold: 5
    maxWeight: 50
    stepWeight: 10
    metrics:
      - name: request-success-rate
        thresholdRange: { min: 99 }
        interval: 1m
      - name: request-duration
        thresholdRange: { max: 500 }
        interval: 1m
    webhooks:
      - name: load-test
        url: http://flagger-loadtester.flagger/
        timeout: 5s
        metadata:
          cmd: hey -z 1m -q 10 -c 2 http://checkout-canary.checkout:8080/

The shape is different but the moving parts are the same: weight steps, metric thresholds, a rollback if failures accumulate. Flagger ships with built-in metric templates for common cases (request rate, error rate, latency) so a basic canary doesn’t require writing your own Prometheus query.
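The numbers in the analysis block translate into a weight schedule and rough timings. The sketch below uses the formulas as I recall them from the Flagger docs (minimum time to promote is interval × maxWeight / stepWeight; time to roll back is interval × threshold). Treat both as approximations; the helper names are hypothetical.

```python
def flagger_weights(step_weight: int, max_weight: int) -> list:
    """Canary weights visited on the way up, in percent."""
    return list(range(step_weight, max_weight + 1, step_weight))


def min_promotion_seconds(interval_seconds: int, step_weight: int, max_weight: int) -> int:
    """Lower bound on a successful analysis: one interval per weight step."""
    return interval_seconds * (max_weight // step_weight)


def rollback_seconds(interval_seconds: int, threshold: int) -> int:
    """Time for `threshold` failed checks to accumulate when every check fails."""
    return interval_seconds * threshold


# Values from the Canary above: interval 30s, threshold 5, stepWeight 10, maxWeight 50.
print(flagger_weights(10, 50))
print(min_promotion_seconds(30, 10, 50))
print(rollback_seconds(30, 5))
```

For this Canary that works out to five weight steps and about two and a half minutes in either direction, which is short for anything but a high-traffic service.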

The webhooks.load-test part is interesting. For low-traffic services, the canary won’t see enough requests to produce meaningful analysis. Flagger has a built-in load tester that you can fire at the canary endpoint during analysis. Argo Rollouts doesn’t have an equivalent out of the box; you’d run k6 or Locust as a separate Job.

When I pick which

  • Already running ArgoCD → Argo Rollouts. Same project, same UI, same mental model. The Rollouts dashboard plugs into the ArgoCD UI.
  • Already running Flux → Flagger. Flagger is from the Flux ecosystem and integrates with the Flux Helm controller smoothly.
  • Multiple traffic shapers (mesh in prod, ingress in staging) → Flagger. Its provider abstraction handles Istio, Linkerd, NGINX, Gloo, Skipper, Traefik, App Mesh, and Contour with the same Canary resource.
  • Want to ship without a mesh → Argo Rollouts. It can drive a plain NGINX ingress directly, and with no traffic router at all it approximates weights with replica counts.

Neither tool is dramatically better. The right pick is the one whose surrounding ecosystem your team already runs.

The metrics are the hard part

I’m going to keep repeating this until people stop ignoring it: the rollout config is 20% of the work. The other 80% is having metrics that meaningfully distinguish “this version is bad” from “Tuesday looks different from Monday.”

Concrete pitfalls I’ve watched happen:

  • A threshold of 99% success looks safe. The service in question normally runs at 99.95%. Dropping to 99.1% is within the threshold, yet the error rate has gone from 0.05% to 0.9%, an 18x increase. Gate on change relative to the baseline, not on an absolute threshold.
  • Latency p50 looks healthy during canary. p99 is on fire. Always check the long tail.
  • The canary only sees 5% of traffic, which is two requests per minute. The statistical noise is enormous. Either lengthen the analysis window or use a shadow deploy that gets mirrored traffic.
  • The analysis runs against the destination service total, not the canary specifically. Make sure the Prometheus query filters by canary version label or destination subset.
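Two of these pitfalls are easy to quantify. The sketch below computes how much the error rate grew relative to the baseline, and a 95% Wilson score interval for a success rate measured on a tiny sample. The Wilson interval is my choice of illustration, not something either tool computes for you, and the function names are hypothetical.

```python
import math


def error_ratio(baseline_success_pct: float, canary_success_pct: float) -> float:
    """How many times larger the canary's error rate is than the baseline's."""
    baseline_err = 100.0 - baseline_success_pct
    canary_err = 100.0 - canary_success_pct
    return canary_err / baseline_err


def wilson_interval(successes: int, total: int, z: float = 1.96) -> tuple:
    """Wilson score interval for a success proportion (z=1.96 is roughly 95%)."""
    if total == 0:
        return (0.0, 1.0)  # no data: the true rate could be anything
    p = successes / total
    denom = 1 + z * z / total
    centre = (p + z * z / (2 * total)) / denom
    half = z * math.sqrt(p * (1 - p) / total + z * z / (4 * total * total)) / denom
    return (centre - half, centre + half)


# 99.95% baseline versus a canary at 99.1% success.
print(error_ratio(99.95, 99.1))

# Two requests per minute over three minutes, all successful: 6 out of 6.
low, high = wilson_interval(successes=6, total=6)
print(low, high)
```

Six successes out of six looks like 100%, but the lower bound of the interval is only around 0.61. That sample cannot tell a healthy canary from one failing a third of its requests.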

The Argo Rollouts docs cover the analysis primitives well; the Istio docs at istio.io cover the traffic-shaping side.

Header-based and percent-based canaries together

For high-stakes services I run a two-stage canary: first a header-based rollout for internal QA, then a percentage rollout to real users. Argo Rollouts handles this cleanly:

strategy:
  canary:
    steps:
      # qa-only must also be listed under trafficRouting.managedRoutes
      - setHeaderRoute:
          name: qa-only
          match:
            - headerName: X-Canary-QA
              headerValue:
                exact: "true"
      - pause: {}  # manual gate; QA tests
      - setHeaderRoute:
          name: qa-only  # remove header route
      - setWeight: 5
      - analysis: { templates: [{ templateName: success-rate }], args: [{ name: service, value: checkout }] }
      - setWeight: 25
      - ...

QA sends requests with X-Canary-QA: true, which reach only the canary version, and validates the change with realistic scenarios. A human then resumes the rollout into the percent-weighted phase, which is fully automated.
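The routing decision across the two stages can be modelled as below. This is only a model of the behaviour, not what the mesh does internally: a real mesh picks a backend per request rather than hashing an ID, and pick_backend, request_id, and qa_route_active are hypothetical names.

```python
import hashlib


def pick_backend(
    headers: dict,
    request_id: str,
    canary_weight_pct: int,
    qa_route_active: bool,
) -> str:
    """Return "canary" or "stable" for one request.

    Stage 1: while the header route exists, X-Canary-QA: true always wins.
    Stage 2: everything else is split by weight.
    Header lookup is case-sensitive here, a simplification over real HTTP.
    """
    if qa_route_active and headers.get("X-Canary-QA") == "true":
        return "canary"
    # Hash the request ID into a stable bucket in [0, 100) so the split is repeatable.
    bucket = int(hashlib.sha256(request_id.encode()).hexdigest(), 16) % 100
    return "canary" if bucket < canary_weight_pct else "stable"


# During the manual QA gate the weight is still 0, so only tagged requests hit the canary.
print(pick_backend({"X-Canary-QA": "true"}, "req-1", 0, qa_route_active=True))
print(pick_backend({}, "req-1", 0, qa_route_active=True))
```

Once the header route is removed, the same tagged request falls through to the weighted split like any other.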

Common Pitfalls

  • No baseline metrics before promotion. “We’ve been at 99.95% for weeks” needs to be a graph someone has seen. Don’t discover during a rollback that you have no SLI.
  • Forgetting database migrations. Progressive delivery assumes both versions can run simultaneously. A breaking schema change makes that impossible. Expand-contract migrations are required.
  • Aborts that don’t actually abort. Test the rollback path quarterly. Deploy a deliberately-broken version and confirm the system recovers without human help. If you’ve never tested it, it doesn’t work.
  • Analysis templates that page during failure. The whole point of automated analysis is to make a decision without paging. If your AnalysisTemplate routes alerts to PagerDuty and aborts the rollout, you’ll get woken up for events the system handled fine. Page only on abort failures.
  • Mixing Argo Rollouts and Deployments in the same chart. Helm charts that conditionally template a Deployment or Rollout based on a values flag are a maintenance nightmare. Pick one and migrate, don’t conditionally template.
  • Trusting load tests instead of real traffic. Synthetic load against a canary is better than nothing for low-traffic services, but it doesn’t catch user-shaped traffic patterns. Shadow-mirror real traffic when possible.
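The expand-contract point is the one that most often gets hand-waved, so here is a minimal sketch of the expand phase using an in-memory SQLite database. The orders table, its columns, and the 'USD' backfill default are invented for illustration; the idea is that v1 and v2 can both write while the canary is in flight.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (id INTEGER PRIMARY KEY, total_cents INTEGER NOT NULL)"
)
conn.execute("INSERT INTO orders (id, total_cents) VALUES (1, 1299)")

# Expand: add the new column as nullable so v1 inserts, which omit it, keep working.
conn.execute("ALTER TABLE orders ADD COLUMN currency TEXT")

# v1 (stable) still writes the old shape during the canary.
conn.execute("INSERT INTO orders (id, total_cents) VALUES (2, 500)")
# v2 (canary) writes the new column as well.
conn.execute("INSERT INTO orders (id, total_cents, currency) VALUES (3, 700, 'EUR')")

# Backfill rows written by v1 so v2 can rely on the column later.
conn.execute("UPDATE orders SET currency = 'USD' WHERE currency IS NULL")

rows = conn.execute(
    "SELECT id, total_cents, currency FROM orders ORDER BY id"
).fetchall()
print(rows)

# Contract (tightening constraints, dropping old columns) waits until v1 is fully retired.
```

The contract step ships as its own later deploy, after the rollout has reached 100% and a rollback to v1 is no longer on the table.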

Wrapping Up

Progressive delivery is a multiplier on your deployment frequency. It doesn’t help if you deploy weekly. It transforms the calculus if you deploy hourly. The investment is in the metrics, not the controller.

Next: how Backstage 1.14 ties this all together as the actual interface stream teams use to ship.