Deploying Docker Images from GitHub Actions to Staging
TL;DR: On merge to main, build the image, push it to GHCR under both a SHA tag and a fixed staging tag, then deploy via kubectl using an immutable reference rather than the moving tag. Use OIDC for cloud auth instead of long-lived secrets. Environment-protected jobs for prod. Wired right, “merged” to “running in staging” takes about 4 minutes.
Last post of February. Tying together everything from this month: containerization from January, GitHub Actions CI, registry caching, matrix parallelism. The thing that turns it from “CI” into “CI/CD” is the last D — Deploy.
This is the deploy pipeline we’re running for staging in early 2022. Production is similar but with manual approval and extra protections we’ll cover separately. Staging auto-deploys on every push to main.
The full deploy workflow
# .github/workflows/deploy-billing-staging.yml
name: deploy-billing-staging
on:
  push:
    branches: [main]
    paths:
      - 'cmd/billing/**'
      - 'internal/billing/**'
      - 'internal/shared/**'
      - 'Dockerfile.billing'
concurrency:
  group: deploy-billing-staging
  cancel-in-progress: false # don't cancel mid-deploy
jobs:
  build:
    runs-on: ubuntu-22.04
    permissions:
      contents: read
      packages: write
    outputs:
      # Digest (sha256:...) of the pushed image, so the deploy job can pin with @digest.
      # The metadata action's "version" output is a tag name and is not valid after "@".
      image-sha: ${{ steps.build.outputs.digest }}
    steps:
      - uses: actions/checkout@v3
      - uses: docker/setup-buildx-action@v2
      - uses: docker/login-action@v2
        with:
          registry: ghcr.io
          username: ${{ github.actor }}
          password: ${{ secrets.GITHUB_TOKEN }}
      - uses: docker/metadata-action@v4
        id: meta
        with:
          images: ghcr.io/${{ github.repository_owner }}/billing
          tags: |
            type=sha,format=long
            type=raw,value=staging
      - uses: docker/build-push-action@v3
        id: build
        with:
          context: .
          file: Dockerfile.billing
          push: true
          tags: ${{ steps.meta.outputs.tags }}
          labels: ${{ steps.meta.outputs.labels }}
          cache-from: type=gha
          cache-to: type=gha,mode=max
          build-args: |
            VERSION=${{ github.ref_name }}
            COMMIT=${{ github.sha }}
          platforms: linux/amd64
  deploy:
    runs-on: ubuntu-22.04
    needs: build
    environment: staging
    permissions:
      id-token: write # required to request the OIDC token
      contents: read
    steps:
      - uses: actions/checkout@v3
      - uses: azure/setup-kubectl@v3
        with:
          version: v1.23.5
      - uses: aws-actions/configure-aws-credentials@v1
        with:
          role-to-assume: arn:aws:iam::123456789012:role/gh-actions-staging
          aws-region: ap-southeast-1
      - run: aws eks update-kubeconfig --name staging-cluster --region ap-southeast-1
      - name: Deploy to staging
        run: |
          kubectl -n billing set image deploy/billing \
            billing=ghcr.io/${{ github.repository_owner }}/billing@${{ needs.build.outputs.image-sha }}
          kubectl -n billing rollout status deploy/billing --timeout=5m
      - name: Smoke test
        run: |
          ENDPOINT=https://billing-staging.example.com
          # Up to 3 attempts, 5 seconds apart, each capped at 5 seconds
          for i in 1 2 3; do
            if curl -fsS --max-time 5 "$ENDPOINT/readyz"; then
              echo "Smoke test passed"
              exit 0
            fi
            sleep 5
          done
          echo "Smoke test failed"
          exit 1
Two jobs: build produces the image, deploy applies it. Sequential via needs. About 4 minutes end-to-end on warm cache.
What’s happening, walked through
Path-filtered trigger. Only runs when files under cmd/billing/**, internal/billing/**, internal/shared/**, or Dockerfile.billing change. Push to main that only touches the notifications service doesn’t trigger billing’s deploy.
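To sanity-check the filter before pushing, something like the following can approximate it locally. This is only a sketch: billing_paths_changed is a hypothetical helper name, the patterns are copied from the workflow's paths list, and a shell case pattern is not the same matcher GitHub uses, so treat the result as a hint rather than a guarantee.

```shell
#!/bin/sh
# Hypothetical helper: succeeds if any path read from stdin (one per line)
# would fall under the workflow's path filters.
billing_paths_changed() {
  while IFS= read -r path; do
    case "$path" in
      # In a case pattern, * also matches "/", which approximates "**".
      cmd/billing/*|internal/billing/*|internal/shared/*|Dockerfile.billing)
        return 0 ;;
    esac
  done
  return 1
}

# Example usage in a checkout with at least two commits:
#   git diff --name-only HEAD~1 HEAD | billing_paths_changed && echo "would deploy billing"
```

Note that GitHub evaluates the filter against the whole push, not just the last commit, so the git diff range in the usage line is an assumption you may need to widen.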
concurrency without cancel-in-progress. Critical for deploys. If a second deploy queues while the first is mid-rollout, you want it to wait, not cancel the first one. Set cancel-in-progress: false. That is already the default when the key is omitted, but stating it makes the intent obvious to the next person who edits the workflow. One caveat: a concurrency group holds at most one pending run, so if a third push arrives while the second is still waiting, the second is dropped and only the newest one deploys.
docker/metadata-action@v4 for tags. Generates the image tags from one definition: type=sha,format=long produces a sha-<long-sha> tag; type=raw,value=staging produces a fixed staging tag. The image is pushed under both tags in one push, so both resolve to the same sha256:... digest. That gives you a digest to pin with kubectl set image ... @sha256:... for immutable deploys AND a friendly :staging tag for humans to reference.
OIDC for AWS auth. This is the big 2022 win. aws-actions/configure-aws-credentials@v1 exchanges a GitHub OIDC token for short-lived AWS STS credentials. No AWS_ACCESS_KEY_ID/AWS_SECRET_ACCESS_KEY secrets. The job needs permissions: id-token: write to request the token, and the IAM role you assume is configured to trust GitHub OIDC for your specific repo/branch/environment.
# Terraform sketch of the IAM trust policy
data "aws_iam_policy_document" "trust" {
  statement {
    actions = ["sts:AssumeRoleWithWebIdentity"]
    principals {
      type        = "Federated"
      identifiers = ["arn:aws:iam::123456789012:oidc-provider/token.actions.githubusercontent.com"]
    }
    condition {
      test     = "StringEquals"
      variable = "token.actions.githubusercontent.com:aud"
      values   = ["sts.amazonaws.com"]
    }
    condition {
      test     = "StringEquals"
      variable = "token.actions.githubusercontent.com:sub"
      values   = ["repo:yourorg/yourrepo:environment:staging"]
    }
  }
}
Short-lived creds, scoped to a specific environment. No secret to rotate. No “an engineer copied AWS keys to Slack three years ago” risk.
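The sub condition is the part that most often goes wrong, because the claim changes shape depending on whether the job declares an environment. A small sketch of the two shapes, with hypothetical helper names (oidc_subject_for_environment, oidc_subject_for_branch) and the placeholder repo from the trust policy:

```shell
#!/bin/sh
# Hypothetical helpers that print the "sub" claim GitHub issues for a job.

# Job that declares `environment: <name>`.
# $1 = owner/repo, $2 = environment name
oidc_subject_for_environment() {
  printf 'repo:%s:environment:%s\n' "$1" "$2"
}

# Job without an environment, triggered by a push to a branch.
# $1 = owner/repo, $2 = branch name
oidc_subject_for_branch() {
  printf 'repo:%s:ref:refs/heads/%s\n' "$1" "$2"
}

# Example usage:
#   oidc_subject_for_environment yourorg/yourrepo staging
#   oidc_subject_for_branch yourorg/yourrepo main
```

The practical consequence: if someone removes environment: staging from the deploy job, the token carries the branch form instead, the StringEquals condition stops matching, and the role assumption is denied.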
environment: staging. Lets you configure required reviewers (off for staging, on for prod), environment-scoped secrets, deployment URL display in the GitHub UI. Even when you don’t need approvals, it’s worth declaring environments for the visibility and history.
Immutable image references for kubectl set image. Setting the image by @sha256:... digest means the deploy is reproducible, because a digest always names the same bytes. A :sha-<long-sha> tag is nearly as good, as long as nobody re-pushes it. If you set :staging, you’re at the mercy of whatever happened to be :staging when the rollout was triggered. That is fine for humans but a footgun in automation.
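One cheap guard is to refuse to deploy anything that is not pinned by digest. The check below is a sketch; is_digest_pinned is a hypothetical helper name, and it only validates the shape of the reference, not that the digest exists in the registry.

```shell
#!/bin/sh
# Hypothetical helper: succeeds only for references that end in
# @sha256:<64 lowercase hex characters>.
is_digest_pinned() {
  printf '%s\n' "$1" | grep -Eq '@sha256:[0-9a-f]{64}$'
}

# Example usage before the rollout (IMAGE_REF is an assumed variable name):
#   if ! is_digest_pinned "$IMAGE_REF"; then
#     echo "refusing to deploy unpinned image: $IMAGE_REF" >&2
#     exit 1
#   fi
```

A mutable tag such as :staging fails the check, and so does a :sha-<long-sha> tag, which is deliberate if you want digests only.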
Rollout status + smoke test. rollout status blocks until the new pods are ready (or fails on timeout). Then a smoke curl confirms the actual endpoint serves traffic. Catches the case where pods are healthy by their probe but the route hasn’t propagated yet.
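The retry loop in the smoke test generalizes into a small function, which also makes it easy to hang a rollback off the failure path. This is a sketch under assumptions: retry is a hypothetical helper, and rolling staging back automatically is a choice the workflow above does not make.

```shell
#!/bin/sh
# Hypothetical helper: retry ATTEMPTS DELAY_SECONDS COMMAND [ARGS...]
# Runs COMMAND until it succeeds or ATTEMPTS is exhausted.
retry() {
  attempts=$1
  delay=$2
  shift 2
  i=1
  while [ "$i" -le "$attempts" ]; do
    if "$@"; then
      return 0
    fi
    # Sleep between attempts, but not after the last one.
    if [ "$i" -lt "$attempts" ]; then
      sleep "$delay"
    fi
    i=$((i + 1))
  done
  return 1
}

# Example usage in the smoke-test step:
#   if ! retry 3 5 curl -fsS --max-time 5 "$ENDPOINT/readyz"; then
#     echo "Smoke test failed, rolling back"
#     kubectl -n billing rollout undo deploy/billing
#     exit 1
#   fi
```

The step still exits non-zero after the rollback, so the run shows up as failed rather than quietly reverting.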
Variations by deployment target
The above is EKS. The pattern adapts:
Cloud Run:
- uses: google-github-actions/auth@v0
  with:
    workload_identity_provider: projects/123/locations/global/workloadIdentityPools/gh/providers/gh-provider
    service_account: gh-deploy@project.iam.gserviceaccount.com
- uses: google-github-actions/deploy-cloudrun@v0
  with:
    service: billing-staging
    # Verify before relying on this: Cloud Run may not pull from ghcr.io directly.
    # If it doesn't, mirror the image to Artifact Registry and reference that path here.
    image: ghcr.io/${{ github.repository_owner }}/billing@${{ needs.build.outputs.image-sha }}
    region: asia-southeast2
Same OIDC-based auth, same immutable image, different deploy command.
Plain SSH (e.g., a VPS running Docker Compose):
- uses: appleboy/ssh-action@v0.1.6
  with:
    host: ${{ secrets.STAGING_HOST }}
    username: deploy
    key: ${{ secrets.STAGING_SSH_KEY }}
    script: |
      cd /opt/stack
      docker pull ghcr.io/${{ github.repository_owner }}/billing@${{ needs.build.outputs.image-sha }}
      sed -i "s|image:.*billing.*|image: ghcr.io/${{ github.repository_owner }}/billing@${{ needs.build.outputs.image-sha }}|" docker-compose.yml
      docker compose up -d billing
Lower-tech but works. SSH key as secret is still required (OIDC doesn’t help here). Rotate annually.
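The sed line is the fragile part of the SSH variant, so it is worth trying against a copy of the compose file first. A sketch, assuming a hypothetical pin_compose_image helper and a compose file where only the billing service's image line mentions "billing":

```shell
#!/bin/sh
# Hypothetical helper: rewrite the billing service's image line in a compose file.
# $1 = path to the compose file, $2 = full image reference to pin
# Writes via a temp file instead of `sed -i` so it behaves the same on GNU and BSD sed.
pin_compose_image() {
  sed "s|image:.*billing.*|image: $2|" "$1" > "$1.tmp" && mv "$1.tmp" "$1"
}

# Example usage:
#   cp docker-compose.yml /tmp/compose-check.yml
#   pin_compose_image /tmp/compose-check.yml "ghcr.io/yourorg/billing@sha256:<digest>"
#   diff docker-compose.yml /tmp/compose-check.yml
```

The pattern matches any image line containing "billing", so a second service such as a billing worker would be rewritten too. If the stack has one, tighten the pattern before using this.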
Production = staging + protections
Production differs from staging by:
- on: workflow_dispatch only (no auto-deploy on push)
- environment: production with required-reviewer protection
- Possibly a “promote from staging” pattern that retags the staging image as production rather than rebuilding
- Stricter rollback steps in the workflow
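deploy-production:
  runs-on: ubuntu-22.04
  environment: production # requires approval
  permissions:
    packages: write # needed to push the production tag to GHCR
  steps:
    - uses: docker/login-action@v2
      with:
        registry: ghcr.io
        username: ${{ github.actor }}
        password: ${{ secrets.GITHUB_TOKEN }}
    - name: Promote staging image to production
      run: |
        IMAGE=ghcr.io/${{ github.repository_owner }}/billing
        # Resolve the manifest digest that :staging points to. The raw manifest's
        # .config.digest is the config blob, not the image, so it can't be used after "@".
        # Needs a buildx version that supports --format on imagetools inspect.
        STAGING_DIGEST=$(docker buildx imagetools inspect "$IMAGE:staging" \
          --format '{{json .Manifest}}' | jq -r '.digest')
        docker buildx imagetools create \
          --tag "$IMAGE:production" \
          "$IMAGE@${STAGING_DIGEST}"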
No rebuild — the exact image bytes that passed staging are what runs in prod. Higher confidence, faster deploy.
Common Pitfalls
Deploying with the latest tag. Same problem as the staging tag in automation: nondeterministic. Always deploy by digest or by SHA-tag.
Long-lived cloud secrets in Actions secrets. AWS access keys, GCP service account JSON, etc. — these used to be the standard pattern. They’ve aged into a security smell. OIDC removes the need.
cancel-in-progress: true on deploy. Cancels mid-rollout. Half-deployed services are unpleasant. Default false for deploys.
No smoke test after rollout. “Kubernetes says the pods are ready” doesn’t always equal “the app actually works.” A curl that takes a few seconds after rollout has saved me from many “deploy succeeded but service down” pages.
Branch protections that block deploys you actually want. environment: production with required reviewer is good. Adding “require status checks” that depend on CI from a different branch is a recipe for “deploy stuck waiting for a check that will never run.”
Letting deploy logs leak secrets. Some debug steps print env vars. Emit the ::add-mask:: workflow command for any sensitive value before it can reach the log, or stop logging those values.
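For values that are computed during the run, and so never registered as a secret, masking is one line of output. A minimal sketch, with mask_value as a hypothetical helper name:

```shell
#!/bin/sh
# Hypothetical helper: print the workflow command that asks the runner
# to redact the given value in subsequent log output of the job.
mask_value() {
  printf '::add-mask::%s\n' "$1"
}

# Example usage inside a `run:` step (SESSION_TOKEN is an assumed variable name):
#   mask_value "$SESSION_TOKEN"
#   echo "token is $SESSION_TOKEN"   # the runner shows *** in place of the value
```

Masking only affects output written after the command is emitted, so call it as soon as the value exists, before any debug step gets a chance to print it.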
Wrapping Up
This deploy workflow ties together the month: a Docker image built with the right Dockerfile, pushed via registry-cached BuildKit, and rolled out via kubectl with OIDC-authed cloud access. End-to-end: code merged to main → running in staging in about 4 minutes. End-of-month retro (no separate post — this is it): February covered the Postgres half and the CI/CD half. March pivots to Rust and memory safety for backend APIs. Same pattern: theme per month, three articles per week, on the same blog you’re reading this on.