Advanced GitHub Actions: Reusable Workflows, OIDC, and Matrix Patterns That Don't Become Spaghetti
TL;DR
- Stop copy-pasting .github/workflows/ci.yml across repos. Use reusable workflows for org-wide CI logic.
- OIDC trust to AWS/GCP/Azure eliminates long-lived AWS_ACCESS_KEY_ID secrets: set it up once, never rotate again.
- Composite actions handle the small-but-shared steps; matrix strategies need fail-fast: false and max-parallel to be usable at scale.
GitHub Actions stopped being “Jenkins but easier” around the time reusable workflows landed. The platform now has enough primitives to build a real CI/CD layer for an org, but most teams I see are still treating each repo’s ci.yml like an island. Below is the pattern I push every team toward: a single platform repo that holds the shared CI logic, OIDC-based cloud auth, and a small set of well-versioned composite actions.
This is the seventh post in the platform engineering series. For the GitOps side that consumes these CI artifacts, see ArgoCD ApplicationSets at scale. For progressive delivery downstream, see Argo Rollouts and Flagger.
Reusable workflows: one definition, many callers
A reusable workflow lives in a repo and is called by other workflows via uses: org/repo/.github/workflows/file.yml@ref. The caller workflow looks small; the called workflow does the work. This is exactly what you want for “every Go service should run the same lint, test, build, scan, push pipeline.”
The platform repo’s reusable workflow:
# acme/platform-ci/.github/workflows/go-service-ci.yml
name: Go Service CI
on:
  workflow_call:
    inputs:
      service-name:
        required: true
        type: string
      go-version:
        required: false
        type: string
        default: '1.20'
      push-image:
        required: false
        type: boolean
        default: false
    secrets:
      registry-token:
        required: false
# id-token: write lets the jobs request the OIDC token used for AWS auth
permissions:
  contents: read
  id-token: write
  packages: write
jobs:
  test:
    runs-on: ubuntu-22.04
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-go@v4
        with:
          go-version: ${{ inputs.go-version }}
          cache: true
      - run: go test ./... -race -coverprofile=cover.out
      - uses: actions/upload-artifact@v3
        with:
          name: coverage
          path: cover.out
  lint:
    runs-on: ubuntu-22.04
    steps:
      - uses: actions/checkout@v4
      - uses: golangci/golangci-lint-action@v3
        with:
          version: v1.53
  build:
    needs: [test, lint]
    runs-on: ubuntu-22.04
    steps:
      - uses: actions/checkout@v4
      - uses: docker/setup-buildx-action@v2
      - uses: aws-actions/configure-aws-credentials@v2
        with:
          role-to-assume: arn:aws:iam::123456789012:role/github-actions-build
          aws-region: us-east-1
      - uses: aws-actions/amazon-ecr-login@v1
        id: ecr
      - uses: docker/build-push-action@v4
        with:
          context: .
          push: ${{ inputs.push-image }}
          tags: |
            ${{ steps.ecr.outputs.registry }}/${{ inputs.service-name }}:${{ github.sha }}
            ${{ steps.ecr.outputs.registry }}/${{ inputs.service-name }}:latest
          cache-from: type=gha
          cache-to: type=gha,mode=max
  scan:
    needs: build
    # The image only exists in ECR when the build job pushed it
    if: ${{ inputs.push-image }}
    runs-on: ubuntu-22.04
    steps:
      # Each job starts on a fresh runner, so it needs its own credentials to pull
      - uses: aws-actions/configure-aws-credentials@v2
        with:
          role-to-assume: arn:aws:iam::123456789012:role/github-actions-build
          aws-region: us-east-1
      - uses: aquasecurity/trivy-action@0.11.2
        with:
          image-ref: 123456789012.dkr.ecr.us-east-1.amazonaws.com/${{ inputs.service-name }}:${{ github.sha }}
          severity: 'CRITICAL,HIGH'
          exit-code: '1'
And the per-service caller workflow shrinks to:
# acme/checkout/.github/workflows/ci.yml
name: CI
on:
  push:
    branches: [main]
  pull_request:
jobs:
  ci:
    uses: acme/platform-ci/.github/workflows/go-service-ci.yml@v1
    with:
      service-name: checkout
      go-version: '1.20'
      push-image: ${{ github.event_name == 'push' }}
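If the shared workflow needs a secret or an OIDC token, the caller has to hand those over explicitly, because a called workflow can keep or reduce the permissions it receives but cannot raise them. Here is a sketch of a slightly fuller caller under two assumptions that are mine, not part of the setup above: the calling repo has a secret named REGISTRY_TOKEN (hypothetical), and you prefer to grant permissions on the calling job instead of relying on repository defaults.

```yaml
# acme/checkout/.github/workflows/ci.yml (illustrative variant)
name: CI
on:
  push:
    branches: [main]
  pull_request:
jobs:
  ci:
    # Permissions granted here are the ceiling for the called workflow
    permissions:
      contents: read
      id-token: write   # needed for the OIDC exchange in the build job
      packages: write
    uses: acme/platform-ci/.github/workflows/go-service-ci.yml@v1
    with:
      service-name: checkout
      push-image: ${{ github.event_name == 'push' }}
    secrets:
      # REGISTRY_TOKEN is a hypothetical secret name in the calling repo
      registry-token: ${{ secrets.REGISTRY_TOKEN }}
```

Leaving go-version out falls back to the default declared in the reusable workflow, which is usually what you want: one place to bump the toolchain.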
The win: when the platform team updates the Trivy version, fixes a flaky test runner, or adds an SBOM step, every service gets it on the next CI run. No 80-PR campaign across stream-team repos.
Pin reusable workflows to versions, not main
@v1 above is a release tag. Use git tags (or even better, immutable SHAs) for reusable workflow refs. @main means every consumer picks up every push to the platform repo on its next run. That includes the in-progress fix that breaks everything.
Cut a release of platform-ci like any other product. Semver. Changelog. Major version bumps when you change the input contract. Consumers can pin to @v1 for the major and get minor and patch updates automatically (as long as you move the v1 tag forward with each compatible release), or pin to a SHA for full immutability.
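The two pinning styles look like this side by side. This is only a sketch: the second job exists purely to show the syntax, and `<full-commit-sha>` is a placeholder you would replace with a real 40-character commit SHA from the platform repo.

```yaml
jobs:
  ci:
    # Moving major tag: picks up every compatible v1.x release
    uses: acme/platform-ci/.github/workflows/go-service-ci.yml@v1
    with:
      service-name: checkout

  ci-pinned:
    # Full commit SHA: nothing changes until someone edits this line.
    # <full-commit-sha> is a placeholder, not a real ref.
    uses: acme/platform-ci/.github/workflows/go-service-ci.yml@<full-commit-sha>
    with:
      service-name: checkout
```

A trailing comment with the human-readable version next to the SHA keeps the pinned form reviewable.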
OIDC: kill the long-lived AWS keys
Until OIDC trust landed, the standard pattern was an AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY stored as repo secrets. Every CI run had cloud credentials in environment variables, the keys had to be rotated, and leaked credentials in build logs were a recurring problem. OIDC removes the long-lived keys from the picture entirely.
GitHub Actions issues a JWT per workflow run, signed by GitHub. AWS IAM trusts GitHub’s OIDC provider and exchanges the JWT for a short-lived STS session. No long-lived keys. The trust policy on the IAM role looks like:
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": {
        "Federated": "arn:aws:iam::123456789012:oidc-provider/token.actions.githubusercontent.com"
      },
      "Action": "sts:AssumeRoleWithWebIdentity",
      "Condition": {
        "StringEquals": {
          "token.actions.githubusercontent.com:aud": "sts.amazonaws.com"
        },
        "StringLike": {
          "token.actions.githubusercontent.com:sub": "repo:acme/*:ref:refs/heads/main"
        }
      }
    }
  ]
}
The sub condition is the security boundary. The pattern above lets any workflow on main of any repo in the acme org assume the role. For sensitive roles (prod deploy), narrow it: repo:acme/checkout:environment:prod requires the workflow run to be against the prod GitHub Environment, which has its own approval gates.
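For the narrowed prod case, the role might be declared along these lines. This is a CloudFormation sketch, not a drop-in template: it assumes the GitHub OIDC provider is already registered in the account, and the logical resource name is made up for illustration. The point is that the sub condition becomes an exact match instead of a wildcard.

```yaml
# CloudFormation sketch; the logical name GithubActionsDeployProd is illustrative
AWSTemplateFormatVersion: '2010-09-09'
Resources:
  GithubActionsDeployProd:
    Type: AWS::IAM::Role
    Properties:
      RoleName: github-actions-deploy-prod
      AssumeRolePolicyDocument:
        Version: '2012-10-17'
        Statement:
          - Effect: Allow
            Principal:
              # Assumes the OIDC provider already exists in this account
              Federated: !Sub 'arn:aws:iam::${AWS::AccountId}:oidc-provider/token.actions.githubusercontent.com'
            Action: 'sts:AssumeRoleWithWebIdentity'
            Condition:
              StringEquals:
                'token.actions.githubusercontent.com:aud': 'sts.amazonaws.com'
                # Exact match: only acme/checkout jobs running in the prod environment
                'token.actions.githubusercontent.com:sub': 'repo:acme/checkout:environment:prod'
```

With StringEquals on sub there is no pattern to get subtly wrong, which is the property you want on anything that can touch prod.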
The GitHub docs at docs.github.com document the sub claim format exhaustively. Read them carefully: getting the trust policy wrong gives any repo in the org access to prod.
GCP and Azure have equivalent OIDC integrations. Use them. Long-lived service account keys belong in 2018.
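On the GCP side the workflow half looks much the same. A minimal sketch using the google-github-actions/auth action: the project number, pool, provider, and service account below are all invented placeholders, and it assumes a Workload Identity Federation pool that already trusts GitHub's issuer.

```yaml
jobs:
  deploy:
    runs-on: ubuntu-22.04
    permissions:
      contents: read
      id-token: write   # required to request the OIDC token
    steps:
      # Check out first; the auth action writes a credentials file into the workspace
      - uses: actions/checkout@v4
      - uses: google-github-actions/auth@v1
        with:
          # All identifiers below are hypothetical placeholders
          workload_identity_provider: projects/123456789/locations/global/workloadIdentityPools/github/providers/github-actions
          service_account: ci-deployer@acme-prod.iam.gserviceaccount.com
      # Later steps can use gcloud or the client libraries with the federated credentials
```

As with AWS, the interesting security work is on the cloud side: the attribute conditions on the provider play the same role as the sub condition in the IAM trust policy.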
Composite actions for the small shared pieces
Reusable workflows are heavy. For small composable pieces — “set up the cache, fetch a vault secret, post a Slack notification” — composite actions are the right grain.
# acme/platform-actions/notify-slack/action.yml
name: Notify Slack
inputs:
  webhook-url:
    required: true
  status:
    required: true
  service:
    required: true
runs:
  using: composite
  steps:
    - shell: bash
      # Pass inputs through env instead of interpolating them into the script,
      # so a crafted input value cannot inject shell commands
      env:
        STATUS: ${{ inputs.status }}
        SERVICE: ${{ inputs.service }}
        WEBHOOK_URL: ${{ inputs.webhook-url }}
      run: |
        emoji=":white_check_mark:"
        if [[ "$STATUS" != "success" ]]; then emoji=":x:"; fi
        # Build the JSON payload with jq so values are escaped correctly
        payload=$(jq -nc \
          --arg svc "$SERVICE" \
          --arg sha "${GITHUB_SHA::7}" \
          --arg actor "$GITHUB_ACTOR" \
          --arg emoji "$emoji" \
          --arg url "$GITHUB_SERVER_URL/$GITHUB_REPOSITORY/actions/runs/$GITHUB_RUN_ID" \
          '{text: ("\($emoji) *\($svc)* deploy by \($actor) at \($sha) - <\($url)|details>")}')
        curl -fsS -X POST -H 'Content-Type: application/json' \
          -d "$payload" "$WEBHOOK_URL"
Call it from any workflow: uses: acme/platform-actions/notify-slack@v2. Composite actions do not get a runner of their own: their steps execute inside the caller's job and share its workspace. That makes them fast and easy to chain.
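A calling job might look like the sketch below. The deploy script path and the SLACK_WEBHOOK_URL secret name are hypothetical; the useful parts are if: always(), so failures get reported too, and job.status, whose value is what the action compares against "success".

```yaml
jobs:
  deploy:
    runs-on: ubuntu-22.04
    steps:
      - uses: actions/checkout@v4
      - name: Deploy
        run: ./scripts/deploy.sh   # hypothetical deploy step
      - name: Notify Slack
        if: always()               # run even when an earlier step failed
        uses: acme/platform-actions/notify-slack@v2
        with:
          webhook-url: ${{ secrets.SLACK_WEBHOOK_URL }}   # hypothetical secret name
          status: ${{ job.status }}                       # success, failure, or cancelled
          service: checkout
```

Note that the secret is passed as an input. A composite action cannot read the secrets context itself, so the caller decides what it gets.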
Matrix patterns that don’t get out of hand
Matrix builds turn 1-job workflows into 30-job workflows in one line. Useful, but fail-fast and max-parallel deserve more attention than they get.
strategy:
  fail-fast: false
  max-parallel: 4
  matrix:
    go-version: ['1.19', '1.20']
    os: [ubuntu-22.04, ubuntu-20.04]
    include:
      - go-version: '1.20'
        os: ubuntu-22.04
        coverage: true
    exclude:
      - go-version: '1.19'
        os: ubuntu-20.04
- fail-fast: false keeps the remaining combinations running when one fails. Without it, the first failing combination cancels the others. For test matrices you usually want to see all failures, not just the first.
- max-parallel: 4 caps concurrent jobs. Set this when you have many matrix dimensions or run on self-hosted runners with limited capacity. Without it, a 30-cell matrix tries to take 30 runners at once.
- include adds extra combinations, or attaches extra values (like coverage: true above) to combinations that already exist. exclude drops combinations from the cartesian product. Use them when the matrix doesn't quite match what you want.
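To see how the matrix values get consumed, here is the same strategy wired into a complete job. It is a sketch that reuses the test steps from the reusable workflow earlier. With this configuration the job expands to three runs: Go 1.19 on ubuntu-22.04, Go 1.20 on ubuntu-20.04, and Go 1.20 on ubuntu-22.04, and only the last one carries coverage: true.

```yaml
jobs:
  test:
    runs-on: ${{ matrix.os }}
    strategy:
      fail-fast: false
      max-parallel: 4
      matrix:
        go-version: ['1.19', '1.20']
        os: [ubuntu-22.04, ubuntu-20.04]
        include:
          - go-version: '1.20'
            os: ubuntu-22.04
            coverage: true
        exclude:
          - go-version: '1.19'
            os: ubuntu-20.04
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-go@v4
        with:
          go-version: ${{ matrix.go-version }}
      - run: go test ./... -race -coverprofile=cover.out
      # matrix.coverage is unset (falsy) in every cell except the include entry
      - if: ${{ matrix.coverage }}
        uses: actions/upload-artifact@v3
        with:
          name: coverage
          path: cover.out
```

Uploading coverage from a single cell avoids several jobs fighting over the same artifact name.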
For dynamic matrices (e.g., “test against every service that changed in this PR”), generate the matrix in a setup job and pass it to the matrix job through a job output (needs.setup.outputs.services in the example below):
jobs:
  setup:
    runs-on: ubuntu-22.04
    outputs:
      services: ${{ steps.changed.outputs.services }}
    steps:
      - uses: actions/checkout@v4
        with:
          # Full history; the default shallow clone has no origin/main to diff against
          fetch-depth: 0
      - id: changed
        run: |
          # Build a JSON array of the service directories touched by this change
          services=$(git diff --name-only origin/main...HEAD \
            | grep -oP '^services/\K[^/]+' | sort -u | jq -R . | jq -sc .)
          echo "services=$services" >> "$GITHUB_OUTPUT"
  test:
    needs: setup
    # Skip the matrix entirely when nothing under services/ changed
    if: needs.setup.outputs.services != '[]'
    strategy:
      matrix:
        service: ${{ fromJson(needs.setup.outputs.services) }}
    runs-on: ubuntu-22.04
    steps:
      - run: echo "Testing ${{ matrix.service }}"
This pattern shrinks PR CI time enormously in monorepos. Only the changed services get tested.
Environments and approvals: where deployment lives
For prod deploys, GitHub Environments are the gate. An environment can require manual approval, restrict to specific branches, and hold its own secrets. Plus it integrates with the OIDC sub claim above.
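deploy-prod:
  needs: build
  runs-on: ubuntu-22.04
  # The OIDC exchange needs id-token: write on this job
  permissions:
    contents: read
    id-token: write
  environment:
    name: prod
    url: https://checkout.acme.io
  steps:
    - uses: aws-actions/configure-aws-credentials@v2
      with:
        role-to-assume: arn:aws:iam::123456789012:role/github-actions-deploy-prod
        aws-region: us-east-1
    # Assumes ECR (the registry hostname) and cluster access are set up in steps not shown here
    - run: |
        kubectl set image deployment/checkout \
          checkout=${ECR}/checkout:${{ github.sha }} -n checkout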
Configure the prod environment with required reviewers (individual people or a team), a deployment branch restriction (main only), and a wait timer if you want a short window to abort before the job starts (the timer is configured in minutes). The approval is logged on the workflow run page, so it is auditable rather than Slack-message-only.
In a GitOps world this job usually doesn’t kubectl directly; it bumps an image tag in the deploy repo and lets ArgoCD pick it up. Either way, the environment gate is the right place to enforce the approval.
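The GitOps variant of that job could look roughly like this. Everything specific here is an assumption on my part: a deploy repo named acme/deploy, a Kustomize overlay at apps/checkout/overlays/prod, a DEPLOY_REPO_TOKEN secret with write access to that repo, and kustomize being available on the runner. Treat it as a sketch of the shape, not a recipe.

```yaml
bump-image-tag:
  needs: build
  runs-on: ubuntu-22.04
  environment:
    name: prod          # same approval gate as a direct deploy
  steps:
    - uses: actions/checkout@v4
      with:
        repository: acme/deploy                   # hypothetical deploy repo
        token: ${{ secrets.DEPLOY_REPO_TOKEN }}   # hypothetical secret with write access
    - name: Point the overlay at the new image
      working-directory: apps/checkout/overlays/prod   # hypothetical path
      run: |
        kustomize edit set image \
          checkout=123456789012.dkr.ecr.us-east-1.amazonaws.com/checkout:${{ github.sha }}
    - name: Commit and push
      run: |
        git config user.name "github-actions[bot]"
        git config user.email "github-actions[bot]@users.noreply.github.com"
        git commit -am "checkout: deploy ${{ github.sha }}"
        git push
```

The job never touches the cluster, so it needs no cloud credentials at all; ArgoCD does the apply once the commit lands.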
Common Pitfalls
- Reusable workflows that take 12 inputs. That’s a sign the workflow is doing too much. Split it into two reusable workflows that compose, not one monolith.
- No version pinning on third-party actions. uses: some-action/foo@v1 lets the maintainer change v1 under you. Pin to a full SHA for any action that touches secrets. The action ecosystem has had supply-chain incidents.
- OIDC trust policy too broad. repo:acme/* lets every repo in the org assume the role. Scope by repo and by environment for prod-touching roles.
- Workflow concurrency unbounded. Add a concurrency block with group: ${{ github.workflow }}-${{ github.ref }} and cancel-in-progress: true on PR workflows. Without it, every push to a PR starts another run while the old one is still going.
- Caching without keys that invalidate. actions/cache is great until you are restoring a stale cache. Include lockfile hashes and tool versions in the cache key.
- Self-hosted runners on the default network. A self-hosted runner that can reach internal infrastructure and is also shared with public repos is a lateral movement risk. Use ephemeral runners, one per job (for example, Actions Runner Controller runner scale sets on Kubernetes).
- Secrets in workflow logs. GitHub redacts the exact values of registered secrets, but not values derived from them. A password that has been URL-encoded, base64-encoded, or otherwise transformed on its way into a string like postgres://user:...@host/db no longer matches the registered secret and is printed in the clear. Be careful with composite values.
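Three of these pitfalls have small, mechanical fixes, sketched together below. The cache path and key layout are one reasonable choice for a Go module cache, not the only one, and DB_PASSWORD is a hypothetical secret name. The last step uses the add-mask workflow command to register a derived value so that it is redacted in later log output.

```yaml
name: CI
on:
  pull_request:
# One active run per workflow and ref; a newer push cancels the older run
concurrency:
  group: ${{ github.workflow }}-${{ github.ref }}
  cancel-in-progress: true
jobs:
  test:
    runs-on: ubuntu-22.04
    steps:
      - uses: actions/checkout@v4
      - uses: actions/cache@v3
        with:
          path: ~/go/pkg/mod
          # Key changes with the OS, the Go version, and go.sum, so old caches stop matching
          key: ${{ runner.os }}-go1.20-${{ hashFiles('**/go.sum') }}
      - name: Mask a derived secret before using it
        env:
          DB_PASSWORD: ${{ secrets.DB_PASSWORD }}   # hypothetical secret name
        run: |
          # The encoded form is not a registered secret, so register it as a mask
          encoded=$(printf '%s' "$DB_PASSWORD" | base64)
          echo "::add-mask::$encoded"
          echo "DB_PASSWORD_B64=$encoded" >> "$GITHUB_ENV"
```

The concurrency block is written in block style on purpose: the ${{ }} expressions are awkward to quote correctly inside an inline { ... } mapping.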
Wrapping Up
GitHub Actions in 2023 is a real CI/CD platform if you treat it like one. Centralize logic in reusable workflows, kill long-lived cloud credentials with OIDC, gate prod with Environments. The cost is a small platform repo and the discipline to version it.
Final post in this series: cluster cost engineering with Karpenter and KEDA. Because all of this CI/CD machinery scheduling pods has a bill at the end of the month.