GitHub Actions Matrix Builds and Parallel Test Sharding

February 25, 2022 · 6 min read · by Muhammad Amal · programming

TL;DR — Matrix builds run the same job N times with different parameters. Use them to test multiple Go versions, multiple OSes, or to shard a slow test suite across parallel runners. Three lines of YAML for what would otherwise be a copy-paste mess.

After the caching post, the next CI scaling lever is parallelism. GitHub Actions runs jobs in parallel by default; what most people don’t fully use is the matrix feature that turns one job definition into N parallel runs.

I use matrix builds for three things: testing across multiple Go versions, testing across OSes when relevant, and sharding a big test suite across parallel runners to cut wall-clock time. This post covers each.

Pattern 1 — Multi-version Go testing

If you maintain a library that needs to support multiple Go versions, matrix is the right tool:

jobs:
  test:
    runs-on: ubuntu-22.04
    strategy:
      matrix:
        go-version: ['1.17', '1.18']
    steps:
      - uses: actions/checkout@v3
      - uses: actions/setup-go@v3
        with:
          go-version: ${{ matrix.go-version }}
          cache: true
      - run: go test -race -count=1 ./...

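This runs the same job twice in parallel, once per Go version. A value like '1.18' resolves to the latest stable 1.18.x patch release. Go 1.18 is still at the release-candidate stage (the final release is expected mid-March 2022), so until it ships you need to pin the pre-release version explicitly. Testing against it now is useful for catching incompatibilities early.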

For application code (not a library) targeting a single Go version, skip this — you don’t need to test against versions you don’t deploy.

Pattern 2 — Multi-OS testing

If your code runs on more than one OS (CLI tools, anything with filesystem assumptions), matrix the OS:

jobs:
  test:
    runs-on: ${{ matrix.os }}
    strategy:
      matrix:
        os: [ubuntu-22.04, macos-12, windows-2022]
        go-version: ['1.17']
    steps:
      - uses: actions/checkout@v3
      - uses: actions/setup-go@v3
        with:
          go-version: ${{ matrix.go-version }}
      - run: go test -race -count=1 ./...

Three runs in parallel, one per OS. Catches OS-specific issues that local dev on a single machine would miss — path separator differences, file mode/permission differences, line-ending issues.

The two dimensions multiply if you list both: 2 Go versions × 3 OSes = 6 parallel jobs. Use exclude to skip unwanted combinations:

strategy:
  matrix:
    os: [ubuntu-22.04, macos-12, windows-2022]
    go-version: ['1.17', '1.18']
    exclude:
      - os: windows-2022
        go-version: '1.18'   # skip this combo

Pattern 3 — Sharding a slow test suite

When go test ./... takes 12 minutes and you want to get under 3, shard it across N runners:

jobs:
  test:
    runs-on: ubuntu-22.04
    strategy:
      matrix:
        shard: [1, 2, 3, 4]
    steps:
      - uses: actions/checkout@v3
      - uses: actions/setup-go@v3
        with:
          go-version: '1.17'
          cache: true
      - run: |
          PACKAGES=$(go list ./... | awk -v shard=${{ matrix.shard }} -v total=4 \
            'NR % total == (shard - 1) % total')
          go test -race -count=1 $PACKAGES

This splits packages across 4 parallel runners by line number, a crude round-robin rather than a real hash. Each runner tests roughly a quarter of the packages, so wall-clock time drops by up to 4×; per-runner setup and uneven package durations eat into that.
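
A variation on that step, as a sketch rather than the setup we run: the strategy context exposes job-index (zero-based) and job-total, so the shard count does not have to be repeated inside the awk call. This assumes shard is the only matrix dimension; with more dimensions, job-total counts every combination. The step name, the env variable names, and the empty-shard guard are my additions. The guard is there because go test with no arguments would otherwise test only the current directory.

```yaml
jobs:
  test:
    runs-on: ubuntu-22.04
    strategy:
      matrix:
        shard: [1, 2, 3, 4]
    steps:
      - uses: actions/checkout@v3
      - uses: actions/setup-go@v3
        with:
          go-version: '1.17'
          cache: true
      - name: Test packages for this shard
        shell: bash   # explicit bash enables pipefail, so a failing go list fails the step
        env:
          SHARD_INDEX: ${{ strategy.job-index }}   # zero-based position in the matrix
          SHARD_TOTAL: ${{ strategy.job-total }}   # number of jobs in the matrix
        run: |
          # Round-robin: package N goes to shard (N - 1) mod SHARD_TOTAL.
          PACKAGES=$(go list ./... | awk -v idx="$SHARD_INDEX" -v total="$SHARD_TOTAL" \
            '(NR - 1) % total == idx')
          if [ -z "$PACKAGES" ]; then
            echo "No packages assigned to this shard"
            exit 0
          fi
          go test -race -count=1 $PACKAGES
```

Adding a fifth shard is then a one-line change to the matrix list.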

For a more balanced split, partition by historical duration: record per-package timings from prior runs (go test -json emits them, and tools like tparse summarize them), then assign packages so each shard finishes at roughly the same time. Worth the investment when test runtime imbalance becomes the bottleneck.
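
One way a duration-based split could look, as a rough sketch. It swaps the line-number split for a greedy "slowest package to the least-loaded shard" heuristic, which is my choice for illustration and not something the tools above do for you. The timings.txt file, its format, and the default weight of 1 second for unknown packages are all assumptions; producing and refreshing that file is left out.

```yaml
      - name: Test packages for this shard, balanced by past duration
        shell: bash   # explicit bash enables pipefail, so a missing timings.txt fails the step
        env:
          SHARD_INDEX: ${{ strategy.job-index }}   # zero-based
          SHARD_TOTAL: ${{ strategy.job-total }}
        run: |
          # timings.txt is hypothetical: one "<seconds> <import path>" pair per line.
          # Packages missing from it get a default weight of 1 second.
          PACKAGES=$(go list ./... \
            | awk 'FILENAME == ARGV[1] { secs[$2] = $1; next }
                   { t = ($1 in secs) ? secs[$1] : 1; print t, $1 }' timings.txt - \
            | sort -rn \
            | awk -v idx="$SHARD_INDEX" -v total="$SHARD_TOTAL" '
                {
                  # Give the next-slowest package to the least-loaded shard.
                  best = 0
                  for (s = 1; s < total; s++) if (load[s] < load[best]) best = s
                  load[best] += $1
                  if (best == idx) print $2
                }')
          if [ -z "$PACKAGES" ]; then
            echo "No packages assigned to this shard"
            exit 0
          fi
          go test -race -count=1 $PACKAGES
```

Every shard runs the same calculation on the same input, so the shards agree on the assignment without talking to each other.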

fail-fast — when to turn it off

Matrix jobs have a fail-fast knob, default true:

strategy:
  fail-fast: false   # default true
  matrix:
    go-version: ['1.17', '1.18']

With fail-fast: true (default), if any matrix entry fails, GitHub cancels the entries that are still queued or running. That saves time when a single failure is enough to know the PR is broken.

With fail-fast: false, all entries run to completion regardless. Useful for:

Multi-version testing where you want to see which versions break
Multi-OS testing where you want to see which OS broke
Test sharding where you want a complete failure report rather than partial

Default to true for PR CI (fast feedback). Switch to false for nightly/release builds where complete results matter.
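
If the same workflow serves both PRs and a nightly run, fail-fast can take an expression instead of a literal. A minimal sketch, assuming a workflow triggered by pull_request and by a schedule; the cron time is made up:

```yaml
on:
  pull_request:
  schedule:
    - cron: '0 3 * * *'   # hypothetical nightly run

jobs:
  test:
    runs-on: ubuntu-22.04
    strategy:
      # true on pull requests (fast feedback), false on the scheduled run (full report)
      fail-fast: ${{ github.event_name == 'pull_request' }}
      matrix:
        go-version: ['1.17', '1.18']
    steps:
      - uses: actions/checkout@v3
      - uses: actions/setup-go@v3
        with:
          go-version: ${{ matrix.go-version }}
      - run: go test -race -count=1 ./...
```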

max-parallel — when you need to limit

strategy:
  max-parallel: 2
  matrix:
    shard: [1, 2, 3, 4]

Limits how many matrix jobs run concurrently. Useful when:

Hitting GitHub’s concurrency limits (20 parallel jobs on the free tier)
Tests share an external resource (a database) that doesn’t like 4 simultaneous test suites
You want to spread load on a self-hosted runner pool

max-parallel: 1 is functionally serial. If you need that, you may want to ask whether matrix is the right tool — separate sequential jobs may read better.

Including extras per matrix entry

include can attach extra parameters to existing matrix entries:

strategy:
  matrix:
    os: [ubuntu-22.04, macos-12]
    include:
      - os: ubuntu-22.04
        coverage: true
        upload-artifacts: true
      - os: macos-12
        coverage: false
        upload-artifacts: false

The matrix only sets the values. Gate the coverage and upload steps on them with if: and only the Linux job runs those steps. That saves time and storage compared to running them on every OS.
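
A sketch of how steps might consume those values. The coverage.out filename and the artifact name are placeholders I picked; the matrix is the one from above.

```yaml
jobs:
  test:
    runs-on: ${{ matrix.os }}
    strategy:
      matrix:
        os: [ubuntu-22.04, macos-12]
        include:
          - os: ubuntu-22.04
            coverage: true
            upload-artifacts: true
          - os: macos-12
            coverage: false
            upload-artifacts: false
    steps:
      - uses: actions/checkout@v3
      - uses: actions/setup-go@v3
        with:
          go-version: '1.17'
      - name: Test
        if: ${{ !matrix.coverage }}
        run: go test -race -count=1 ./...
      - name: Test with coverage
        if: ${{ matrix.coverage }}
        run: go test -race -count=1 -coverprofile=coverage.out ./...
      - name: Upload coverage profile
        if: ${{ matrix.coverage && matrix.upload-artifacts }}
        uses: actions/upload-artifact@v3
        with:
          name: coverage-${{ matrix.os }}
          path: coverage.out
```

The expressions that start with ! are wrapped in ${{ }} because a bare leading ! means something else in YAML.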

When matrix is not the answer

Matrix is a hammer. Things it isn’t great for:

Truly different jobs. If your “matrix entries” need different steps, separate jobs read better.
Setup-heavy workflows where parallelism doesn’t help. Spinning up 4 runners that each spend 90% of time installing Docker is wasteful; one job with parallel inner workers is better.
Workflows where order matters. Matrix entries run in parallel by default. If you need entry A to finish before B starts, you’ve moved past matrix and into needs-based dependencies.
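
For the ordering case, a minimal sketch of needs-based dependencies. The lint job and its go vet step are placeholders; the point is that the sharded test job does not start until lint has succeeded.

```yaml
jobs:
  lint:
    runs-on: ubuntu-22.04
    steps:
      - uses: actions/checkout@v3
      - uses: actions/setup-go@v3
        with:
          go-version: '1.17'
      - run: go vet ./...

  test:
    needs: lint   # every matrix entry waits for lint to finish successfully
    runs-on: ubuntu-22.04
    strategy:
      matrix:
        go-version: ['1.17', '1.18']
    steps:
      - uses: actions/checkout@v3
      - uses: actions/setup-go@v3
        with:
          go-version: ${{ matrix.go-version }}
      - run: go test -race -count=1 ./...
```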

Real numbers

Our notifications service test suite went from 8 minutes (single job) to ~2.5 minutes (4 shards). The shards aren’t perfectly balanced because we split by line number, but the slowest shard sets the wall-clock and even the slowest is under 3 minutes.

For services where tests run < 3 minutes already, sharding isn’t worth the complexity. Pay the parallelism cost only when serial-time genuinely matters.

Common Pitfalls

Forgetting to scope OS-specific steps per matrix entry. If you have OS-specific tests, run them only on the matching OS. runs-on: ${{ matrix.os }} plus if: conditions on matrix.os keep the noise down.
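
Roughly what that scoping could look like inside a multi-OS job. The package path is hypothetical:

```yaml
    steps:
      - uses: actions/checkout@v3
      - uses: actions/setup-go@v3
        with:
          go-version: ${{ matrix.go-version }}
      - name: Run the shared test suite
        run: go test -race -count=1 ./...
      - name: Run Windows-only tests
        if: matrix.os == 'windows-2022'
        run: go test -count=1 ./internal/winpath/...   # hypothetical package
```

Skipped steps still show up in the job log, marked as skipped, so it stays obvious which entries ran what.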

Caching across matrix entries with the same key. Each matrix run is a separate execution; the cache action is fine with this. But if your key doesn’t include the matrix dimension, the first entry’s cache is read by the others — sometimes wrong (e.g., a Linux cache restored on macOS). Include ${{ matrix.os }} in cache keys.
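
A sketch of a cache step with both matrix dimensions in the key, assuming you call actions/cache directly instead of relying on the cache option of setup-go. The path shown is the default Go module cache location; adjust it if your GOMODCACHE differs.

```yaml
      - uses: actions/cache@v3
        with:
          path: ~/go/pkg/mod
          # OS and Go version are part of the key, so entries never share a cache by accident
          key: ${{ matrix.os }}-go-${{ matrix.go-version }}-${{ hashFiles('**/go.sum') }}
          restore-keys: |
            ${{ matrix.os }}-go-${{ matrix.go-version }}-
```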

Sharding tests where packages depend on order. Some test suites have hidden ordering dependencies (shared global state). Sharding changes which packages run together on a runner, so those suites start failing intermittently. Fix the tests or don’t shard.

fail-fast: false on every matrix. Sounds nice; usually wastes minutes. Default true is right for most cases.

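Matrix with include that re-specifies all params. include does not replace combinations. An include entry whose values match an existing combination adds its extra keys to that combination; an entry that would overwrite an original matrix value becomes a new, additional job instead. To swap out a combination, exclude it and include the replacement. Read the docs carefully; intuition fails here.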
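
A small illustration of both behaviours, as I read the documented rules. The coverage key is just an example parameter:

```yaml
strategy:
  matrix:
    os: [ubuntu-22.04, macos-12]
    go-version: ['1.17']
    include:
      # os matches an existing combination and go-version is untouched,
      # so this only adds coverage: true to the ubuntu-22.04 / 1.17 job.
      - os: ubuntu-22.04
        coverage: true
      # go-version '1.18' would overwrite an original value, so this cannot
      # merge into an existing combination and becomes a third job.
      - os: ubuntu-22.04
        go-version: '1.18'
```

That matrix produces three jobs, not two: the original pair plus the ubuntu-22.04 / 1.18 addition.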

Treating GitHub’s job concurrency limits as unlimited. Free tier: 20 concurrent jobs across the whole account. A 6-shard test matrix on three branches × two services is 36 jobs, well past the limit, so the extra jobs wait in the queue. Pro/Team tiers raise it.

Wrapping Up

Matrix builds turn one job into N. Use them for multi-version, multi-OS, and test sharding when the test suite is too slow for serial. Pay attention to fail-fast and cache keys. Final post of the month (Mon Feb 28): deploying Docker images from GitHub Actions to staging — closing the loop from “code merged” to “running in staging.”
