background-shape
GitHub Actions Matrix Builds and Parallel Test Sharding
February 25, 2022 · 6 min read · by Muhammad Amal programming

TL;DR — Matrix builds run the same job N times with different parameters. Use them to test multiple Go versions, multiple OSes, or to shard a slow test suite across parallel runners. Three lines of YAML for what would otherwise be a copy-paste mess.

After the caching post, the next CI scaling lever is parallelism. GitHub Actions runs jobs in parallel by default; what most people don’t fully use is the matrix feature that turns one job definition into N parallel runs.

I use matrix builds for three things: testing across multiple Go versions, testing across OSes when relevant, and sharding a big test suite across parallel runners to cut wall-clock time. This post covers each.

Pattern 1 — Multi-version Go testing

If you maintain a library that needs to support multiple Go versions, matrix is the right tool:

jobs:
  test:
    runs-on: ubuntu-22.04
    strategy:
      matrix:
        go-version: ['1.17', '1.18']
    steps:
      - uses: actions/checkout@v3
      - uses: actions/setup-go@v3
        with:
          go-version: ${{ matrix.go-version }}
          cache: true
      - run: go test -race -count=1 ./...

This runs the same job twice in parallel, once per Go version. The Go 1.18 entry will pull the latest 1.18.x (1.18 is the next major Go release expected mid-March 2022; for now treat it as RC). Useful for catching incompatibilities early.

For application code (not a library) targeting a single Go version, skip this — you don’t need to test against versions you don’t deploy.

Pattern 2 — Multi-OS testing

If your code runs on more than one OS (CLI tools, anything with filesystem assumptions), matrix the OS:

jobs:
  test:
    runs-on: ${{ matrix.os }}
    strategy:
      matrix:
        os: [ubuntu-22.04, macos-12, windows-2022]
        go-version: ['1.17']
    steps:
      - uses: actions/checkout@v3
      - uses: actions/setup-go@v3
        with:
          go-version: ${{ matrix.go-version }}
      - run: go test -race -count=1 ./...

Three runs in parallel, one per OS. Catches OS-specific issues that local dev on a single machine would miss — path separator differences, file mode/permission differences, line-ending issues.

The two matrices multiply if you list both: 2 Go versions × 3 OSes = 6 parallel jobs. Use include or exclude to skip unwanted combinations:

strategy:
  matrix:
    os: [ubuntu-22.04, macos-12, windows-2022]
    go-version: ['1.17', '1.18']
    exclude:
      - os: windows-2022
        go-version: '1.18'   # skip this combo

Pattern 3 — Sharding a slow test suite

When go test ./... takes 12 minutes and you want to get under 3, shard it across N runners:

jobs:
  test:
    runs-on: ubuntu-22.04
    strategy:
      matrix:
        shard: [1, 2, 3, 4]
    steps:
      - uses: actions/checkout@v3
      - uses: actions/setup-go@v3
        with:
          go-version: '1.17'
          cache: true
      - run: |
          PACKAGES=$(go list ./... | awk -v shard=${{ matrix.shard }} -v total=4 \
            'NR % total == (shard - 1) % total')
          go test -race -count=1 $PACKAGES

This splits packages across 4 parallel runners by line number — a crude but effective hash. Each runner tests roughly a quarter of packages; wall-clock time drops ~4×.

For a more sophisticated split, tools like tparse or go-test-report can analyze prior runs and partition tests by historical duration so each shard finishes at the same time. Worth the investment when test runtime imbalance becomes the bottleneck.

fail-fast — when to turn it off

Matrix jobs have a fail-fast knob, default true:

strategy:
  fail-fast: false   # default true
  matrix:
    go-version: ['1.17', '1.18']

With fail-fast: true (default), if any matrix entry fails, the others cancel immediately. Saves time when you just need any failure to know the PR is broken.

With fail-fast: false, all entries run to completion regardless. Useful for:

  • Multi-version testing where you want to see which versions break
  • Multi-OS testing where you want to see which OS broke
  • Test sharding where you want a complete failure report rather than partial

Default to true for PR CI (fast feedback). Switch to false for nightly/release builds where complete results matter.

max-parallel — when you need to limit

strategy:
  max-parallel: 2
  matrix:
    shard: [1, 2, 3, 4]

Limits how many matrix jobs run concurrently. Useful when:

  • Hitting GitHub’s concurrency limits (20 parallel jobs on the free tier)
  • Tests share an external resource (a database) that doesn’t like 4 simultaneous test suites
  • You want to spread load on a self-hosted runner pool

max-parallel: 1 is functionally serial. If you need that, you may want to ask whether matrix is the right tool — separate sequential jobs may read better.

Including extras per matrix entry

include adds matrix entries with extra parameters:

strategy:
  matrix:
    os: [ubuntu-22.04, macos-12]
    include:
      - os: ubuntu-22.04
        coverage: true
        upload-artifacts: true
      - os: macos-12
        coverage: false
        upload-artifacts: false

Only the Linux job runs coverage + uploads artifacts. Saves time + storage compared to running them on every OS.

When matrix is not the answer

Matrix is a hammer. Things it isn’t great for:

  • Truly different jobs. If your “matrix entries” need different steps, separate jobs read better.
  • Setup-heavy workflows where parallelism doesn’t help. Spinning up 4 runners that each spend 90% of time installing Docker is wasteful; one job with parallel inner workers is better.
  • Workflows where order matters. Matrix entries run in parallel by default. If you need entry A to finish before B starts, you’ve moved past matrix and into needs-based dependencies.

Real numbers

Our notifications service test suite went from 8 minutes (single job) to ~2.5 minutes (4 shards). The shards aren’t perfectly balanced because we split by line number, but the slowest shard sets the wall-clock and even the slowest is under 3 minutes.

For services where tests run < 3 minutes already, sharding isn’t worth the complexity. Pay the parallelism cost only when serial-time genuinely matters.

Common Pitfalls

Forgetting to scope path filters per matrix entry. If you have OS-specific tests, run them only on matching OS. runs-on: ${{ matrix.os }} + scoped if: clauses keep noise down.

Caching across matrix entries with the same key. Each matrix run is a separate execution; the cache action is fine with this. But if your key doesn’t include the matrix dimension, the first entry’s cache is read by the others — sometimes wrong (e.g., a Linux cache restored on macOS). Include ${{ matrix.os }} in cache keys.

Sharding tests where packages depend on order. Some test suites have hidden ordering dependencies (shared global state). Sharding will randomly fail. Fix the tests or don’t shard.

fail-fast: false on every matrix. Sounds nice; usually wastes minutes. Default true is right for most cases.

Matrix with include that re-specifies all params. include adds entries to the cross-product. To replace an entry, you have to define it identically in matrix and exclude others. Read the docs carefully; intuition fails here.

Treating GitHub’s job concurrency limits as unlimited. Free tier: 20 concurrent jobs across the whole account. A 6-shard test matrix on three branches × two services = past the limit. Pro/Team tiers raise it.

Wrapping Up

Matrix builds turn one job into N. Use them for multi-version, multi-OS, and test sharding when the test suite is too slow for serial. Pay attention to fail-fast and cache keys. Final post of the month (Mon Feb 28): deploying Docker images from GitHub Actions to staging — closing the loop from “code merged” to “running in staging.”