Multi-Stage Docker Builds Cut Our Image Size by 70%
TL;DR: Multi-stage builds compile in a fat image and copy the artifact to a thin one. Our Go service went from 980 MB to 14 MB. Distroless wins for Go and Java; Alpine wins for Node and PHP; Debian slim wins for Python. Buildx cache-from in CI cuts build time from minutes to seconds.
Two weeks ago our api image was 980 MB. The exact same binary, in a properly multi-staged image, is 14 MB. That’s a 70× reduction, not 70% — I rounded down for the title because nobody believes 70× until they see the receipts.
This isn’t a clever trick. It’s the standard Docker pattern that’s been available since 17.05 in 2017. What changed in 2022 is the tooling around it: Buildx is the default builder, BuildKit’s cache-from works reliably across registries, and distroless / Alpine images have hit a level of polish where they’re a viable default rather than a gotcha-laden curiosity.
This post walks through the multi-stage pattern with real Dockerfiles for Go and Node, the trade-offs between Alpine, distroless, and scratch, and the CI cache configuration that makes the whole thing fast.
Why image size matters in 2022
Three concrete reasons, in priority order.
Cold-start latency. Pulling a 1 GB image to a fresh node takes 30–90 seconds depending on your registry latency. Pulling 14 MB takes 1–2 seconds. If you’re running anything autoscaling — Kubernetes HPA, ECS auto-scaling, serverless container platforms — image pull is the dominant component of cold-start time. A 14 MB image scales horizontally in a way a 1 GB image cannot.
Attack surface. Every binary in your image is a potential CVE. The standard node:18 image has Bash, OpenSSL, curl, wget, apt, dpkg, GCC libs, Python (yes, really), and a couple hundred other things. Distroless has libc, your runtime, and your binary. Trivy scan results go from “47 vulnerabilities” to “2 vulnerabilities” overnight, and the two left are usually in libc itself.
Storage and bandwidth cost. Image registries charge per GB stored and per GB egressed. Multiply by your CI run frequency and your node count. The savings are not life-changing but they’re not zero either.
The size argument is also a proxy for a subtler one: a small image is one you understand. You know exactly what’s in it. You can audit the layers in 30 seconds. That clarity is its own win.
A 4-line fix (multi-stage Go example)
Here’s the before/after for our Go service. Before — naive single-stage:
FROM golang:1.17
WORKDIR /app
COPY . .
RUN go build -o /app/bin/server ./cmd/server
CMD ["/app/bin/server"]
That image: 980 MB. The golang:1.17 base is 940 MB before you add a single line of your code. It includes the entire Go toolchain — compiler, linker, stdlib source, git, mercurial, build-essential, the works. None of which your running server needs.
After — multi-stage with distroless runtime:
# syntax=docker/dockerfile:1.4
FROM golang:1.17-alpine AS build
WORKDIR /src
COPY go.mod go.sum ./
RUN --mount=type=cache,target=/go/pkg/mod \
    go mod download
COPY . .
RUN --mount=type=cache,target=/root/.cache/go-build \
    --mount=type=cache,target=/go/pkg/mod \
    CGO_ENABLED=0 GOOS=linux go build \
      -ldflags="-s -w" \
      -trimpath \
      -o /out/server ./cmd/server
FROM gcr.io/distroless/static-debian11:nonroot
COPY --from=build /out/server /server
USER nonroot:nonroot
EXPOSE 8080
ENTRYPOINT ["/server"]
That image: 14 MB. The distroless/static base is ~2 MB. The Go binary, statically linked with CGO_ENABLED=0, is ~12 MB. Nothing else.
Three things doing the work:
- CGO_ENABLED=0 produces a fully static binary with no glibc/musl dependency. That’s what lets us use distroless/static instead of distroless/base.
- -ldflags="-s -w" strips DWARF debug info and the symbol table. About 25% binary size win.
- -trimpath removes filesystem paths from the binary, which is a small size win and a meaningful security/reproducibility win.
If you’re not using cgo, the static binary path is the right one. If you are using cgo (sqlite driver, certain image libs), use distroless/base-debian11 instead — it gets you to ~25 MB, still ~40× smaller than the naive build.
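For the cgo case, a minimal sketch might look like the following. It assumes the same ./cmd/server layout as above and switches the build stage to the Debian-based golang:1.17-bullseye tag, so the binary links against glibc (which distroless/base ships) rather than Alpine’s musl. Treat the port and paths as placeholders carried over from the static example.

```dockerfile
# syntax=docker/dockerfile:1.4
# Debian-based build stage: cgo links against glibc, matching distroless/base.
FROM golang:1.17-bullseye AS build
WORKDIR /src
COPY go.mod go.sum ./
RUN --mount=type=cache,target=/go/pkg/mod \
    go mod download
COPY . .
# CGO_ENABLED=1 produces a dynamically linked binary that needs libc at runtime.
RUN --mount=type=cache,target=/root/.cache/go-build \
    --mount=type=cache,target=/go/pkg/mod \
    CGO_ENABLED=1 GOOS=linux go build \
      -ldflags="-s -w" -trimpath \
      -o /out/server ./cmd/server

# distroless/base includes glibc, so the dynamic binary can start.
FROM gcr.io/distroless/base-debian11:nonroot
COPY --from=build /out/server /server
USER nonroot:nonroot
EXPOSE 8080
ENTRYPOINT ["/server"]
```

Building with an Alpine stage here would likely fail at startup, because a musl-linked binary cannot find its loader in a glibc-only runtime image.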
Pushing further: distroless, Alpine, scratch
Three runtime image families to know.
scratch. The literal empty image. Zero bytes. Only useful if your binary is statically linked and doesn’t need a CA bundle, timezone data, or /etc/passwd. In practice, almost everyone needs at least the CA bundle for outbound HTTPS, which means scratch is impractical for any service that calls an external API.
distroless (Google’s family). distroless/static, distroless/base, distroless/cc, distroless/java, distroless/python3, distroless/nodejs. Each contains exactly what its runtime needs and nothing else. No shell, no package manager, no curl. The nonroot variants ship with a nonroot user pre-created.
The catch with distroless: no shell means no docker exec -it container sh for debugging. You debug via logs, tracing, and kubectl debug ephemeral containers. For me that’s a feature. For some teams it’s a friction point worth knowing about.
Alpine. alpine:3.15, plus runtime-specific tags like node:18-alpine, python:3.10-alpine, php:8.0-fpm-alpine. About 5–8 MB base. Has a shell, has apk for installing extras, has musl libc instead of glibc. The musl difference matters for Python (some C extensions struggle), occasionally for Node native modules, rarely for Go.
My defaults:
- Go → distroless/static:nonroot
- Java → distroless/java17
- Node → node:18-alpine (distroless/nodejs is fine but Alpine is more flexible for npm install quirks)
- Python → python:3.10-slim (not Alpine; too many wheel issues with musl)
- PHP → php:8.0-fpm-alpine3.15
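The Node default in the list above can be sketched as a two-stage build. This is an illustration rather than our production Dockerfile: server.js and port 8080 are hypothetical, and it assumes a .dockerignore that excludes the local node_modules directory so COPY . . does not overwrite the installed dependencies.

```dockerfile
# syntax=docker/dockerfile:1.4
# Stage 1: install production dependencies only, with npm's cache mounted.
FROM node:18-alpine AS deps
WORKDIR /app
COPY package.json package-lock.json ./
RUN --mount=type=cache,target=/root/.npm \
    npm ci --omit=dev

# Stage 2: runtime image with the installed modules and the app source.
FROM node:18-alpine
ENV NODE_ENV=production
WORKDIR /app
COPY --from=deps /app/node_modules ./node_modules
# Assumes .dockerignore excludes node_modules (see lead-in).
COPY . .
# The official node images ship an unprivileged "node" user.
USER node
EXPOSE 8080
# server.js is a placeholder entry point.
CMD ["node", "server.js"]
```

Copying package.json and package-lock.json before the source keeps the npm ci layer cached until the dependency set changes.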
Buildx cache layers in CI
Multi-stage gets you the size win. Buildx cache gets you the time win.
Without cache, every CI build re-downloads dependencies, re-compiles, re-installs apt packages. Painful. With cache, only the layers affected by your actual code change rebuild.
GitHub Actions example:
# QEMU lets the amd64 runner execute the arm64 build steps
- uses: docker/setup-qemu-action@v1
- uses: docker/setup-buildx-action@v1
- uses: docker/login-action@v1
  with:
    registry: ghcr.io
    username: ${{ github.actor }}
    password: ${{ secrets.GITHUB_TOKEN }}
- uses: docker/build-push-action@v2
  with:
    context: .
    push: true
    tags: ghcr.io/${{ github.repository }}:${{ github.sha }}
    cache-from: type=registry,ref=ghcr.io/${{ github.repository }}:buildcache
    cache-to: type=registry,ref=ghcr.io/${{ github.repository }}:buildcache,mode=max
    platforms: linux/amd64,linux/arm64
mode=max is the important one — it caches all layers including intermediate stages, not just the final image. That’s how the go mod download layer and the apt-install layer get cached across runs even though they don’t end up in the final image.
The first run still takes 4 minutes. The second run, with no dependency changes, finishes in 35 seconds. After that you barely notice the build.
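Since the workflow above builds for both linux/amd64 and linux/arm64, one further option is to let Go cross-compile instead of running the compiler under emulation for the non-native platform. The sketch below is a variant of the Go Dockerfile from earlier, not something we measured for this post. It relies on BuildKit’s automatic BUILDPLATFORM, TARGETOS, and TARGETARCH build arguments.

```dockerfile
# syntax=docker/dockerfile:1.4
# Run the build stage on the runner's native architecture.
FROM --platform=$BUILDPLATFORM golang:1.17-alpine AS build
# BuildKit sets these once per requested target platform.
ARG TARGETOS
ARG TARGETARCH
WORKDIR /src
COPY go.mod go.sum ./
RUN --mount=type=cache,target=/go/pkg/mod \
    go mod download
COPY . .
# Cross-compile for the target platform; no cgo, so no C toolchain is needed.
RUN --mount=type=cache,target=/root/.cache/go-build \
    --mount=type=cache,target=/go/pkg/mod \
    CGO_ENABLED=0 GOOS=$TARGETOS GOARCH=$TARGETARCH go build \
      -ldflags="-s -w" -trimpath \
      -o /out/server ./cmd/server

# The runtime stage is pulled per target platform and only receives a file copy.
FROM gcr.io/distroless/static-debian11:nonroot
COPY --from=build /out/server /server
USER nonroot:nonroot
ENTRYPOINT ["/server"]
```

This only works cleanly for pure-Go builds; with cgo you would need a cross C toolchain or emulation.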
Common Pitfalls
Caching defeats itself if you COPY . . early. The whole point of separating COPY go.mod go.sum from COPY . . is that the dependency layer doesn’t invalidate when your source changes. If you copy everything at once, every code change re-downloads dependencies. Painful for Go, excruciating for Node and PHP.
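--mount=type=cache lives in the builder’s local storage. It survives between builds on the same builder, but cache-to does not export it, so on ephemeral CI runners it starts empty on every run. For cross-run caching you need cache-from/cache-to on the build itself, which restores the layer cache.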
Distroless tags drift. gcr.io/distroless/static:nonroot is a moving tag. For reproducibility, pin the digest: gcr.io/distroless/static:nonroot@sha256:abc123.... Tooling like Renovate can keep it updated for you.
Alpine wheel install hell for Python. If you pip install pandas on Alpine you’ll spend ten minutes watching it compile NumPy from source. Use python:3.10-slim (Debian-based but minimal). The size cost is ~40 MB; the time cost is your sanity.
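A slim-based Python build might be sketched like this. It is an assumption-laden example: requirements.txt and main.py are hypothetical names, and the virtualenv at /opt/venv is one common way to carry installed packages between stages, not the only one.

```dockerfile
# syntax=docker/dockerfile:1.4
# Stage 1: install dependencies into a virtualenv, with pip's cache mounted.
FROM python:3.10-slim AS build
RUN python -m venv /opt/venv
ENV PATH="/opt/venv/bin:$PATH"
COPY requirements.txt .
RUN --mount=type=cache,target=/root/.cache/pip \
    pip install -r requirements.txt

# Stage 2: same base image, so the virtualenv's interpreter path stays valid.
FROM python:3.10-slim
COPY --from=build /opt/venv /opt/venv
ENV PATH="/opt/venv/bin:$PATH"
WORKDIR /app
COPY . .
# Drop root; "nobody" exists in the Debian base.
USER nobody
# main.py is a placeholder entry point.
CMD ["python", "main.py"]
```

On a glibc base pip can usually pull prebuilt wheels for packages like NumPy and pandas, which is where the time saving over Alpine comes from. If a dependency does need a compiler, install it in the build stage only.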
Forgetting CA certificates in scratch. If you ever switch to scratch instead of distroless/static, you need to COPY --from=alpine:3.15 /etc/ssl/certs/ca-certificates.crt /etc/ssl/certs/ or your outbound HTTPS will fail with “x509: certificate signed by unknown authority.”
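Put together, a scratch-based variant could look like the sketch below. The numeric UID is an arbitrary non-root choice for illustration (scratch has no /etc/passwd, so a user name would not resolve), and the build stage is trimmed to the essentials.

```dockerfile
FROM golang:1.17-alpine AS build
WORKDIR /src
COPY . .
# Static binary: scratch has no libc to link against at runtime.
RUN CGO_ENABLED=0 go build -o /out/server ./cmd/server

FROM scratch
# CA bundle so outbound HTTPS can verify certificates.
COPY --from=alpine:3.15 /etc/ssl/certs/ca-certificates.crt /etc/ssl/certs/
COPY --from=build /out/server /server
# Numeric UID:GID works without /etc/passwd; 65532 is an arbitrary non-root ID.
USER 65532:65532
ENTRYPOINT ["/server"]
```

If the service also needs timezone data, that has to be copied in or embedded separately, which is part of why distroless/static tends to be the easier default.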
Wrapping Up
Multi-stage builds are the lowest-effort, highest-impact change you can make to a Dockerfile in 2022. Pair them with a sensible runtime base (distroless for Go/Java, Alpine for Node/PHP, slim for Python) and Buildx registry cache and you’ll cut both image size and build time by an order of magnitude. Next post in this series shifts from images to orchestration: the first Go microservice coming out of the monolith.