SBOMs That Are Actually Useful: Syft, CycloneDX 1.5, and the Limits of Static Analysis
TL;DR
- Generate SBOMs at build time, not from running images, or you will miss build-time toolchains and includes.
- Pick CycloneDX 1.5 for security workflows, SPDX 2.3 for licence compliance; they answer different questions.
- Every generator misses something: statically-linked binaries, vendored Go modules, anything pulled in by a RUN curl ....
The MOVEit breach in June was the clearest argument I have seen for SBOMs in years. Customers wanted to know in hours whether a particular library was in their stack. Teams without an SBOM spent days grepping through Dockerfiles. Teams with an SBOM had a query. The case is no longer controversial.
What is still controversial is what to put in the SBOM and how to generate it. The marketing makes it sound like a button. It is not. A useful SBOM requires opinions about the generation point, the format, and the gaps you are willing to live with. Here is the pipeline I run now, with Syft 0.90 and CycloneDX 1.5 as my defaults.
Generate at Build, Not at Runtime
You will see two patterns. Generate the SBOM by scanning the final image. Or generate it during the build, when you still have source and lockfiles. The first is convenient. The second is correct.
Scanning a built image gives you what is installed in the filesystem — packages from apk, apt, pip, npm, and so on. That is most of what you care about, but it misses:
- Build-time dependencies that produced binaries now sitting in /usr/local/bin with no package metadata.
- Statically-linked C/C++ libraries that were compiled in and left no trace.
- Vendored Go modules when the Go binary carries no readable build info (built outside module mode, or packed or obfuscated after the build).
- Things downloaded by curl in a RUN step and dropped somewhere on disk.
Scanning at build time, when you have go.mod, package-lock.json, requirements.txt, and the actual source tree, recovers most of this. The price is integration with the build system — you cannot just run a single command after the fact.
I do both. A build-time SBOM as the source of truth, and a runtime image SBOM as a cross-check. When they disagree, that is the interesting signal.
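The cross-check can be as small as a set difference over package URLs. This is a minimal sketch, assuming both files are CycloneDX JSON and that components carry a purl field; the function names are mine, not part of any tool. Exact string matching is deliberately naive: purls that differ only in qualifiers (architecture, distro) will show up as disagreements.

```python
def purls(bom: dict) -> set[str]:
    """Collect the purl of every component that declares one."""
    return {c["purl"] for c in bom.get("components", []) if c.get("purl")}


def diff_sboms(source_bom: dict, image_bom: dict) -> dict[str, list[str]]:
    """Report purls that appear in only one of the two SBOMs."""
    src, img = purls(source_bom), purls(image_bom)
    return {
        "only_in_source": sorted(src - img),  # built against, not found on disk
        "only_in_image": sorted(img - src),   # on disk, not declared in source
    }


if __name__ == "__main__":
    # Tiny inline documents standing in for sbom.cdx.json and sbom.image.cdx.json.
    source = {"components": [{"purl": "pkg:npm/lodash@4.17.21"}]}
    image = {"components": [{"purl": "pkg:npm/lodash@4.17.21"},
                            {"purl": "pkg:apk/alpine/musl@1.2.4"}]}
    print(diff_sboms(source, image))
```

Components without a purl are skipped here, which is itself worth watching: an image component with no purl is often one of the unidentified binaries described above.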
CycloneDX or SPDX
Both formats describe the same thing — components, their versions, relationships, and metadata. Both have JSON and XML serialisations. They are not interchangeable.
CycloneDX 1.5 (released June 2023) was designed by the OWASP community for application security. It has first-class fields for vulnerabilities, services, and external references, and it integrates cleanly with Dependency-Track and security scanners.
SPDX 2.3 came out of the Linux Foundation and is biased toward licence and compliance. It is the format your legal team’s tool will speak. It has richer support for licence expressions and provenance fields.
I generate CycloneDX for security workflows and SPDX on demand when legal asks. Syft can emit both from a single source.
syft packages dir:. -o cyclonedx-json=sbom.cdx.json -o spdx-json=sbom.spdx.json
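To make "same thing, different layout" concrete, here is a rough sketch that reads the same package out of each JSON serialisation. It assumes CycloneDX keeps name, version, and purl directly on each entry in components, and that SPDX 2.3 keeps the version in versionInfo and the purl inside externalRefs with referenceType "purl". The helper names are hypothetical.

```python
def from_cyclonedx(bom: dict) -> set[tuple]:
    """(name, version, purl) for each CycloneDX component."""
    return {
        (c.get("name"), c.get("version"), c.get("purl"))
        for c in bom.get("components", [])
    }


def from_spdx(doc: dict) -> set[tuple]:
    """(name, version, purl) for each SPDX 2.3 package."""
    out = set()
    for pkg in doc.get("packages", []):
        # The purl, when present, is one of several external references.
        purl = next(
            (ref.get("referenceLocator")
             for ref in pkg.get("externalRefs", [])
             if ref.get("referenceType") == "purl"),
            None,
        )
        out.add((pkg.get("name"), pkg.get("versionInfo"), purl))
    return out
```

Everything beyond this triple (licence expressions, vulnerabilities, services) is where the two formats diverge, which is why converting between them tends to lose detail.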
A Build Pipeline With Attested SBOMs
This is the GitHub Actions job. It builds, generates the SBOM from source, signs the image, attaches the SBOM as a signed attestation, and uploads it to Dependency-Track for vulnerability tracking over time.
name: build-sbom-sign
on:
  push:
    branches: [main]
jobs:
  publish:
    runs-on: ubuntu-22.04
    permissions:
      contents: read
      packages: write
      id-token: write   # needed for keyless signing
    steps:
      - uses: actions/checkout@v4
      - uses: anchore/sbom-action/download-syft@v0.14.3
        with:
          syft-version: v0.90.0
      - uses: sigstore/cosign-installer@v3.1.2
        with:
          cosign-release: v2.2.0
      # Authenticate to GHCR so docker push and cosign can write to the registry.
      - name: Log in to GHCR
        run: echo "${{ secrets.GITHUB_TOKEN }}" | docker login ghcr.io -u "${{ github.actor }}" --password-stdin
      - name: Generate source SBOM
        run: |
          syft packages dir:. \
            -o cyclonedx-json=sbom.cdx.json \
            --source-name "${{ github.repository }}" \
            --source-version "${{ github.sha }}"
      - name: Build and push
        id: build
        run: |
          IMAGE=ghcr.io/${{ github.repository }}:${{ github.sha }}
          docker build -t "$IMAGE" .
          docker push "$IMAGE"
          # Resolve the pushed tag to an immutable digest for signing.
          DIGEST=$(docker inspect --format='{{index .RepoDigests 0}}' "$IMAGE")
          echo "digest=$DIGEST" >> "$GITHUB_OUTPUT"
      - name: Generate image SBOM
        run: |
          syft "${{ steps.build.outputs.digest }}" \
            -o cyclonedx-json=sbom.image.cdx.json
      - name: Sign image
        run: cosign sign --yes "${{ steps.build.outputs.digest }}"
      - name: Attest SBOMs
        run: |
          cosign attest --yes --type cyclonedx \
            --predicate sbom.cdx.json \
            "${{ steps.build.outputs.digest }}"
          cosign attest --yes --type cyclonedx \
            --predicate sbom.image.cdx.json \
            "${{ steps.build.outputs.digest }}"
      - name: Upload to Dependency-Track
        env:
          DT_API_KEY: ${{ secrets.DEPENDENCY_TRACK_KEY }}
        run: |
          # --fail makes the step fail on an HTTP error instead of passing silently.
          curl --fail -sS -X POST https://dt.internal/api/v1/bom \
            -H "X-API-Key: $DT_API_KEY" \
            -F "project=${{ secrets.DT_PROJECT_UUID }}" \
            -F "bom=@sbom.cdx.json"
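The two SBOMs (source and image) are intentional. Both are attached to the image as attestations; only the source SBOM goes to Dependency-Track in this job. If you want the image SBOM tracked there too, give it its own project, because as far as I can tell a BOM upload replaces the project's component list rather than merging into it. The disagreement between the two, when it shows up, is what tells you something interesting is happening in the build.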
What Syft Misses
I have spent more time than I want chasing missing components. The honest list:
Statically-linked C dependencies. A Go binary that links libfoo via cgo with -linkmode external shows up to Syft as a Go binary with no record of libfoo. The Go-binary scanner reads the Go module table, not the linker’s symbol table. Fix: feed the source-tree SBOM as well, and record libfoo in it explicitly, because go.mod lists only the Go modules you built against, not the C libraries.
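Vendored modules in binaries with no build info. Same family of problem. Syft recovers Go dependencies from the build info the toolchain embeds, the same data go version -m prints. Ordinary stripping with go build -trimpath -ldflags="-s -w" drops symbols, debug data, and local paths but normally leaves that build info readable. It is gone when the binary was built outside module mode, or packed or obfuscated after the build, and then Syft can no longer recover the dependency list at all. Build-time SBOM mandatory.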
Things added by RUN curl ... | sh. Tools installed this way leave no package manager record. Syft sees the binary on disk but cannot identify it. The fix is build hygiene — pin tool downloads to a known version and record them explicitly, or use a package manager.
Rust crates and Python wheels with native extensions. The Rust scanner is good when Cargo.lock is present. It is blind to binaries built externally and copied in. Python wheels with bundled .so files report the wheel but not the embedded native library.
Browser bundles. A webpack bundle in a dist/ directory is one file to Syft. The fact that it contains 400 npm packages is invisible. Generate the SBOM from package-lock.json before bundling, not from dist/.
None of these are Syft’s fault. They are limits of static analysis. The pattern is the same: when the toolchain destroys information, scan earlier in the build where the information still exists.
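For the RUN curl case, "record them explicitly" can mean appending a hand-written component to the build-time SBOM. The following is a sketch under assumptions: the function name and arguments are hypothetical, the purl uses the generic type because no package manager is involved, and the name is assumed to need no percent-encoding.

```python
import copy
import hashlib


def record_download(bom: dict, name: str, version: str,
                    url: str, content: bytes) -> dict:
    """Return a copy of a CycloneDX BOM with one manually recorded tool added."""
    component = {
        "type": "application",
        "name": name,
        "version": version,
        # Generic purl: identifies the tool without implying an ecosystem.
        "purl": f"pkg:generic/{name}@{version}",
        # Hash of the bytes that were actually downloaded.
        "hashes": [{"alg": "SHA-256",
                    "content": hashlib.sha256(content).hexdigest()}],
        # Where the artifact came from.
        "externalReferences": [{"type": "distribution", "url": url}],
    }
    updated = copy.deepcopy(bom)  # leave the caller's document untouched
    updated.setdefault("components", []).append(component)
    return updated
```

The hash matters as much as the name: it lets you confirm later that the binary on disk is the one you recorded, not whatever the URL serves today.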
Querying the SBOM
An SBOM that nobody queries is a file on disk. The query layer matters more than the format. Dependency-Track ingests CycloneDX, correlates against the NVD and OSS Index, and notifies on new CVEs against components already in production. It is the cheapest way I know to get a running view of “what is exploitable today”.
For one-off queries against an SBOM file, jq is fine:
jq '.components[] | select(.name | test("^log4j"))' sbom.cdx.json
For querying SBOMs attached as Cosign attestations, you pull them down and pipe:
cosign download attestation \
--predicate-type https://cyclonedx.org/bom \
ghcr.io/myorg/app@sha256:abc... \
| jq -r '.payload | @base64d | fromjson | .predicate' \
| jq '.components[] | select(.name | test("openssl"))'
I keep this as a snippet. When a new CVE drops, looping it over our signed image digests answers “are we affected” in about 30 seconds.
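The same decoding in Python, for when the result needs to feed something other than a terminal. This is a sketch that assumes each line of cosign's output is a JSON envelope whose payload field is a base64-encoded in-toto statement, with the CycloneDX document under predicate, which is exactly what the jq pipeline above relies on. The function name is mine.

```python
import base64
import json
import re


def find_components(envelope_line: str, name_pattern: str) -> list[dict]:
    """Decode one attestation envelope and return components whose name matches."""
    envelope = json.loads(envelope_line)
    # The payload is the base64-encoded in-toto statement.
    statement = json.loads(base64.b64decode(envelope["payload"]))
    components = statement["predicate"].get("components", [])
    pattern = re.compile(name_pattern)
    return [c for c in components if pattern.search(c.get("name", ""))]


if __name__ == "__main__":
    import sys
    # Usage sketch: cosign download attestation ... | python this_script.py openssl
    for line in sys.stdin:
        if line.strip():
            for comp in find_components(line, sys.argv[1]):
                print(comp.get("name"), comp.get("version"))
```

Note that this only decodes the attestation. It does not verify the signature, so run cosign verify-attestation first if the answer has to be trusted.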
Common Pitfalls
Treating SBOM as a deliverable, not a workflow. Generating an SBOM at release time and emailing it to a regulator is compliance. It does not help you when a CVE drops on a Tuesday. The workflow is continuous: generate, attest, ingest into a vulnerability tracker, alert on new findings.
Forgetting transitive dependencies. Some generators stop at direct dependencies. CycloneDX has a dependencies graph that captures the full tree. Make sure your generator emits it. Syft does for ecosystems with proper lockfiles.
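A quick way to check whether the graph is really there is to walk it. This is a minimal sketch, assuming the CycloneDX dependencies array holds entries with a ref and an optional dependsOn list of refs; the function name is hypothetical.

```python
def transitive_deps(bom: dict, root_ref: str) -> set[str]:
    """Return every ref reachable from root_ref via dependsOn edges."""
    graph = {
        entry["ref"]: entry.get("dependsOn", [])
        for entry in bom.get("dependencies", [])
    }
    seen: set[str] = set()
    stack = list(graph.get(root_ref, []))
    while stack:
        ref = stack.pop()
        if ref in seen:
            continue  # already visited; also guards against cycles
        seen.add(ref)
        stack.extend(graph.get(ref, []))
    return seen
```

If the result for your application's root component is no larger than its list of direct dependencies, the generator probably stopped at the first level.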
Mixing formats in the pipeline. Pick one canonical format internally. Convert at the boundary if a tool needs the other. Mixing CycloneDX and SPDX in your storage layer means every query has to handle both.
Versions in component identifiers. CycloneDX 1.5 uses purl (Package URL) for component identity. pkg:npm/lodash@4.17.21 is unambiguous. lodash 4.17.21 is not — there are forks. Insist on purl-based identifiers.
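To show what a purl carries that a bare name and version do not, here is a deliberately simplified parser. It is a sketch, not a full implementation of the purl spec: it discards qualifiers and subpath and skips the per-type normalisation rules, so use a maintained purl library for anything real.

```python
from urllib.parse import unquote


def parse_purl(purl: str) -> dict:
    """Split a Package URL into type, namespace, name, and version."""
    if not purl.startswith("pkg:"):
        raise ValueError(f"not a purl: {purl!r}")
    rest = purl[len("pkg:"):]
    rest = rest.split("#", 1)[0]  # drop subpath
    rest = rest.split("?", 1)[0]  # drop qualifiers
    if "@" in rest:
        # A literal "@" in a namespace is percent-encoded, so the last "@"
        # is the version separator.
        path, version = rest.rsplit("@", 1)
        version = unquote(version)
    else:
        path, version = rest, None
    pkg_type, _, remainder = path.partition("/")
    *namespace_parts, name = remainder.split("/")
    namespace = "/".join(unquote(p) for p in namespace_parts) or None
    return {
        "type": pkg_type.lower(),
        "namespace": namespace,
        "name": unquote(name),
        "version": version,
    }
```

The type and namespace are what separate the real lodash on npm from a fork published under a scope or in another ecosystem.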
Wrapping Up
The SBOM is plumbing. It is only useful when it is fed automatically from the build, signed alongside the image, and queryable when something burns. Get the generation point right (build-time, from source), accept that static analysis has limits, and wire the output into a vulnerability tracker. The next bit of plumbing in this series is how to keep the secrets these services depend on out of the images entirely — covered in the post on Vault dynamic secrets.