Secrets Scanning in 2024: TruffleHog and Gitleaks in CI
TL;DR — Use both scanners. TruffleHog v3.82 for live-credential verification, Gitleaks 8.18 for fast pre-receive blocking. Tune the patterns, not the noise budget.
I run secrets scanning for a couple of repos with hundreds of contributors. After many false starts, the configuration I’m happiest with uses TruffleHog and Gitleaks together, each playing to its strength. TruffleHog’s killer feature is its verifier — it can take a candidate token and actually try to authenticate against the upstream service, so you know whether the leaked secret is live or already rotated. Gitleaks is fast, regex-driven, and unbeatable as a pre-receive guardrail.
Most teams I talk to are running one or the other, badly configured, and have given up on the result because the false-positive rate makes alerts meaningless. That outcome is avoidable. The cost is a few hours of tuning and a small change in how you think about the scanner’s role.
This post is what I’d hand a colleague spinning this up from scratch. Concrete config, concrete commands, concrete pitfalls.
Two Scanners, Different Jobs
The conceptual split I use:
- Gitleaks runs in three places: a pre-commit hook, a pre-receive hook on the Git server, and CI on every push. Its job is to block known credential shapes before they enter the repo. Fast, deterministic, regex-based. Good signal-to-noise on common cloud provider tokens.
- TruffleHog runs as a daily scheduled scan against the full git history, and as a CI step that runs the verifier on the diff. Its job is to find live credentials, including custom ones, and to confirm whether matches are real. Slower, but much higher confidence on hits.
Why both? Because a regex scanner will miss high-entropy custom tokens (your internal API keys, GraphQL service keys, signed JWTs) and a verifier-based scanner is too slow to gate every push. Use the right tool at the right point in the pipeline.
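For the pre-commit leg of that split, a minimal local check might look like the sketch below. It assumes the Gitleaks 8.x CLI, where protect --staged scans only the staged changes (later releases reorganize the subcommands, so check gitleaks --help for your version). GITLEAKS_BIN is a hypothetical override added only so the function can be exercised with a stub.

```bash
#!/usr/bin/env bash
# Sketch of a local pre-commit secrets check (convenience, not enforcement).
# Assumptions: Gitleaks 8.x, where "protect --staged" scans staged changes;
# GITLEAKS_BIN is a hypothetical override so tests can swap in a stub.

# Scan only what is staged for this commit. If gitleaks is not installed,
# warn and allow the commit; the server-side hook is the real gate.
precommit_scan() {
  local gl=${GITLEAKS_BIN:-gitleaks}
  if ! command -v "$gl" >/dev/null 2>&1; then
    echo "pre-commit: $gl not found, skipping local secrets scan" >&2
    return 0
  fi
  "$gl" protect --staged --no-banner --redact
}
```

To wire it up, put the function in an executable .git/hooks/pre-commit and end the file with a bare precommit_scan call, so the hook exits with the scan's status. Letting commits through when the binary is missing is a deliberate choice: the local hook is there for fast feedback, and the pre-receive hook is what enforces.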
Pre-Receive: The One That Actually Matters
If a secret reaches your default branch, you’ve lost. Rotation is mandatory and rotation across distributed services is painful. Pre-receive enforcement is the single highest-leverage place to put your effort.
A pre-receive hook on the Git server runs before the push is accepted. Gitleaks 8.18 is fast enough to run on every push without making developers hate you. The catch: pre-receive runs in a sparse environment and won’t have your normal CI conveniences. Wrap it.
```bash
#!/usr/bin/env bash
# pre-receive hook, runs as the Git server user
set -euo pipefail

zero="0000000000000000000000000000000000000000"
report="$(mktemp /tmp/gitleaks-XXXXXX)"
trap 'rm -f "$report"' EXIT

while read -r old_rev new_rev ref; do
  # Ref deletion: nothing new to scan
  [[ "$new_rev" == "$zero" ]] && continue

  if [[ "$old_rev" == "$zero" ]]; then
    # New branch or tag: scan only commits not reachable from any existing ref
    range="${new_rev} --not --all"
  else
    # Existing ref: scan only the commits added by this push
    range="${old_rev}..${new_rev}"
  fi

  if ! gitleaks detect --no-banner --redact \
      --config /etc/gitleaks/config.toml \
      --log-opts "${range}" \
      --report-format json \
      --report-path "$report"; then
    echo ""
    echo "Push to ${ref} rejected: secrets detected."
    echo "Review the findings, rotate the credential, and rewrite history."
    jq -r '.[] | " - \(.RuleID) at \(.File):\(.StartLine)"' "$report" \
      || echo " (no report produced; gitleaks may have failed, check server logs)"
    exit 1
  fi
done
```
A few non-obvious bits. The --log-opts "${range}" argument restricts scanning to commits in the push, not the entire repo. Without it, you’d rescan history on every push and grind the server to dust. The --redact flag prevents the matched secret from appearing in the rejection message itself, which is important because pre-receive output is often logged.
If you can’t deploy server-side hooks (managed Git hosting), the GitHub-native push protection or the equivalent on your provider is the closest substitute. Combine it with required CI checks on PRs.
Gitleaks Config That Doesn’t Drown You
Default rules are a decent starting point but cause noise on most real codebases. The two pieces I always tune:
- Allowlist for known-safe paths. Test fixtures, vendored dependencies, sample data.
- Custom rules for your internal tokens. Mine usually include three to five patterns for our own services.
```toml
# /etc/gitleaks/config.toml
title = "Gitleaks 8.18 config"

[extend]
useDefault = true

[[rules]]
id = "internal-api-key"
description = "Internal service API key"
regex = '''sk_int_(?:live|test)_[A-Za-z0-9]{32,64}'''
entropy = 3.5
keywords = ["sk_int_"]

# Gitleaks 8.18 reads a single global [allowlist] table;
# the [[allowlists]] array form is only accepted by newer releases.
[allowlist]
description = "Test fixtures with deliberately invalid tokens, and documentation snippets"
paths = [
  '''(.*?)(test|tests|fixtures|examples)/.*?\.(json|yaml|yml)$''',
  '''docs/.*\.md$''',
]
```
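The entropy parameter on custom rules is critical. It sets a floor on the Shannon entropy (bits per character) of the match. Without it, the regex alone matches placeholders like sk_int_test_ followed by 32 X's in docs. With it, a run of repeated characters scores far below the threshold and is dropped, and only high-entropy actual tokens trigger.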
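To sanity-check a threshold before committing to it, you can compute a comparable score locally. The sketch below computes Shannon entropy in bits per character over the whole string, assuming Gitleaks scores the full match when a rule has no capture group (this rule only uses a non-capturing group, so the sk_int_ prefix counts too). It is an awk approximation for experimenting, not Gitleaks' own code, and above_threshold is a hypothetical helper name.

```bash
#!/usr/bin/env bash
# Shannon entropy in bits per character: for each distinct character with
# frequency p, sum -p * log2(p). Sketch for checking thresholds locally on a
# single-line string; not Gitleaks' own implementation.
shannon_entropy() {
  printf '%s' "$1" | awk '{
    n = length($0)
    for (i = 1; i <= n; i++) count[substr($0, i, 1)]++
    for (c in count) { p = count[c] / n; h -= p * log(p) / log(2) }
    printf "%.2f\n", h
  }'
}

# Hypothetical helper: succeeds when the string scores above the threshold.
above_threshold() {
  awk -v h="$(shannon_entropy "$1")" -v t="$2" 'BEGIN { exit !(h + 0 > t + 0) }'
}

# Placeholder from the text: the rule's prefix plus 32 X's.
placeholder="sk_int_test_$(printf 'X%.0s' {1..32})"
shannon_entropy "$placeholder"   # prints 1.56
```

So above_threshold "$placeholder" 3.5 fails, which is exactly the behavior entropy = 3.5 relies on, while a randomly generated 32-character alphanumeric token lands well above the bar.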
TruffleHog and the Verifier
Here’s where TruffleHog v3.82 earns its keep. The --only-verified flag restricts output to matches that the tool successfully authenticated against the upstream service. A verified AWS key isn’t a maybe — it’s a live credential that needs to be rotated this hour.
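```bash
# Daily scheduled scan; output keeps only verified hits for the security channel.
# Start from the newest commit older than 24 hours. If there is none (the repo
# is younger than that), omit --since-commit and scan the whole history.
since="$(git rev-list -1 --before='1 day ago' HEAD)"
trufflehog git file://. \
  ${since:+--since-commit "$since"} \
  --only-verified \
  --json \
  --no-update | tee /tmp/th-output.jsonl
```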
For CI, run TruffleHog scoped to the PR diff:
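```yaml
- name: Checkout with full history (base..head must be resolvable)
  uses: actions/checkout@v4
  with:
    fetch-depth: 0
- name: TruffleHog diff scan
  uses: trufflesecurity/trufflehog@v3.82
  with:
    extra_args: --only-verified --fail
    base: ${{ github.event.pull_request.base.sha }}
    head: ${{ github.event.pull_request.head.sha }}
```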
The verifier covers a long list of common services. For custom credentials you can write a custom detector that includes a verification HTTP call. I’ve written verifiers for our internal token format, and the false-positive rate dropped to effectively zero.
One subtle thing: verifiers make outbound HTTP calls. If your CI runs in a network-restricted environment, verification silently fails, and with --only-verified those unverifiable matches are dropped instead of reported, so you get fewer verified hits than you should. Either allowlist the verifier endpoints or accept that unverified scanning is your fallback.
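One way to make that failure loud is a preflight step that checks egress before the TruffleHog step runs. This is a sketch under assumptions: the endpoint list is illustrative rather than TruffleHog's actual list (build yours from the detectors you care about), and it only tests that a TCP connection opens, using bash's /dev/tcp redirection and coreutils timeout.

```bash
#!/usr/bin/env bash
# Preflight for CI: check that the runner can open TCP connections to the
# services whose verifiers you rely on, so a blocked network shows up as a
# failed step instead of quietly missing verified hits.
# Assumption: this endpoint list is illustrative, not TruffleHog's own list.
VERIFIER_ENDPOINTS=(sts.amazonaws.com:443 api.github.com:443)

# Succeeds if host:port accepts a TCP connection within 5 seconds.
can_reach() {
  local host=${1%:*} port=${1##*:}
  timeout 5 bash -c "exec 3<>/dev/tcp/${host}/${port}" 2>/dev/null
}

# Prints each unreachable endpoint; returns 1 if any were unreachable.
check_verifier_egress() {
  local ep status=0
  for ep in "$@"; do
    if ! can_reach "$ep"; then
      echo "unreachable: $ep"
      status=1
    fi
  done
  return "$status"
}
```

In the CI job, run check_verifier_egress "${VERIFIER_ENDPOINTS[@]}" before the scan and either fail the job or switch deliberately to unverified scanning, rather than trusting an empty --only-verified result.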
What To Do When You Find Something
The discovery is the easy part. The response is what determines whether your program is credible.
- Rotate immediately. Not “after we figure out who committed it.” Rotate first, ask questions second. The exposure window starts when the commit landed, not when you noticed.
- Audit usage. Pull logs from the upstream service for any auth using the leaked credential since the commit date: CloudTrail for AWS, or the equivalent audit log for other providers.
- Notify the developer privately. Public Slack call-outs make people hide future incidents. Private conversation, blameless tone, walk them through the rotation.
- Rewrite history only after rotation. A force-push to remove the secret from git history is useful for compliance but worthless from a security standpoint once the secret has been on any public-facing server. Rotate, then clean.
- Add a regression rule. If the same shape leaked once, add a specific Gitleaks rule for that token format so it never sneaks past again.
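For an AWS key, the rotate and audit steps map onto a few AWS CLI calls. This sketch assumes the leaked key belongs to an IAM user whose name you know and that you pass the commit time as an ISO 8601 timestamp; AWS_BIN is a hypothetical override that lets you dry-run the sequence with echo. Role credentials and root keys need a different procedure.

```bash
#!/usr/bin/env bash
# Sketch: rotate, then audit, a leaked AWS access key with the AWS CLI.
# Arguments: access key ID, owning IAM user, commit time (ISO 8601).
# AWS_BIN is a hypothetical override for dry runs (AWS_BIN=echo).
respond_to_leaked_aws_key() {
  local key_id=$1 user=$2 since=$3
  local aws=${AWS_BIN:-aws}

  # Rotate first: deactivate the leaked key so it stops authenticating.
  "$aws" iam update-access-key --user-name "$user" \
    --access-key-id "$key_id" --status Inactive || return

  # Issue a replacement (IAM allows at most two access keys per user).
  "$aws" iam create-access-key --user-name "$user" --output json || return

  # Audit: CloudTrail event history for calls made with the leaked key.
  # Event history covers 90 days of management events and is per region,
  # so repeat with --region for every region you use.
  "$aws" cloudtrail lookup-events \
    --lookup-attributes "AttributeKey=AccessKeyId,AttributeValue=${key_id}" \
    --start-time "$since" --output json
}
```

A dry run such as AWS_BIN=echo respond_to_leaked_aws_key AKIAIOSFODNN7EXAMPLE ci-deployer 2024-05-01T00:00:00Z (user name hypothetical) prints the three commands instead of running them. Once the replacement is deployed, remove the inactive key with aws iam delete-access-key.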
At one job, I tied this into a broader auto-remediation workflow for cloud security findings. Verified TruffleHog hits would automatically open a high-severity ticket, trigger rotation in the provider via Lambda, and leave a comment on the offending commit. The structure works; the cultural piece is the harder lift.
Performance and Caching
A common complaint: “the scanner is slow.” Usually fixable.
- Gitleaks: the bottleneck is git log on large repos. Use --log-opts to constrain scope. Don't scan with --all in CI.
- TruffleHog: the bottleneck is the verifier HTTP calls. If you don't need verification on every CI run, pass --no-verification instead of --only-verified and let the scheduled job do the verifying. Dropping --only-verified alone doesn't skip the HTTP calls; it only adds unverified matches to the output.
- Both: ignore binary blobs, generated lockfiles, and vendored directories explicitly. The default heuristics are okay but not optimal.
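For the "Both" point, one option is to keep exclusions in a single file of path regexes. TruffleHog's --exclude-paths flag reads a file of newline-separated regexes, and the same patterns can be copied into the paths list of a Gitleaks allowlist. The patterns below are illustrative assumptions rather than a canonical set, and write_exclude_file is a hypothetical helper.

```bash
#!/usr/bin/env bash
# Sketch: one list of path regexes to skip, in the newline-separated format
# read by TruffleHog's --exclude-paths flag. Patterns are illustrative.
write_exclude_file() {
  cat > "$1" <<'EOF'
(^|/)vendor/
(^|/)node_modules/
(^|/)(package-lock\.json|yarn\.lock|pnpm-lock\.yaml|go\.sum|Cargo\.lock)$
\.(png|jpe?g|gif|ico|pdf|zip|gz|jar|woff2?)$
EOF
}
```

Usage: write_exclude_file .trufflehog-exclude, then trufflehog git file://. --exclude-paths=.trufflehog-exclude. Keeping the list in one file means CI and the scheduled job skip the same paths.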
For the daily full-history TruffleHog scan, I cache the last scanned commit SHA and only scan the new range. A full history scan of a multi-gigabyte monorepo takes hours. The incremental version takes minutes.
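The cached-SHA pattern can be sketched as follows. Assumptions: the caller supplies the state-file path, TRUFFLEHOG_BIN is a hypothetical override for testing, and --since-commit makes TruffleHog scan only commits after the given one on the checked-out branch (confirm against trufflehog git --help for your version). If the cached SHA no longer exists, for example after a force-push, it falls back to a full scan instead of erroring.

```bash
#!/usr/bin/env bash
# Sketch of an incremental scheduled TruffleHog scan with a cached cursor.

# Print the trufflehog arguments for this run, one per line. When the cached
# SHA still exists in the repo, add --since-commit so only newer commits are
# scanned; otherwise fall back to a full-history scan.
scan_args() {
  local state_file=$1 last=""
  if [[ -s "$state_file" ]]; then last=$(<"$state_file"); fi
  printf '%s\n' git file://. --only-verified --json --no-update
  if [[ -n "$last" ]] && git cat-file -e "${last}^{commit}" 2>/dev/null; then
    printf '%s\n' --since-commit "$last"
  fi
}

# Run the scan, then advance the cursor to the commit that was HEAD when the
# scan started. If the scan fails, the cursor stays put and the range is retried.
incremental_scan() {
  local state_file=$1 head
  local -a args
  head=$(git rev-parse HEAD) || return
  mapfile -t args < <(scan_args "$state_file")
  "${TRUFFLEHOG_BIN:-trufflehog}" "${args[@]}" || return
  printf '%s\n' "$head" > "$state_file"
}
```

Recording HEAD before the scan starts, rather than after, means commits that land mid-scan are picked up on the next run instead of being skipped.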
Gotchas
- Test fixtures with realistic-looking secrets. Use clearly fake patterns (AKIAIOSFODNN7EXAMPLE is the AWS-blessed example) and exempt the test directories explicitly. The temptation to use real-looking strings is real and dangerous.
- Encrypted secrets in .env.encrypted or sealed-secrets. These are not secrets; they're ciphertext. But high-entropy strings will trip the scanner. Add them to the allowlist.
- JWT in test files. A test JWT signed with a known test key is not a leak. The scanner often disagrees. Suppress it with the allowlist, not by deleting the test.
- Submodule scans. TruffleHog and Gitleaks both have edge cases with submodules. Run them on each module separately if you depend on the result.
- Pre-commit hooks can be skipped. git commit --no-verify exists. Pre-commit is for convenience; pre-receive is for enforcement.
- Forgot to scan branches before they merge. Some teams scan main only. Attacker pushes to a feature branch, opens PR, deletes branch. Repo history still has the commit. Always scan all refs on push.
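For the submodule gotcha, running one scan per repository can look like the sketch below. GITLEAKS_BIN is a hypothetical override for testing. Note that git submodule foreach stops at the first submodule whose command fails, so this reports the first failing module rather than all of them.

```bash
#!/usr/bin/env bash
# Sketch: scan the superproject and every submodule as separate repositories,
# since the superproject's history records only submodule commit pointers.
# GITLEAKS_BIN is a hypothetical override so tests can swap in a stub.
scan_with_submodules() {
  local gl=${GITLEAKS_BIN:-gitleaks} status=0
  # Superproject first.
  "$gl" detect --no-banner --redact --source . || status=1
  # Then each submodule, recursively, from inside its own checkout.
  git submodule foreach --quiet --recursive \
    "$(printf '%q' "$gl") detect --no-banner --redact --source ." || status=1
  return "$status"
}
```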
Wrapping Up
Secrets scanning isn’t sexy. It’s plumbing. But it’s the kind of plumbing that prevents the calls at 2am asking why the AWS bill went up by $40,000 yesterday. The investment in tuning Gitleaks and adding a verifier-driven TruffleHog pass is small relative to the cost of even one incident.
The single biggest improvement you can make today: pre-receive enforcement with Gitleaks. The single biggest cultural improvement: making rotation the default reflex instead of a debate. Get those two things, and you’ve already outpaced most of the industry.