Managing Grafana Dashboards as Code with Terraform
TL;DR:
- Hand-edited dashboards drift and die with the person who built them.
- Grafonnet generates the JSON, Terraform applies it, Git reviews it.
- One library function per panel kills copy-paste across twenty dashboards.
I once spent an afternoon trying to figure out why staging and production had subtly different latency dashboards. Same service, same metrics, panels that looked identical at a glance — but the production one used a 5-minute rate window and staging used 1 minute, so the two graphs never agreed during an incident. Nobody had decided that. Someone had clicked “edit panel” eighteen months earlier and never saved it back anywhere.
That is dashboard drift, and it’s the default state of any Grafana instance that people edit through the UI. The dashboard is a critical operational artifact, yet it lives only in Grafana’s database, has no history, no review, and no reproducibility. Treating Grafana dashboards as code fixes all four problems at once.
This post builds a real workflow: Grafonnet generates dashboard JSON from a typed Jsonnet library, and Terraform 1.10 applies it to Grafana 11 through the official provider. You get version control, pull-request review, and dashboards that are identical across every environment because they’re rendered from the same source. If your dashboards are alerting dashboards, this pairs directly with building real-time alerting dashboards with Prometheus and Grafana.
Why Not Just Commit the JSON
The obvious first move is to export each dashboard’s JSON and commit it. That’s better than nothing, but the raw JSON model is a poor source of truth. It’s verbose, it carries volatile fields like id and version that churn on every export, and it has zero abstraction — twenty dashboards means twenty hand-maintained copies of the same latency panel.
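To make the churn concrete, here is a minimal sketch of the cleanup you would need before every commit if you did keep exports in Git. It assumes jq is installed, and normalize_export is a hypothetical helper name, not something Grafana ships; it only drops the two volatile fields named above and sorts the keys.

```shell
# normalize_export FILE
# Print an exported dashboard with the volatile id/version fields removed
# and keys sorted, so two exports of the same dashboard compare equal.
# Assumption: jq is available on PATH.
normalize_export() {
  jq -S 'del(.id, .version)' "$1"
}

# Usage (hypothetical file name):
#   normalize_export exported-dashboard.json > dashboards/search-api.json
```

Two exports of an unchanged dashboard should normalize to the same text. Having to remember this step on every export is part of why the raw JSON makes a poor source of truth.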
Grafonnet solves the abstraction problem. It’s a Jsonnet library from the Grafana team that exposes dashboards, panels, and queries as composable functions. You write a panel once, parameterize it, and reuse it everywhere. Jsonnet renders it to canonical JSON, and Terraform owns the apply. Each tool does one job.
Project Layout
grafana-iac/
├── lib/
│   └── panels.libsonnet      # reusable panel functions
├── dashboards/
│   └── search-api.jsonnet    # one file per dashboard
├── jsonnetfile.json          # jsonnet-bundler manifest
├── main.tf
├── variables.tf
└── Makefile
Pull in Grafonnet with jsonnet-bundler so the version is pinned and reproducible:
# install the toolchain (replace <version> with the release tags you pin)
go install github.com/google/go-jsonnet/cmd/jsonnet@<version>
go install github.com/jsonnet-bundler/jsonnet-bundler/cmd/jb@<version>
# initialise and vendor Grafonnet
jb init
jb install github.com/grafana/grafonnet/gen/grafonnet-v11.0@main
A Reusable Panel Library
This is where dashboards-as-code earns its keep. Define each panel type once in lib/panels.libsonnet as a function, and every dashboard calls it with parameters. Change the function and every dashboard updates on the next apply.
// lib/panels.libsonnet
local g = import 'github.com/grafana/grafonnet/gen/grafonnet-v11.0/main.libsonnet';
local ts = g.panel.timeSeries;
local prq = g.query.prometheus;

{
  // A latency panel parameterized by metric, quantile, and SLO threshold.
  latency(title, metric, quantile=0.99, slo=null)::
    ts.new(title)
    + ts.queryOptions.withTargets([
      prq.new(
        '$datasource',
        'histogram_quantile(%g, sum by (le, route) (rate(%s_bucket[5m])))'
        % [quantile, metric],
      )
      + prq.withLegendFormat('{{route}}'),
    ])
    + ts.standardOptions.withUnit('s')
    + (if slo != null then
         ts.standardOptions.thresholds.withSteps([
           g.panel.timeSeries.standardOptions.threshold.step.withColor('green'),
           g.panel.timeSeries.standardOptions.threshold.step.withColor('red')
           + g.panel.timeSeries.standardOptions.threshold.step.withValue(slo),
         ])
       else {}),

  // An error-ratio panel with a fixed percentunit format.
  errorRatio(title, metric)::
    ts.new(title)
    + ts.queryOptions.withTargets([
      prq.new(
        '$datasource',
        'sum by (route) (rate(%s{status=~"5.."}[5m]))'
        % metric +
        ' / clamp_min(sum by (route) (rate(%s[5m])), 1e-9)' % metric,
      )
      + prq.withLegendFormat('{{route}}'),
    ])
    + ts.standardOptions.withUnit('percentunit'),
}
The clamp_min in errorRatio is baked into the library, so no dashboard can ever ship the divide-by-zero bug. That’s the real payoff: fixes propagate.
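A quick way to see what the clamp buys you, sketched in shell rather than PromQL. The assumption here is an idle route where both rates are zero: PromQL evaluates 0/0 to NaN, which typically renders as a gap, while clamping the denominator to 1e-9 yields 0. error_ratio is a hypothetical helper that mirrors the same arithmetic with awk.

```shell
# error_ratio ERRORS TOTAL
# Compute errors / max(total, 1e-9), the same shape as the clamp_min
# expression in the errorRatio panel. Hypothetical helper for illustration.
error_ratio() {
  awk -v e="$1" -v t="$2" 'BEGIN {
    if (t < 1e-9) t = 1e-9      # clamp the denominator, like clamp_min(..., 1e-9)
    printf "%.4f\n", e / t
  }'
}

error_ratio 0 0      # idle route: prints 0.0000 instead of dividing by zero
error_ratio 5 200    # 5 errors in 200 requests: prints 0.0250
```

For any route with real traffic the clamp changes nothing, because the denominator is already far above 1e-9.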
Composing a Dashboard
The dashboard file is now short and declarative — it lays out panels from the library and sets the grid.
// dashboards/search-api.jsonnet
local g = import 'github.com/grafana/grafonnet/gen/grafonnet-v11.0/main.libsonnet';
local panels = import '../lib/panels.libsonnet';

local datasource =
  g.dashboard.variable.datasource.new('datasource', 'prometheus')
  + g.dashboard.variable.datasource.generalOptions.withCurrent('Prometheus');

g.dashboard.new('Search API — Service Health')
+ g.dashboard.withUid('search-api-health')
+ g.dashboard.withTags(['search', 'managed-by-terraform'])
+ g.dashboard.withRefresh('30s')
+ g.dashboard.time.withFrom('now-6h')
+ g.dashboard.withVariables([datasource])
+ g.dashboard.withPanels(
  g.util.grid.makeGrid([
    panels.latency('p99 latency', 'http_request_duration_seconds', 0.99, slo=0.5)
    + g.panel.timeSeries.gridPos.withW(12) + g.panel.timeSeries.gridPos.withH(8),
    panels.errorRatio('Error ratio', 'http_requests_total')
    + g.panel.timeSeries.gridPos.withW(12) + g.panel.timeSeries.gridPos.withH(8),
  ], panelWidth=12, panelHeight=8)
)
Render it to JSON to confirm it compiles before Terraform ever runs:
jsonnet -J vendor dashboards/search-api.jsonnet > /tmp/search-api.json
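With more than one dashboard, it can help to render everything in one pass. The following is a sketch, assuming it runs from the grafana-iac/ root with Grafonnet vendored under vendor/; render_dashboards and the build/ output directory are hypothetical names, not part of any tool.

```shell
# render_dashboards SRC_DIR OUT_DIR
# Render every *.jsonnet file in SRC_DIR to OUT_DIR/<name>.json.
# Stops at the first file that fails to compile and returns non-zero.
render_dashboards() {
  local src_dir="$1" out_dir="$2" file name
  mkdir -p "$out_dir" || return 1
  for file in "$src_dir"/*.jsonnet; do
    [ -e "$file" ] || continue              # directory had no .jsonnet files
    name="$(basename "$file" .jsonnet)"
    jsonnet -J vendor "$file" > "$out_dir/$name.json" || return 1
    echo "rendered $name"
  done
}

# Usage:
#   render_dashboards dashboards build
```

A compile error surfaces here in seconds, before Terraform or Grafana is involved at all.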
Wiring Terraform
Now Terraform applies the rendered dashboard. The Grafana Terraform provider takes dashboard JSON through the grafana_dashboard resource, and the jsonnet_file data source from the jsonnet provider compiles the Jsonnet during the plan, so a syntax error fails terraform plan, not production.
# main.tf — Terraform 1.10
terraform {
  required_version = ">= 1.10.0"

  required_providers {
    grafana = {
      source  = "grafana/grafana"
      version = "~> 3.18"
    }
    jsonnet = {
      source  = "alxrem/jsonnet"
      version = "~> 2.3"
    }
  }

  backend "s3" {
    bucket = "acme-tf-state"
    key    = "grafana-iac/terraform.tfstate"
    region = "ap-southeast-1"
  }
}

provider "grafana" {
  url  = var.grafana_url
  auth = var.grafana_service_account_token
}

provider "jsonnet" {
  jsonnet_path = "${path.module}/vendor"
}

# Compile every dashboard in dashboards/.
locals {
  dashboard_files = fileset("${path.module}/dashboards", "*.jsonnet")
}

data "jsonnet_file" "dashboard" {
  for_each = local.dashboard_files
  source   = "${path.module}/dashboards/${each.value}"
}

resource "grafana_dashboard" "managed" {
  for_each    = data.jsonnet_file.dashboard
  config_json = each.value.rendered
  overwrite   = true
}
# variables.tf
variable "grafana_url" {
  type        = string
  description = "Base URL of the Grafana instance."
}

variable "grafana_service_account_token" {
  type        = string
  description = "Service account token with dashboard write scope."
  sensitive   = true
}
The overwrite = true is essential. It tells Grafana to replace a dashboard with the same UID even if someone edited it in the UI — Terraform reasserts the source of truth on every apply, which is exactly the drift cure you want.
The CI Pipeline
Tie it together so every change goes through review. This GitHub Actions workflow renders, plans on PRs, and applies on merge.
# .github/workflows/grafana.yml
name: grafana-iac
on:
  pull_request:
    paths: ['grafana-iac/**']
  push:
    branches: [main]
    paths: ['grafana-iac/**']
jobs:
  terraform:
    runs-on: ubuntu-24.04
    defaults:
      run:
        working-directory: grafana-iac
    env:
      TF_VAR_grafana_url: ${{ secrets.GRAFANA_URL }}
      TF_VAR_grafana_service_account_token: ${{ secrets.GRAFANA_SA_TOKEN }}
    steps:
      - uses: actions/checkout@v4
      - name: Install jsonnet toolchain
        # Replace <version> with the release tags you pin.
        run: |
          go install github.com/google/go-jsonnet/cmd/jsonnet@<version>
          go install github.com/jsonnet-bundler/jsonnet-bundler/cmd/jb@<version>
          echo "$(go env GOPATH)/bin" >> "$GITHUB_PATH"
      - name: Vendor Grafonnet
        run: jb install
      - uses: hashicorp/setup-terraform@v3
        with:
          terraform_version: 1.10.5
      - run: terraform init
      - run: terraform plan -no-color
        if: github.event_name == 'pull_request'
      - run: terraform apply -auto-approve
        if: github.ref == 'refs/heads/main' && github.event_name == 'push'
A pull request now shows a Terraform plan diff of the actual dashboard JSON. A reviewer can see that a rate window changed from 5m to 1m before it ships — the exact mistake that bit me on staging.
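If you want the query changes to stand out from the rest of the plan, one option is to diff two rendered files directly. This sketch assumes you have rendered the base branch and the PR branch to separate files (the paths in the usage line are hypothetical), and it relies on jsonnet printing each field on its own line. query_changes is a hypothetical helper name.

```shell
# query_changes OLD_JSON NEW_JSON
# Print only the added/removed lines of a unified diff that touch a
# PromQL "expr" field. Always returns 0; empty output means no query changed.
query_changes() {
  { diff -u "$1" "$2" || true; } \
    | grep -E '^[+-][^+-].*"expr"' || true   # skip the ---/+++ file headers
}

# Usage:
#   query_changes build-main/search-api.json build-pr/search-api.json
```

A 5m to 1m change shows up as exactly two lines, one removed and one added, which is hard to miss in review.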
Common Pitfalls
- Committing raw exported JSON. It carries volatile id/version fields that produce noisy diffs and constant Terraform plan churn. Render from Jsonnet instead.
- Omitting withUid. Without a stable UID, Terraform creates a new dashboard on every apply instead of updating in place. Always set an explicit UID.
- Skipping overwrite = true. Without it, a UI edit blocks the next Terraform apply and drift wins.
- Editing managed dashboards in the UI. Add a managed-by-terraform tag and a panel note so nobody is surprised when their change vanishes on the next apply.
- Unpinned Grafonnet. jb install ...@main without a lockfile means a dashboard can change shape on a CI runner. Commit jsonnetfile.lock.json.
- Storing the SA token in .tfvars. Use CI secrets and a remote backend with state encryption.
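The lockfile pitfall is easy to guard against mechanically. Here is a minimal sketch of a pre-plan check, assuming it runs in CI before terraform plan; check_pinned is a hypothetical name, and it only verifies that the file exists, not that it is committed.

```shell
# check_pinned [DIR]
# Fail if DIR (default: current directory) has no jsonnet-bundler lockfile.
check_pinned() {
  local dir="${1:-.}"
  if [ ! -f "$dir/jsonnetfile.lock.json" ]; then
    echo "missing $dir/jsonnetfile.lock.json; run 'jb install' and commit it" >&2
    return 1
  fi
}

# Usage:
#   check_pinned grafana-iac && terraform plan
```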
Troubleshooting
Symptom: terraform apply creates a duplicate dashboard each run.
Cause: The dashboard has no stable UID, so Grafana treats every apply as new.
Fix: Add g.dashboard.withUid('...') to the Jsonnet and import the existing dashboard once with terraform import.
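For the dashboard in this post, the one-time import might look like the sketch below. It assumes the for_each key is the Jsonnet file name (as produced by fileset in main.tf) and that the provider accepts the dashboard UID as the import ID; check the provider documentation for your version, since some setups expect an orgID:uid form. import_dashboard is a hypothetical wrapper.

```shell
# import_dashboard JSONNET_FILE UID
# Adopt an existing Grafana dashboard into the grafana_dashboard.managed
# resource so the next apply updates it instead of creating a copy.
import_dashboard() {
  local file="$1" uid="$2"
  # Quote the address so the shell leaves the brackets and inner quotes alone.
  terraform import "grafana_dashboard.managed[\"${file}\"]" "$uid"
}

# Usage (run once per existing dashboard):
#   import_dashboard search-api.jsonnet search-api-health
```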
Symptom: Plan shows a constant diff on config_json even with no changes.
Cause: Volatile fields (version, iteration) or non-canonical key ordering.
Fix: Render through Jsonnet, which emits canonical output, and never set version manually.
Symptom: jsonnet_file data source fails with “RUNTIME ERROR: couldn’t open import”.
Cause: Grafonnet wasn’t vendored, or jsonnet_path doesn’t point at vendor.
Fix: Run jb install before terraform plan and set jsonnet_path = "${path.module}/vendor" in the provider block.
Symptom: Apply fails with HTTP 403 from Grafana.
Cause: The service account token lacks dashboard write permission.
Fix: Grant the service account the Editor role or a custom role with dashboards:write, then regenerate the token.
Wrapping Up
Grafana dashboards as code with Terraform turns an invisible, drift-prone artifact into a reviewed, reproducible one: Grafonnet for typed reuse, Jsonnet for canonical JSON, Terraform for the apply, and Git for history. The first dashboard takes an afternoon; every dashboard after that is a few lines because the panel library does the work. Next, fold your alert rules into the same pipeline so alerting and visualization ship together.