March 24, 2026 · 7 min read · by Muhammad Amal programming

TL;DR — Hand-edited dashboards drift and die with the person who built them / Grafonnet generates the JSON, Terraform applies it, Git reviews it / One library function per panel kills copy-paste across twenty dashboards.

I once spent an afternoon trying to figure out why staging and production had subtly different latency dashboards. Same service, same metrics, panels that looked identical at a glance — but the production one used a 5-minute rate window and staging used 1 minute, so the two graphs never agreed during an incident. Nobody had decided that. Someone had clicked “edit panel” eighteen months earlier and never saved it back anywhere.

That is dashboard drift, and it’s the default state of any Grafana instance that people edit through the UI. The dashboard is a critical operational artifact, yet it lives only in Grafana’s database, has no history, no review, and no reproducibility. Treating Grafana dashboards as code fixes all four problems at once.

This post builds a real workflow: Grafonnet generates dashboard JSON from a typed Jsonnet library, and Terraform 1.10 applies it to Grafana 11 through the official provider. You get version control, pull-request review, and dashboards that are identical across every environment because they’re rendered from the same source. If your dashboards are alerting dashboards, this pairs directly with building real-time alerting dashboards with Prometheus and Grafana.

Why Not Just Commit the JSON

The obvious first move is to export each dashboard’s JSON and commit it. That’s better than nothing, but the raw JSON model is a poor source of truth. It’s verbose, it carries volatile fields like id and version that churn on every export, and it has zero abstraction — twenty dashboards means twenty hand-maintained copies of the same latency panel.
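To make the churn concrete, here is a minimal Python sketch of the clean-up you would have to run on every export just to get a stable diff. The helper name normalize_export and the exact field list (id, version, iteration) are illustrative assumptions, not part of Grafana or Grafonnet.

```python
import json

# Top-level fields assumed to change between exports without a real edit.
VOLATILE_FIELDS = ("id", "version", "iteration")


def normalize_export(raw_json: str) -> str:
    """Drop volatile top-level fields and emit JSON with sorted keys."""
    dashboard = json.loads(raw_json)
    for field in VOLATILE_FIELDS:
        dashboard.pop(field, None)
    # Sorted keys plus fixed indentation make the output byte-stable.
    return json.dumps(dashboard, indent=2, sort_keys=True) + "\n"


if __name__ == "__main__":
    first = '{"id": 7, "version": 12, "title": "Search API", "uid": "search-api-health"}'
    second = '{"uid": "search-api-health", "title": "Search API", "version": 13, "id": 7}'
    # Two exports of the same dashboard normalize to identical text.
    print(normalize_export(first) == normalize_export(second))
```

Even with a filter like this, you still have twenty copies of the same panel to maintain by hand, which is the part normalization cannot fix.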

Grafonnet solves the abstraction problem. It’s a Jsonnet library from the Grafana team that exposes dashboards, panels, and queries as composable functions. You write a panel once, parameterize it, and reuse it everywhere. Jsonnet renders it to canonical JSON, and Terraform owns the apply. Each tool does one job.

Project Layout

grafana-iac/
├── lib/
│   └── panels.libsonnet      # reusable panel functions
├── dashboards/
│   └── search-api.jsonnet    # one file per dashboard
├── jsonnetfile.json          # jsonnet-bundler manifest
├── main.tf
├── variables.tf
└── Makefile

Pull in Grafonnet with jsonnet-bundler. It records the resolved commit in jsonnetfile.lock.json, so committing that file keeps the version pinned and reproducible:

# install the toolchain (substitute the release tags you pin for vX.Y.Z)
go install github.com/google/go-jsonnet/cmd/jsonnet@vX.Y.Z
go install github.com/jsonnet-bundler/jsonnet-bundler/cmd/jb@vX.Y.Z

# initialise and vendor Grafonnet
jb init
jb install github.com/grafana/grafonnet/gen/grafonnet-v11.0.0@main

A Reusable Panel Library

This is where dashboards-as-code earns its keep. Define each panel type once in lib/panels.libsonnet as a function, and every dashboard calls it with parameters. Change the function and every dashboard updates on the next apply.

// lib/panels.libsonnet
local g = import 'github.com/grafana/grafonnet/gen/grafonnet-v11.0.0/main.libsonnet';
local ts = g.panel.timeSeries;
local prq = g.query.prometheus;

{
  // A latency panel parameterized by metric, quantile, and SLO threshold.
  latency(title, metric, quantile=0.99, slo=null)::
    ts.new(title)
    + ts.queryOptions.withTargets([
        prq.new(
          '$datasource',
          'histogram_quantile(%g, sum by (le, route) (rate(%s_bucket[5m])))'
          % [quantile, metric],
        )
        + prq.withLegendFormat('{{route}}'),
      ])
    + ts.standardOptions.withUnit('s')
    + (if slo != null then
         ts.standardOptions.thresholds.withSteps([
           g.panel.timeSeries.standardOptions.threshold.step.withColor('green'),
           g.panel.timeSeries.standardOptions.threshold.step.withColor('red')
           + g.panel.timeSeries.standardOptions.threshold.step.withValue(slo),
         ])
       else {}),

  // An error-ratio panel with a fixed percentunit format.
  errorRatio(title, metric)::
    ts.new(title)
    + ts.queryOptions.withTargets([
        prq.new(
          '$datasource',
          'sum by (route) (rate(%s{status=~"5.."}[5m]))'
          % metric +
          ' / clamp_min(sum by (route) (rate(%s[5m])), 1e-9)' % metric,
        )
        + prq.withLegendFormat('{{route}}'),
      ])
    + ts.standardOptions.withUnit('percentunit'),
}

The clamp_min in errorRatio is baked into the library, so no dashboard can ever ship the divide-by-zero bug. That’s the real payoff: fixes propagate.
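To see what those two library functions actually put into a panel, the sketch below rebuilds the query strings in Python, whose % formatting follows the same rules Jsonnet's does for these %g and %s placeholders, and then mimics the clamp_min arithmetic on plain numbers. The function names are hypothetical stand-ins for the library functions, not Grafonnet calls.

```python
def latency_expr(metric: str, quantile: float = 0.99) -> str:
    """Mirror of the PromQL template used by latency() in panels.libsonnet."""
    template = "histogram_quantile(%g, sum by (le, route) (rate(%s_bucket[5m])))"
    return template % (quantile, metric)


def error_ratio_expr(metric: str) -> str:
    """Mirror of the PromQL template used by errorRatio()."""
    errors = 'sum by (route) (rate(%s{status=~"5.."}[5m]))' % metric
    total = "clamp_min(sum by (route) (rate(%s[5m])), 1e-9)" % metric
    return errors + " / " + total


def clamped_ratio(error_rate: float, total_rate: float, floor: float = 1e-9) -> float:
    """Plain-number analogue of errors / clamp_min(total, floor)."""
    return error_rate / max(total_rate, floor)


if __name__ == "__main__":
    print(latency_expr("http_request_duration_seconds"))
    print(error_ratio_expr("http_requests_total"))
    # A route with no traffic has a zero denominator; the floor keeps the ratio at 0.
    print(clamped_ratio(0.0, 0.0))
```

The floor of 1e-9 is small enough that it does not distort the ratio for any route with real traffic, since the denominator only changes when it would otherwise be below that value.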

Composing a Dashboard

The dashboard file is now short and declarative — it lays out panels from the library and sets the grid.

// dashboards/search-api.jsonnet
local g = import 'github.com/grafana/grafonnet/gen/grafonnet-v11.0.0/main.libsonnet';
local panels = import '../lib/panels.libsonnet';

local datasource =
  g.dashboard.variable.datasource.new('datasource', 'prometheus')
  + g.dashboard.variable.datasource.generalOptions.withCurrent('Prometheus');

g.dashboard.new('Search API — Service Health')
+ g.dashboard.withUid('search-api-health')
+ g.dashboard.withTags(['search', 'managed-by-terraform'])
+ g.dashboard.withRefresh('30s')
+ g.dashboard.time.withFrom('now-6h')
+ g.dashboard.withVariables([datasource])
+ g.dashboard.withPanels(
    g.util.grid.makeGrid([
      panels.latency('p99 latency', 'http_request_duration_seconds', 0.99, slo=0.5)
      + g.panel.timeSeries.gridPos.withW(12) + g.panel.timeSeries.gridPos.withH(8),
      panels.errorRatio('Error ratio', 'http_requests_total')
      + g.panel.timeSeries.gridPos.withW(12) + g.panel.timeSeries.gridPos.withH(8),
    ], panelWidth=12, panelHeight=8)
  )

Render it to JSON to confirm it compiles before Terraform ever runs:

jsonnet -J vendor dashboards/search-api.jsonnet > /tmp/search-api.json

Wiring Terraform

Now Terraform applies the rendered dashboard. The Grafana provider's grafana_dashboard resource takes the JSON through its config_json argument, and the jsonnet_file data source from the jsonnet provider compiles the Jsonnet during the plan, so a syntax error fails terraform plan, not production.

# main.tf — Terraform 1.10
terraform {
  required_version = ">= 1.10.0"
  required_providers {
    grafana = {
      source  = "grafana/grafana"
      version = "~> 3.18"
    }
    jsonnet = {
      source  = "alxrem/jsonnet"
      version = "~> 2.3"
    }
  }
  backend "s3" {
    bucket = "acme-tf-state"
    key    = "grafana-iac/terraform.tfstate"
    region = "ap-southeast-1"
  }
}

provider "grafana" {
  url  = var.grafana_url
  auth = var.grafana_service_account_token
}

provider "jsonnet" {
  jsonnet_path = "${path.module}/vendor"
}

# Compile every dashboard in dashboards/.
locals {
  dashboard_files = fileset("${path.module}/dashboards", "*.jsonnet")
}

data "jsonnet_file" "dashboard" {
  for_each = local.dashboard_files
  source   = "${path.module}/dashboards/${each.value}"
}

resource "grafana_dashboard" "managed" {
  for_each    = data.jsonnet_file.dashboard
  config_json = each.value.rendered
  overwrite   = true
}

# variables.tf
variable "grafana_url" {
  type        = string
  description = "Base URL of the Grafana instance."
}

variable "grafana_service_account_token" {
  type        = string
  description = "Service account token with dashboard write scope."
  sensitive   = true
}

Setting overwrite = true is essential. It tells Grafana to replace a dashboard with the same UID even if someone edited it in the UI, so Terraform reasserts the source of truth on every apply. That is exactly the drift cure you want.

The CI Pipeline

Tie it together so every change goes through review. This GitHub Actions workflow renders, plans on PRs, and applies on merge.

# .github/workflows/grafana.yml
name: grafana-iac
on:
  pull_request:
    paths: ['grafana-iac/**']
  push:
    branches: [main]
    paths: ['grafana-iac/**']

jobs:
  terraform:
    runs-on: ubuntu-24.04
    defaults:
      run:
        working-directory: grafana-iac
    env:
      TF_VAR_grafana_url: ${{ secrets.GRAFANA_URL }}
      TF_VAR_grafana_service_account_token: ${{ secrets.GRAFANA_SA_TOKEN }}
    steps:
      - uses: actions/checkout@v4

      - name: Install jsonnet toolchain
        run: |
          go install github.com/google/go-jsonnet/cmd/jsonnet@vX.Y.Z
          go install github.com/jsonnet-bundler/jsonnet-bundler/cmd/jb@vX.Y.Z
          echo "$(go env GOPATH)/bin" >> "$GITHUB_PATH"

      - name: Vendor Grafonnet
        run: jb install

      - uses: hashicorp/setup-terraform@v3
        with:
          terraform_version: 1.10.5

      - run: terraform init
      - run: terraform plan -no-color
        if: github.event_name == 'pull_request'
      - run: terraform apply -auto-approve
        if: github.ref == 'refs/heads/main' && github.event_name == 'push'

Every pull request now runs a Terraform plan, and the job log shows the diff of the actual dashboard JSON. A reviewer can see that a rate window changed from 5m to 1m before it ships, which is exactly the kind of silent difference that bit me between staging and production.
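If you want that particular mistake flagged rather than spotted by eye, a small script can compare the range windows in two rendered dashboards. This is a rough sketch that assumes the usual dashboard JSON shape (panels, each with a title and targets carrying an expr); the function names are made up for illustration, and the regular expression only handles simple selectors such as [5m], not subqueries or compound durations.

```python
import re

# Matches simple PromQL range selectors such as [5m] or [30s].
RANGE_RE = re.compile(r"\[(\d+(?:ms|[smhdwy]))\]")


def rate_windows(dashboard: dict) -> dict:
    """Map each panel title to the sorted range windows used in its queries."""
    windows = {}
    for panel in dashboard.get("panels", []):
        found = set()
        for target in panel.get("targets", []):
            found.update(RANGE_RE.findall(target.get("expr", "")))
        windows[panel.get("title", "")] = sorted(found)
    return windows


def window_changes(old: dict, new: dict) -> dict:
    """Return {panel title: (old windows, new windows)} for panels that differ."""
    before, after = rate_windows(old), rate_windows(new)
    return {
        title: (before.get(title, []), after.get(title, []))
        for title in sorted(set(before) | set(after))
        if before.get(title, []) != after.get(title, [])
    }


if __name__ == "__main__":
    old = {"panels": [{"title": "p99 latency",
                       "targets": [{"expr": "rate(http_request_duration_seconds_bucket[5m])"}]}]}
    new = {"panels": [{"title": "p99 latency",
                       "targets": [{"expr": "rate(http_request_duration_seconds_bucket[1m])"}]}]}
    print(window_changes(old, new))
```

Feeding it the JSON rendered from the base branch and from the PR branch would surface a changed window as a one-line summary next to the full plan diff.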

Common Pitfalls

  • Committing raw exported JSON. It carries volatile id/version fields that produce noisy diffs and constant Terraform plan churn. Render from Jsonnet instead.
  • Omitting withUid. Without an explicit UID, Grafana generates a random one, so the same dashboard gets a different UID in each environment and Terraform can end up creating a new dashboard instead of updating in place. Always set an explicit UID.
  • Skipping overwrite = true. Without it, a UI edit blocks the next Terraform apply and drift wins.
  • Editing managed dashboards in the UI. Add a managed-by-terraform tag and a panel note so nobody is surprised when their change vanishes on the next apply.
  • Unpinned Grafonnet. jb install ...@main without a lockfile means a dashboard can change shape on a CI runner. Commit jsonnetfile.lock.json.
  • Storing the SA token in .tfvars. Use CI secrets and a remote backend with state encryption.
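Several of these pitfalls can be checked mechanically on the rendered JSON before Terraform runs. The following is a minimal sketch of such a check; lint_dashboard is a hypothetical helper, and the rules it applies (a non-empty uid, the managed-by-terraform tag, no id or version set) are simply the conventions described in this post.

```python
def lint_dashboard(dashboard: dict, managed_tag: str = "managed-by-terraform") -> list:
    """Return human-readable problems found in one rendered dashboard."""
    problems = []
    if not dashboard.get("uid"):
        problems.append("missing uid: set one with g.dashboard.withUid(...)")
    if managed_tag not in dashboard.get("tags", []):
        problems.append("missing tag: " + managed_tag)
    # Volatile fields suggest the JSON came from a UI export, not from Jsonnet.
    for field in ("id", "version"):
        if dashboard.get(field) is not None:
            problems.append("volatile field set: " + field)
    return problems


if __name__ == "__main__":
    rendered = {
        "title": "Search API — Service Health",
        "uid": "search-api-health",
        "tags": ["search", "managed-by-terraform"],
    }
    exported = {"title": "Search API — Service Health", "id": 7, "version": 12}
    print(lint_dashboard(rendered))
    for problem in lint_dashboard(exported):
        print(problem)
```

A CI step could load each file produced by jsonnet and fail the job when the returned list is not empty.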

Troubleshooting

Symptom: terraform apply creates a duplicate dashboard each run. Cause: The dashboard has no stable UID, so Grafana treats every apply as new. Fix: Add g.dashboard.withUid('...') to the Jsonnet and import the existing dashboard once with terraform import.

Symptom: Plan shows a constant diff on config_json even with no changes. Cause: Volatile fields (version, iteration) or non-canonical key ordering. Fix: Render through Jsonnet, which emits canonical output, and never set version manually.

Symptom: jsonnet_file data source fails with “RUNTIME ERROR: couldn’t open import”. Cause: Grafonnet wasn’t vendored, or jsonnet_path doesn’t point at vendor. Fix: Run jb install before terraform plan and set jsonnet_path = "${path.module}/vendor" in the provider block.
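A quick pre-flight check can catch this before Terraform does. The sketch below scans one Jsonnet file for import statements and reports any that resolve neither next to the file nor under the library path, which is the order Jsonnet searches in. The helper name unresolved_imports is an assumption for illustration, and the regular expression is deliberately naive: it will also match imports that appear inside comments.

```python
import os
import re

# Matches import 'path', importstr "path", and importbin 'path'.
IMPORT_RE = re.compile(r"""\bimport(?:str|bin)?\s+(['"])(.+?)\1""")


def unresolved_imports(source_path: str, jpath: str) -> list:
    """List imports that resolve neither beside source_path nor under jpath."""
    with open(source_path, encoding="utf-8") as handle:
        source = handle.read()
    base_dir = os.path.dirname(os.path.abspath(source_path))
    missing = []
    for _quote, target in IMPORT_RE.findall(source):
        candidates = (os.path.join(base_dir, target), os.path.join(jpath, target))
        if not any(os.path.isfile(path) for path in candidates):
            missing.append(target)
    return missing


if __name__ == "__main__":
    # Paths follow the project layout in this post; run from grafana-iac/.
    dashboard = os.path.join("dashboards", "search-api.jsonnet")
    if os.path.isfile(dashboard):
        for target in unresolved_imports(dashboard, "vendor"):
            print("cannot resolve import:", target)
```

An empty result for every file in dashboards/ means the vendor directory and the jsonnet_path setting agree with what the sources import.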

Symptom: Apply fails with HTTP 403 from Grafana. Cause: The service account token lacks dashboard write permission. Fix: Grant the service account the Editor role or a custom role with dashboards:write, then regenerate the token.

Wrapping Up

Grafana dashboards as code with Terraform turns an invisible, drift-prone artifact into a reviewed, reproducible one: Grafonnet for typed reuse, Jsonnet for canonical JSON, Terraform for the apply, and Git for history. The first dashboard takes an afternoon; every dashboard after that is a few lines because the panel library does the work. Next, fold your alert rules into the same pipeline so alerting and visualization ship together.
