Self Hosted n8n on Kubernetes, A Production Setup

August 20, 2025 · 10 min read · by Muhammad Amal

TL;DR: Run n8n 1.78 on Kubernetes as three separate Deployments: main, webhook, and worker. Use an HPA on workers, PodDisruptionBudgets on workers and webhooks, and an Ingress with sticky sessions for the editor. Stateful dependencies stay on managed services or operators.

The community Helm charts for n8n have gotten better, but none of them are quite what I’d ship to production without modification. Most assume a single Deployment running all three process types, which works for development and breaks the moment you have any real load. The split into main, webhook, and worker is essential, and the Helm chart needs to reflect that.

This article is the production Kubernetes setup. We’ll go through the namespace and RBAC, three Deployments with the right env vars, an HPA on workers, an Ingress with sticky sessions for the editor, a PodDisruptionBudget, and the persistence story for Postgres and Redis. The architecture is the one from the advanced n8n architecture article and depends on the queue mode setup from the Redis at scale article.

Opinions up front. Don’t run Postgres in-cluster with a single PVC; use a managed service or a proper operator like CloudNativePG. Don’t share one PVC across multiple n8n pods, because the binary files n8n writes are pod-local. And don’t autoscale the main process, because it’s stateful in 1.78.

1. Namespace, ServiceAccount, and Network Policy

Start with a dedicated namespace and a service account with minimum RBAC. n8n doesn’t need any cluster-wide permissions, just its own namespace.

apiVersion: v1
kind: Namespace
metadata:
  name: n8n
  labels:
    pod-security.kubernetes.io/enforce: restricted
---
apiVersion: v1
kind: ServiceAccount
metadata:
  name: n8n
  namespace: n8n
---
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: n8n-default-deny
  namespace: n8n
spec:
  podSelector: {}
  policyTypes: [Ingress, Egress]
---
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: n8n-allow-internal
  namespace: n8n
spec:
  podSelector:
    matchLabels:
      app: n8n
  policyTypes: [Ingress, Egress]
  ingress:
    - from:
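        - namespaceSelector:
            matchLabels:
              # label that Kubernetes sets automatically on every namespace;
              # a bare "name" label only matches if you added it yourself
              kubernetes.io/metadata.name: ingress-nginx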
        - podSelector:
            matchLabels:
              app: n8n
  egress:
    - to:
        - podSelector:
            matchLabels:
              app: postgres
        - podSelector:
            matchLabels:
              app: redis
    - {}  # allow internet egress for HTTP nodes

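The restricted Pod Security Standard requires containers to run as non-root, drop all capabilities, disallow privilege escalation, and use a RuntimeDefault or Localhost seccomp profile. It does not require a read-only root filesystem; that is extra hardening applied to the main Deployment in section 3. Because the namespace enforces the standard, every pod in it must set these fields or admission rejects it. n8n 1.78 runs cleanly under restricted as of August 2025. Verify with kubectl get pod -n n8n -o yaml | grep securityContext.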
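Since n8n never calls the Kubernetes API, one way to make "minimum RBAC" concrete is to give the ServiceAccount no token at all. A minimal sketch, assuming nothing else in these pods (a sidecar, for example) relies on this ServiceAccount for API access:

```yaml
apiVersion: v1
kind: ServiceAccount
metadata:
  name: n8n
  namespace: n8n
# Assumption: no container in the n8n pods talks to the Kubernetes API,
# so the token does not need to be mounted into them.
automountServiceAccountToken: false
```

With no Role or RoleBinding attached and no token mounted, a compromised workflow has no Kubernetes credentials to reuse.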

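The NetworkPolicy default-denies all traffic, then re-allows ingress from the ingress controller and from other n8n pods. On the egress side, the final - {} rule allows traffic to any destination, which external HTTP node calls need. It also makes the Postgres and Redis rules above it redundant: they only start to matter once you replace - {} with an allowlist of upstream APIs. At that point they need a namespaceSelector as well, because a bare podSelector only matches pods in the n8n namespace while the ConfigMap in the next section points at services in the data namespace, and DNS has to be allowed explicitly.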
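If you do have an allowlist, the egress side could look roughly like the sketch below. It is an illustration under several assumptions: the policy name is made up, CoreDNS runs in kube-system, Postgres and Redis pods in the data namespace carry the app: postgres and app: redis labels, and 203.0.113.0/24 is a placeholder documentation range standing in for your real upstream APIs. NetworkPolicies are additive, so this only has an effect once the - {} rule is removed from n8n-allow-internal.

```yaml
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: n8n-egress-allowlist   # hypothetical name
  namespace: n8n
spec:
  podSelector:
    matchLabels:
      app: n8n
  policyTypes: [Egress]
  egress:
    # DNS lookups (assumes the cluster DNS pods live in kube-system)
    - to:
        - namespaceSelector:
            matchLabels:
              kubernetes.io/metadata.name: kube-system
      ports:
        - { protocol: UDP, port: 53 }
        - { protocol: TCP, port: 53 }
    # Postgres in the data namespace (pod label is an assumption)
    - to:
        - namespaceSelector:
            matchLabels:
              kubernetes.io/metadata.name: data
          podSelector:
            matchLabels:
              app: postgres
      ports:
        - { protocol: TCP, port: 5432 }
    # Redis in the data namespace (pod label is an assumption)
    - to:
        - namespaceSelector:
            matchLabels:
              kubernetes.io/metadata.name: data
          podSelector:
            matchLabels:
              app: redis
      ports:
        - { protocol: TCP, port: 6379 }
    # HTTPS to the allowlisted upstream range (placeholder CIDR)
    - to:
        - ipBlock:
            cidr: 203.0.113.0/24
      ports:
        - { protocol: TCP, port: 443 }
```

Placing namespaceSelector and podSelector in the same list item means both must match; splitting them into two items would turn the rule into an OR.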

2. Secrets and Config

Sensitive values go in Secrets, non-sensitive config in ConfigMaps. The encryption key is the highest-stakes secret: treat it like a root credential.

apiVersion: v1
kind: Secret
metadata:
  name: n8n-encryption-key
  namespace: n8n
type: Opaque
stringData:
  N8N_ENCRYPTION_KEY: "replace-with-output-of-openssl-rand-hex-32"
---
apiVersion: v1
kind: Secret
metadata:
  name: n8n-db
  namespace: n8n
type: Opaque
stringData:
  DB_POSTGRESDB_PASSWORD: "..."
  QUEUE_BULL_REDIS_PASSWORD: "..."
---
apiVersion: v1
kind: ConfigMap
metadata:
  name: n8n-config
  namespace: n8n
data:
  EXECUTIONS_MODE: "queue"
  N8N_RUNNERS_ENABLED: "true"
  N8N_METRICS: "true"
  N8N_METRICS_PREFIX: "n8n_"
  DB_TYPE: "postgresdb"
  DB_POSTGRESDB_HOST: "postgres.data.svc.cluster.local"
  DB_POSTGRESDB_PORT: "5432"
  DB_POSTGRESDB_DATABASE: "n8n"
  DB_POSTGRESDB_USER: "n8n"
  QUEUE_BULL_REDIS_HOST: "redis.data.svc.cluster.local"
  QUEUE_BULL_REDIS_PORT: "6379"
  EXECUTIONS_DATA_PRUNE: "true"
  EXECUTIONS_DATA_MAX_AGE: "336"
  EXECUTIONS_DATA_SAVE_ON_SUCCESS: "none"
  EXECUTIONS_DATA_SAVE_ON_ERROR: "all"
  N8N_PROTOCOL: "https"
  N8N_HOST: "n8n.acme.internal"
  WEBHOOK_URL: "https://n8n.acme.internal/"
  N8N_EDITOR_BASE_URL: "https://n8n.acme.internal/"

The encryption key must be identical across all three deployments. Mount the same Secret to each.

3. The Main Deployment

The main process is the brains. It serves the editor, the API, and runs cron triggers. Single replica, no autoscaling.

apiVersion: apps/v1
kind: Deployment
metadata:
  name: n8n-main
  namespace: n8n
spec:
  replicas: 1
  strategy: { type: Recreate }
  selector:
    matchLabels: { app: n8n, component: main }
  template:
    metadata:
      labels: { app: n8n, component: main }
      annotations:
        prometheus.io/scrape: "true"
        prometheus.io/port: "5678"
        prometheus.io/path: "/metrics"
    spec:
      serviceAccountName: n8n
      securityContext:
        runAsNonRoot: true
        runAsUser: 1000
        fsGroup: 1000
        seccompProfile: { type: RuntimeDefault }
      containers:
        - name: n8n
          image: n8nio/n8n:1.78.0
          command: ["n8n", "start"]
          ports:
            - { containerPort: 5678, name: http }
          envFrom:
            - configMapRef: { name: n8n-config }
            - secretRef: { name: n8n-encryption-key }
            - secretRef: { name: n8n-db }
          resources:
            requests: { cpu: "500m", memory: "1Gi" }
            limits: { cpu: "2", memory: "2Gi" }
          livenessProbe:
            httpGet: { path: /healthz, port: 5678 }
            initialDelaySeconds: 60
            periodSeconds: 30
          readinessProbe:
            httpGet: { path: /healthz/readiness, port: 5678 }
            initialDelaySeconds: 10
            periodSeconds: 10
          securityContext:
            allowPrivilegeEscalation: false
            readOnlyRootFilesystem: true
            capabilities: { drop: [ALL] }
          volumeMounts:
            - { name: n8n-data, mountPath: /home/node/.n8n }
            - { name: tmp, mountPath: /tmp }
      volumes:
        - name: n8n-data
          persistentVolumeClaim: { claimName: n8n-main-data }
        - name: tmp
          emptyDir: {}
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: n8n-main-data
  namespace: n8n
spec:
  accessModes: [ReadWriteOnce]
  resources: { requests: { storage: 10Gi } }
  storageClassName: gp3

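The Recreate strategy is correct for a single-replica stateful main. Don’t use RollingUpdate: it briefly runs two main processes side by side, and they fight over active triggers. The new pod can also get stuck waiting for the ReadWriteOnce volume if it lands on a different node. The PVC holds /home/node/.n8n, which n8n uses for local state such as its config file and installed community nodes. Workflows and their schedules live in Postgres, so the volume is mostly safe to lose, but keeping it avoids surprises after a restart.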

readOnlyRootFilesystem: true is the right setting, with an emptyDir mounted at /tmp for transient writes. n8n 1.78 respects this. Earlier versions wrote to weird locations and broke.

4. The Webhook Deployment

Webhooks are stateless. Multiple replicas, HPA optional, behind a service.

apiVersion: apps/v1
kind: Deployment
metadata:
  name: n8n-webhook
  namespace: n8n
spec:
  replicas: 2
  selector:
    matchLabels: { app: n8n, component: webhook }
  template:
    metadata:
      labels: { app: n8n, component: webhook }
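    spec:
      serviceAccountName: n8n
      # required by the restricted Pod Security Standard enforced on the namespace
      securityContext:
        runAsNonRoot: true
        runAsUser: 1000
        seccompProfile: { type: RuntimeDefault }
      containers:
        - name: n8n
          image: n8nio/n8n:1.78.0
          command: ["n8n", "webhook"]
          ports:
            - { containerPort: 5678, name: http }
          envFrom:
            - configMapRef: { name: n8n-config }
            - secretRef: { name: n8n-encryption-key }
            - secretRef: { name: n8n-db }
          env:
            # Takes effect on the main process, where it stops main from
            # serving production webhooks; harmless on webhook pods.
            - name: N8N_DISABLE_PRODUCTION_MAIN_PROCESS
              value: "true"
          resources:
            requests: { cpu: "200m", memory: "512Mi" }
            limits: { cpu: "1", memory: "1Gi" }
          readinessProbe:
            httpGet: { path: /healthz, port: 5678 }
            initialDelaySeconds: 10
            periodSeconds: 5
          # container-level fields the restricted standard also requires
          securityContext:
            allowPrivilegeEscalation: false
            capabilities: { drop: [ALL] }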

Two replicas as a floor for redundancy. Scale up if webhook latency rises.
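If you want the optional HPA, a CPU-based one is the simplest starting point. This is a sketch rather than a tuned config: the ceiling of 6 replicas is an assumption, CPU is only a proxy for webhook latency, and it needs metrics-server (or an equivalent resource metrics API) in the cluster.

```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: n8n-webhook
  namespace: n8n
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: n8n-webhook
  minReplicas: 2          # same floor as the Deployment
  maxReplicas: 6          # assumed ceiling, size it to your traffic
  metrics:
    - type: Resource
      resource:
        name: cpu
        # utilization is measured against the 200m CPU request
        target: { type: Utilization, averageUtilization: 70 }
```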

5. The Worker Deployment with HPA

Workers are where the real work happens. Multiple replicas, HPA on CPU and a custom queue-depth metric.

apiVersion: apps/v1
kind: Deployment
metadata:
  name: n8n-worker
  namespace: n8n
spec:
  replicas: 4
  strategy:
    type: RollingUpdate
    rollingUpdate: { maxUnavailable: 25%, maxSurge: 25% }
  selector:
    matchLabels: { app: n8n, component: worker }
  template:
    metadata:
      labels: { app: n8n, component: worker }
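    spec:
      serviceAccountName: n8n
      terminationGracePeriodSeconds: 120
      # required by the restricted Pod Security Standard enforced on the namespace
      securityContext:
        runAsNonRoot: true
        runAsUser: 1000
        seccompProfile: { type: RuntimeDefault }
      containers:
        - name: n8n
          image: n8nio/n8n:1.78.0
          command: ["n8n", "worker"]
          envFrom:
            - configMapRef: { name: n8n-config }
            - secretRef: { name: n8n-encryption-key }
            - secretRef: { name: n8n-db }
          env:
            - name: N8N_CONCURRENCY_PRODUCTION_LIMIT
              value: "10"
            # seconds n8n waits for in-flight jobs after SIGTERM (default 30);
            # the preStop sleep plus this value must stay under the 120 s grace period
            - name: N8N_GRACEFUL_SHUTDOWN_TIMEOUT
              value: "80"
          resources:
            requests: { cpu: "1", memory: "1Gi" }
            limits: { cpu: "2", memory: "2Gi" }
          # container-level fields the restricted standard also requires
          securityContext:
            allowPrivilegeEscalation: false
            capabilities: { drop: [ALL] }
          lifecycle:
            preStop:
              exec:
                # confirm with `n8n worker --help` that your version accepts --shutdown
                command: ["sh", "-c", "n8n worker --shutdown && sleep 30"]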
---
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: n8n-worker
  namespace: n8n
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: n8n-worker
  minReplicas: 4
  maxReplicas: 30
  metrics:
    - type: Resource
      resource:
        name: cpu
        target: { type: Utilization, averageUtilization: 70 }
    - type: External
      external:
        metric:
          name: n8n_queue_jobs_waiting
          selector: { matchLabels: { queue: default } }
        target: { type: AverageValue, averageValue: "20" }
  behavior:
    scaleDown:
      stabilizationWindowSeconds: 300

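The CPU target is conservative at 70 percent. The external metric pulls n8n_queue_jobs_waiting from Prometheus via the prometheus-adapter, so the HPA reacts to queue backlog directly. With an AverageValue target the HPA divides the metric by the current replica count, so "20" means roughly 20 waiting jobs per worker. Two things to check: n8n exports queue metrics only when N8N_METRICS_INCLUDE_QUEUE_METRICS is true on the main process, and the name the HPA sees is whatever your adapter rule maps the Prometheus series to.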
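For reference, a prometheus-adapter rule that would expose such a metric could look like the fragment below. Treat it as a sketch: the source series name n8n_scaling_mode_queue_jobs_waiting is an assumption about what n8n exports with the n8n_ prefix (check /metrics on the main pod), and the rule assumes Prometheus attaches a namespace label to the series.

```yaml
# Fragment of the prometheus-adapter configuration file
externalRules:
  # assumed series name; verify it against the main pod's /metrics output
  - seriesQuery: 'n8n_scaling_mode_queue_jobs_waiting'
    resources:
      overrides:
        # map the Prometheus "namespace" label to the Kubernetes namespace
        namespace: { resource: "namespace" }
    name:
      matches: "^n8n_scaling_mode_queue_jobs_waiting$"
      as: "n8n_queue_jobs_waiting"   # name referenced by the HPA
    # LabelMatchers carries the namespace plus any selector labels from the HPA
    metricsQuery: 'max(<<.Series>>{<<.LabelMatchers>>})'
```

Note that the HPA’s selector adds queue="default" to the label matchers. If the exported series has no such label, either add it through relabeling or drop the selector from the HPA, otherwise the query returns nothing.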

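terminationGracePeriodSeconds: 120 and the preStop lifecycle hook are essential. Workers in the middle of an execution need time to finish gracefully. The n8n worker --shutdown command signals the worker to stop accepting new jobs and finish in-flight ones; confirm with n8n worker --help that your version accepts the flag, because when a preStop hook fails the kubelet moves straight on to SIGTERM. On SIGTERM, n8n itself waits for in-flight jobs for up to N8N_GRACEFUL_SHUTDOWN_TIMEOUT seconds (30 by default), so raise that value for long executions and keep it below the grace period. Without a graceful shutdown path, a rolling update drops jobs.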

6. Service, Ingress, and PDB

Two Services front the main and webhook pods, and the Ingress routes each path to the right one. The Ingress also terminates TLS and applies sticky sessions for the editor, which benefits the WebSocket connection.

apiVersion: v1
kind: Service
metadata:
  name: n8n-main
  namespace: n8n
spec:
  selector: { app: n8n, component: main }
  ports: [{ port: 80, targetPort: 5678 }]
---
apiVersion: v1
kind: Service
metadata:
  name: n8n-webhook
  namespace: n8n
spec:
  selector: { app: n8n, component: webhook }
  ports: [{ port: 80, targetPort: 5678 }]
---
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: n8n
  namespace: n8n
  annotations:
    nginx.ingress.kubernetes.io/affinity: cookie
    nginx.ingress.kubernetes.io/session-cookie-name: n8n-session
    nginx.ingress.kubernetes.io/proxy-read-timeout: "3600"
    nginx.ingress.kubernetes.io/proxy-body-size: "50m"
spec:
  ingressClassName: nginx
  tls:
    - hosts: [n8n.acme.internal]
      secretName: n8n-tls
  rules:
    - host: n8n.acme.internal
      http:
        paths:
          - path: /webhook
            pathType: Prefix
            backend: { service: { name: n8n-webhook, port: { number: 80 } } }
          # test webhooks are registered by the editor and served by the main process
          - path: /webhook-test
            pathType: Prefix
            backend: { service: { name: n8n-main, port: { number: 80 } } }
          - path: /
            pathType: Prefix
            backend: { service: { name: n8n-main, port: { number: 80 } } }
---
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: n8n-worker
  namespace: n8n
spec:
  minAvailable: 2
  selector: { matchLabels: { app: n8n, component: worker } }
---
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: n8n-webhook
  namespace: n8n
spec:
  minAvailable: 1
  selector: { matchLabels: { app: n8n, component: webhook } }

The sticky cookie keeps an editor session on the same main pod (relevant if you ever scale main horizontally, otherwise it’s a no-op with replicas=1). The proxy-read-timeout extension supports long-running WebSocket connections for the workflow execution view.

The PDB on workers keeps at least two running during voluntary disruptions (node drains, cluster upgrades). On webhook, one. Main has no PDB because it’s a singleton.

7. Persistence, Postgres and Redis

Don’t run these in-cluster on a single PVC. The right options:

+-----------------+--------------------------------------------+
| dependency      | recommended                                |
+-----------------+--------------------------------------------+
| Postgres 17     | Managed service (RDS, Cloud SQL) or        |
|                 | CloudNativePG operator with replication    |
| Redis 7.4       | Managed service or Redis operator with     |
|                 | Sentinel (see Redis-at-scale article)      |
+-----------------+--------------------------------------------+

For self-managed Postgres, CloudNativePG is the operator I trust. It handles backups via barman-cloud, point-in-time recovery, rolling minor-version upgrades, and HA via streaming replication. Minimum config:

apiVersion: postgresql.cnpg.io/v1
kind: Cluster
metadata:
  name: n8n-pg
  namespace: data
spec:
  instances: 3
  imageName: ghcr.io/cloudnative-pg/postgresql:17.0
  primaryUpdateStrategy: unsupervised
  storage:
    size: 100Gi
    storageClass: gp3-iops
  bootstrap:
    initdb:
      database: n8n
      owner: n8n
      secret: { name: n8n-pg-app }
  backup:
    barmanObjectStore:
      destinationPath: s3://acme-pg-backups/n8n
      s3Credentials:
        accessKeyId: { name: pg-backups, key: aws_access_key_id }
        secretAccessKey: { name: pg-backups, key: aws_secret_access_key }
      wal: { compression: zstd }
    retentionPolicy: "30d"

Three instances with one primary and two replicas. WAL archive to S3 for PITR. The n8n connection points to the cluster’s read-write Service. The official CloudNativePG docs cover the operational details.
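Two details are easy to miss here. CloudNativePG names the read-write Service after the cluster with an -rw suffix, so with the Cluster above the n8n host setting would change as shown in the first fragment. And WAL archiving alone is not enough for PITR, because recovery starts from a base backup; a ScheduledBackup is one way to take those. The backup name and the 02:00 schedule below are assumptions for illustration.

```yaml
# Fragment of the n8n-config ConfigMap's data section; other keys unchanged
DB_POSTGRESDB_HOST: "n8n-pg-rw.data.svc.cluster.local"
---
apiVersion: postgresql.cnpg.io/v1
kind: ScheduledBackup
metadata:
  name: n8n-pg-daily        # hypothetical name
  namespace: data
spec:
  # six-field cron expression, seconds first: every day at 02:00
  schedule: "0 0 2 * * *"
  # backups are owned by this ScheduledBackup object
  backupOwnerReference: self
  cluster:
    name: n8n-pg
```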

Common Pitfalls

Four mistakes.

Running main as a Deployment with replicas=2. Two main processes fight over active triggers and you get duplicate cron executions. Always replicas=1 with Recreate strategy.

Forgetting the terminationGracePeriodSeconds on workers. Pods get SIGKILLed after the default 30 seconds. In-flight executions get marked as stalled. Set it to at least the p99 execution duration.

Hosting the encryption key in a ConfigMap. Use a Secret, not a ConfigMap. Use Sealed Secrets or External Secrets Operator if you store config in git. Never plaintext in the repo.

Letting the main PVC die with the pod. A PVC referenced from a plain Deployment, as in this setup, survives pod restarts and Recreate rollouts as long as you don’t delete the claim itself. A StatefulSet with volumeClaimTemplates gives the same guarantee. Either way, don’t use kubectl delete pvc casually.
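For the encryption-key pitfall, an External Secrets Operator manifest keeps only a reference in git while the value stays in your secret manager. A minimal sketch, assuming a ClusterSecretStore called platform-secrets already exists and that the key is stored under n8n/encryption-key; both names are hypothetical, and the apiVersion depends on the operator release you run.

```yaml
apiVersion: external-secrets.io/v1   # older operator releases serve v1beta1
kind: ExternalSecret
metadata:
  name: n8n-encryption-key
  namespace: n8n
spec:
  refreshInterval: 1h
  secretStoreRef:
    kind: ClusterSecretStore
    name: platform-secrets           # hypothetical store
  target:
    # the Secret the three Deployments already reference via secretRef
    name: n8n-encryption-key
  data:
    - secretKey: N8N_ENCRYPTION_KEY
      remoteRef:
        key: n8n/encryption-key      # hypothetical path in the backing store
```

The operator creates and refreshes the n8n-encryption-key Secret, so the Deployments need no changes. Don’t rotate the value in the backing store casually: credentials encrypted with the old key can no longer be decrypted.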

Troubleshooting

Three failure modes.

Workers crash-loop with “encryption key not set”. The secret isn’t mounted. kubectl describe pod should show the env var as set. If not, check the secretRef name matches.

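Editor loads but workflows don’t run. Main and the workers aren’t talking to the same queue. Check that EXECUTIONS_MODE is queue on main and that the QUEUE_BULL_REDIS_* settings and QUEUE_BULL_PREFIX (default bull) are identical on both. They all come from the shared ConfigMap here, so a mismatch usually means a per-Deployment env override with a typo.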

HPA doesn’t scale up under load. Prometheus-adapter not configured for the queue metric. Check kubectl get apiservice v1beta1.external.metrics.k8s.io is Available. If not, install or repair prometheus-adapter.

Wrapping Up

n8n on Kubernetes in production is three Deployments, not one. Main is singleton stateful. Webhook and worker are stateless and autoscaled. Postgres and Redis are managed or operator-managed, not bare PVCs. Sticky sessions on the editor ingress, PDBs on workers, and proper preStop hooks so rolling updates don’t drop in-flight executions.

The final article in this series gets into observability: how to actually see what’s happening inside this stack with metrics, logs, and traces. The Kubernetes setup gives you the platform, observability gives you the eyes.
