Self Hosting n8n for Engineering Teams in 2024
September 4, 2024 · 7 min read · by Muhammad Amal

TL;DR — Self-hosted n8n 1.58 in 2024 is a real platform commitment, not a docker-compose afternoon. Run it in queue mode, on Postgres 16, with Redis 7, behind a real reverse proxy, with proper secrets and backups. Budget for the operational load — it’s small but nonzero.

I’ve stood up self-hosted n8n on three different teams in the last eighteen months. The first time we treated it as a side project and learned the hard way. The second and third went smoothly because we treated it as a Tier-2 internal platform from day one. This post is the playbook I wish we’d started with.

What I’m not going to do is rehash the docker-compose quickstart from the n8n docs. That’s fine for evaluation. It’s not what you want running the company’s marketing automation by month six. The interesting questions are around mode, persistence, scaling, secrets, and upgrades. That’s what we’ll cover.

If you read the pro-code view of low-code in the enterprise post and decided to engage, this is the operational starting point.

Why self-host at all

The cloud-hosted n8n offering is fine. Use it if you don’t have a platform team. But there are three reasons engineering teams I’ve worked with have chosen self-hosting in 2024.

First, network egress. When your workflows need to call internal APIs that don’t have public endpoints, you either build a public proxy (yuck) or put n8n inside the VPC. Self-hosting is the simpler path.

Second, data residency. If you’re a European or Singapore-regulated business, you probably already have a position on where SaaS data lives. Self-hosted n8n keeps the workflow state and credentials in your perimeter.

Third, cost at scale. Once you cross a few hundred workflows or a few million executions a month, self-hosting pencils out. The break-even is roughly where one part-time SRE’s time is cheaper than the cloud bill, which happens earlier than you’d think.

If none of those three apply, stay on cloud. The ongoing cost of ownership is real.

Architecture, the September 2024 version

Here’s the topology that works at any scale beyond “team of ten.” This is what n8n 1.58 expects when you run it in queue mode.

                  ┌──────────────────────────────────┐
                  │    Reverse Proxy (Caddy/Nginx)   │
                  └──────────────┬───────────────────┘
                                 │ TLS
            ┌────────────────────┴────────────────────┐
            │                                         │
   ┌────────▼────────┐                      ┌─────────▼────────┐
   │  n8n main (UI)  │                      │  n8n webhook svc │
   │  REST + editor  │                      │  /webhook/*      │
   └────────┬────────┘                      └─────────┬────────┘
            │                                         │
            └────────────────┬────────────────────────┘
                ┌────────────▼────────────┐
                │   Redis 7 (queue)       │
                └────────────┬────────────┘
                ┌────────────▼────────────┐
                │  n8n worker x N (Bull)  │
                └────────────┬────────────┘
                ┌────────────▼────────────┐
                │   Postgres 16 (state)   │
                └─────────────────────────┘

The split matters. The main instance serves the editor and REST API. The webhook service handles inbound triggers. Workers consume the Bull queue from Redis and execute. State lives in Postgres. All four components can be scaled and restarted independently.

Here’s a minimal compose file that reflects this topology, for development only. In production each piece is a separate deployment on Kubernetes or Nomad.

services:
  postgres:
    image: postgres:16-alpine
    environment:
      POSTGRES_DB: n8n
      POSTGRES_USER: n8n
      POSTGRES_PASSWORD: ${POSTGRES_PASSWORD}
    volumes:
      - pgdata:/var/lib/postgresql/data

  redis:
    image: redis:7-alpine

  n8n-main:
    image: n8nio/n8n:1.58.2
    environment:
      DB_TYPE: postgresdb
      DB_POSTGRESDB_HOST: postgres
      DB_POSTGRESDB_DATABASE: n8n
      DB_POSTGRESDB_USER: n8n
      DB_POSTGRESDB_PASSWORD: ${POSTGRES_PASSWORD}
      EXECUTIONS_MODE: queue
      QUEUE_BULL_REDIS_HOST: redis
      N8N_ENCRYPTION_KEY: ${N8N_ENCRYPTION_KEY}
      N8N_HOST: n8n.internal.example.com
      N8N_PROTOCOL: https
      WEBHOOK_URL: https://n8n.internal.example.com/
      N8N_DISABLE_PRODUCTION_MAIN_PROCESS: "true"
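    depends_on: [postgres, redis]

  # Dedicated webhook process. Needed because the main process above has
  # production webhook handling disabled; route /webhook/* here at the proxy.
  n8n-webhook:
    image: n8nio/n8n:1.58.2
    command: webhook
    environment:
      DB_TYPE: postgresdb
      DB_POSTGRESDB_HOST: postgres
      DB_POSTGRESDB_DATABASE: n8n
      DB_POSTGRESDB_USER: n8n
      DB_POSTGRESDB_PASSWORD: ${POSTGRES_PASSWORD}
      EXECUTIONS_MODE: queue
      QUEUE_BULL_REDIS_HOST: redis
      N8N_ENCRYPTION_KEY: ${N8N_ENCRYPTION_KEY}
      WEBHOOK_URL: https://n8n.internal.example.com/
    depends_on: [postgres, redis]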

  n8n-worker:
    image: n8nio/n8n:1.58.2
    command: worker
    environment:
      DB_TYPE: postgresdb
      DB_POSTGRESDB_HOST: postgres
      DB_POSTGRESDB_DATABASE: n8n
      DB_POSTGRESDB_USER: n8n
      DB_POSTGRESDB_PASSWORD: ${POSTGRES_PASSWORD}
      EXECUTIONS_MODE: queue
      QUEUE_BULL_REDIS_HOST: redis
      N8N_ENCRYPTION_KEY: ${N8N_ENCRYPTION_KEY}
    depends_on: [postgres, redis]
    deploy:
      replicas: 4

volumes:
  pgdata:

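Two settings to call out. EXECUTIONS_MODE: queue is the whole point. Without it you’re in regular mode and the main process is doing executions, which falls over as soon as anyone schedules anything heavy. N8N_DISABLE_PRODUCTION_MAIN_PROCESS: "true" is narrower than its name suggests: it stops the main process from handling production webhooks, so it only makes sense when the reverse proxy sends /webhook/* to a dedicated webhook process. In queue mode, production executions go to the workers either way.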

The encryption key is a sacred thing

N8N_ENCRYPTION_KEY encrypts all credentials at rest in Postgres. Lose it and every credential in the system is gone. Rotate it without migrating and same outcome. Treat it like a database master key — store it in Vault, AWS Secrets Manager, or sealed-secrets, and document the rotation procedure before you onboard your first credential.

I’ve seen one team treat this as a docker-compose env var checked into git. Don’t.
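As a rough sketch of the alternative, the key can be mounted as a file instead of being passed as a plain environment variable. This assumes the _FILE suffix that n8n documents for sensitive variables applies to N8N_ENCRYPTION_KEY in your version (check before relying on it), and the ./secrets path is a hypothetical location that your secret manager's agent writes to and that is excluded from git.

```yaml
# Sketch only: file-based secret for the encryption key.
# ./secrets/n8n_encryption_key is a hypothetical path, never committed.
services:
  n8n-main:
    image: n8nio/n8n:1.58.2
    environment:
      # Assumption: n8n reads the key from this file via the _FILE suffix.
      N8N_ENCRYPTION_KEY_FILE: /run/secrets/n8n_encryption_key
    secrets:
      - n8n_encryption_key

  n8n-worker:
    image: n8nio/n8n:1.58.2
    command: worker
    environment:
      # Every n8n process must use the same key or credentials won't decrypt.
      N8N_ENCRYPTION_KEY_FILE: /run/secrets/n8n_encryption_key
    secrets:
      - n8n_encryption_key

secrets:
  n8n_encryption_key:
    file: ./secrets/n8n_encryption_key
```

Write the file without a trailing newline to be safe, so the key the containers read matches the one you stored in the vault byte for byte.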

Sizing, what you actually need

For a team running a few hundred workflows with mostly SaaS API calls, this works:

  • 1 main instance, 1 vCPU, 1 GB RAM
  • 1 webhook instance, 1 vCPU, 1 GB RAM (or skip and route through main for small teams)
  • 4 workers, 1 vCPU and 1 GB RAM each
  • Postgres 16, db.t4g.medium (AWS) or equivalent, 50 GB gp3
  • Redis 7, single-node, 1 GB

That’ll handle roughly a million executions a month with headroom. Beyond that, add workers linearly. Postgres becomes the bottleneck at around 5 million executions monthly — the execution log table grows fast.
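One way to pin those numbers down in the development compose file is with resource limits. This is a sketch rather than a tuned config: the limits mirror the list above, and the Redis flags are my assumption about how to cap it at 1 GB. The noeviction policy matters because a Redis that evicts keys under memory pressure can drop queued jobs.

```yaml
# Sketch: resource limits matching the sizing list (merge into the compose file).
services:
  n8n-main:
    deploy:
      resources:
        limits:
          cpus: "1.0"
          memory: 1G

  n8n-worker:
    deploy:
      replicas: 4            # add replicas as execution volume grows
      resources:
        limits:
          cpus: "1.0"
          memory: 1G

  redis:
    image: redis:7-alpine
    # Cap memory at 1 GB and refuse writes rather than evict queue data.
    command: ["redis-server", "--maxmemory", "1gb", "--maxmemory-policy", "noeviction"]
```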

Set a retention policy. This is non-optional.

# In the n8n environment
EXECUTIONS_DATA_PRUNE=true
EXECUTIONS_DATA_MAX_AGE=336   # hours, so 14 days
EXECUTIONS_DATA_PRUNE_MAX_COUNT=50000

Without pruning, Postgres will eat its own disk by quarter two. I’ve seen it happen.

Observability

n8n 1.58 ships Prometheus metrics at /metrics when you set N8N_METRICS=true. Scrape it. The metrics that actually matter:

  • n8n_active_workflow_count — alert if this drops unexpectedly
  • n8n_workflow_execution_status_total{status="error"} — error rate
  • n8n_workflow_execution_duration_seconds — p95 per workflow
  • Queue depth on Bull (scrape Redis separately)
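To make those bullets concrete, here is a hedged sketch of a Prometheus scrape job and two alert rules. The metric names are taken from the list above, so confirm them against your own /metrics output first. The target n8n-main:5678 assumes the compose service name and n8n's default port, and the thresholds and file name are placeholders, not recommendations.

```yaml
# prometheus.yml (fragment): scrape the main instance.
scrape_configs:
  - job_name: n8n
    metrics_path: /metrics
    static_configs:
      - targets: ["n8n-main:5678"]   # assumed service name and default port
---
# n8n-alerts.yml (hypothetical file name), loaded through rule_files.
groups:
  - name: n8n
    rules:
      - alert: N8nActiveWorkflowsDropped
        # Fires when the count falls 20% below its peak over the last hour.
        expr: |
          n8n_active_workflow_count
            < 0.8 * max_over_time(n8n_active_workflow_count[1h])
        for: 10m
        labels:
          severity: warning
        annotations:
          summary: "Active workflow count dropped unexpectedly"

      - alert: N8nHighErrorRate
        # Share of executions ending in error over 5 minutes (5% is a placeholder).
        expr: |
          sum(rate(n8n_workflow_execution_status_total{status="error"}[5m]))
            / sum(rate(n8n_workflow_execution_status_total[5m])) > 0.05
        for: 15m
        labels:
          severity: warning
        annotations:
          summary: "More than 5% of n8n executions are failing"
```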

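Log to stdout (N8N_LOG_OUTPUT=console, N8N_LOG_LEVEL=info) and ship to wherever your other services ship. Those two variables only set the destination and the level; whether you get JSON depends on the format options your n8n version supports, so check before building parsers around it. The default text logs are fine for humans but parsing them in production is a waste.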

Authentication and access control

n8n has user management, role-based access (in the paid plans), and SSO via SAML or OIDC. For a real internal deployment you want SSO connected to your IdP. I’ll cover the identity federation piece in more depth in a couple of weeks, but the short answer is: hook n8n to Keycloak 25 or Auth0, and do not run local users.

If you’re on the free community edition without SSO, gate access at the reverse proxy level with an auth-proxy like oauth2-proxy. It’s a workaround but it works.
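A minimal sketch of that workaround as a compose service, assuming an OIDC-capable IdP. The issuer URL, client ID, email domain, and image tag are all placeholders I made up for illustration; the environment variables are oauth2-proxy's standard OAUTH2_PROXY_* equivalents of its command-line flags.

```yaml
# Sketch: oauth2-proxy in front of the n8n editor. All values are placeholders.
services:
  oauth2-proxy:
    image: quay.io/oauth2-proxy/oauth2-proxy:v7.6.0   # pin whichever version you vet
    environment:
      OAUTH2_PROXY_PROVIDER: oidc
      OAUTH2_PROXY_OIDC_ISSUER_URL: https://idp.example.com/realms/internal  # hypothetical
      OAUTH2_PROXY_CLIENT_ID: n8n                                            # hypothetical
      OAUTH2_PROXY_CLIENT_SECRET: ${OAUTH2_PROXY_CLIENT_SECRET}
      OAUTH2_PROXY_COOKIE_SECRET: ${OAUTH2_PROXY_COOKIE_SECRET}
      OAUTH2_PROXY_EMAIL_DOMAINS: example.com
      OAUTH2_PROXY_REDIRECT_URL: https://n8n.internal.example.com/oauth2/callback
      OAUTH2_PROXY_HTTP_ADDRESS: 0.0.0.0:4180
      # Authenticated traffic is forwarded to the editor on n8n's default port.
      OAUTH2_PROXY_UPSTREAMS: http://n8n-main:5678
    depends_on: [n8n-main]
```

Point the reverse proxy's editor route at oauth2-proxy on port 4180, and keep /webhook/* routed straight to n8n, since external callers of your webhooks can't complete an interactive login.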

Upgrades

n8n ships fast. There’s typically a minor release every two weeks. The upgrade itself is straightforward — pull the new image, run migrations, restart. The risk is workflow regressions; nodes occasionally get behaviour changes.

The discipline I follow:

  1. Read the release notes before every minor bump.
  2. Pin the version in your deployment manifests. Never :latest.
  3. Test in a staging instance with a representative subset of workflows. Yes, this is annoying. Yes, you need it.
  4. Roll workers first, then main. Workers are stateless and can be replaced one at a time. Main holds the editor and migrations.
  5. Take a Postgres snapshot immediately before any migration.

Once a quarter, do a full restore-from-backup drill. The first time you actually need to restore should not be the first time you’ve tried.
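For the development compose setup, the pre-migration snapshot in step 5 could look something like the one-off service below. It is a sketch under a few assumptions: the pg-backup service name, the ops profile, and the ./backups directory are all hypothetical, and in production you would use your managed database's snapshot feature instead.

```yaml
# Sketch: one-off logical backup of the n8n database before an upgrade.
services:
  pg-backup:
    image: postgres:16-alpine
    profiles: ["ops"]            # keeps it out of a plain `docker compose up`
    environment:
      PGPASSWORD: ${POSTGRES_PASSWORD}
    volumes:
      - ./backups:/backups       # hypothetical host directory for dumps
    entrypoint: ["sh", "-c"]
    command:
      # $$ escapes the dollar sign so the shell, not compose, expands it.
      - "pg_dump -h postgres -U n8n -d n8n -Fc -f /backups/n8n-$$(date +%Y%m%d-%H%M%S).dump"
    depends_on: [postgres]
```

Run it with docker compose run --rm pg-backup before pulling the new image. The custom-format dump it writes is what pg_restore expects, which makes it a convenient artifact for the quarterly restore drill as well.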

Common pitfalls

Things I’ve watched teams trip on.

  • Running in regular mode in production. Works fine until a long-running workflow starves the editor. Switch to queue mode before the second workflow ships.
  • No Postgres backups. People assume “it’s just workflow state, who cares.” Then a citizen dev deletes their workflow and there’s no way back. Backup nightly, retain thirty days, test restores.
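  • One worker pool for everything. A scraping workflow at 5 RPS will block your finance reconciliation. As far as I know, n8n 1.58 puts every job on a single Bull queue with no built-in way to route workflows to particular workers, so real isolation means a separate n8n deployment for the noisy workloads. Within one pool, the --concurrency flag on workers caps how many jobs each worker runs at once.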
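  • Letting the executions table grow forever. Don’t rely on defaults for pruning. Recent 1.x releases enable it out of the box, but set the values explicitly on day one and check that the table actually stops growing.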
  • Forgetting the timezone. Set GENERIC_TIMEZONE=Asia/Jakarta (or wherever you actually are). n8n’s default is America/New_York, so Cron triggers will fire at the wrong local hour otherwise and the marketing team will be confused.
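  • No external file storage. By default, binary data is held in memory and persisted along with the execution data, which bloats both the workers and Postgres. Configure N8N_DEFAULT_BINARY_DATA_MODE=filesystem with a shared volume, or better, S3 mode with N8N_AVAILABLE_BINARY_DATA_MODES=filesystem,s3 and N8N_DEFAULT_BINARY_DATA_MODE=s3. Check the docs for your version first, since S3 storage may depend on your license.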
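Several of these pitfalls come down to a few lines on the worker service. The fragment below is a sketch: the timezone is the example value from the list, and the concurrency of 5 is an arbitrary placeholder to tune against your own workloads.

```yaml
# Sketch: worker settings touching the pitfalls above (merge into the compose file).
services:
  n8n-worker:
    image: n8nio/n8n:1.58.2
    # Cap parallel jobs per worker; 5 is a placeholder, not a recommendation.
    command: worker --concurrency=5
    environment:
      EXECUTIONS_MODE: queue
      GENERIC_TIMEZONE: Asia/Jakarta      # example value; use your own zone
      EXECUTIONS_DATA_PRUNE: "true"
      EXECUTIONS_DATA_MAX_AGE: "336"      # hours, so 14 days
```

Set the timezone and pruning variables on the main and webhook processes too, so every n8n process in the deployment runs with the same configuration.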

What’s next

Self-hosting n8n is straightforward if you treat it as a proper internal platform — Postgres-backed, queue-moded, observable, upgradeable, and SSO’d. The cost of doing it well is roughly one engineer-week to stand up and a couple of hours a month to maintain. The cost of doing it badly is a credential leak and a CFO meeting.

Next post in this series goes into how to safely expose your internal APIs to the citizen devs running on this platform. That’s where the interesting access-control work lives, because the platform being on your infra doesn’t automatically mean the workflows running on it are safe to call your services. See you Monday.