Self-Hosting n8n for Engineering Teams in 2024
TL;DR — Self-hosted n8n 1.58 in 2024 is a real platform commitment, not a docker-compose afternoon. Run it in queue mode, on Postgres 16, with Redis 7, behind a real reverse proxy, with proper secrets and backups. Budget for the operational load — it’s small but nonzero.
I’ve stood up self-hosted n8n on three different teams in the last eighteen months. The first time we treated it as a side project and learned the hard way. The second and third went smoothly because we treated it as a Tier-2 internal platform from day one. This post is the playbook I wish we’d started with.
What I’m not going to do is rehash the docker-compose quickstart from the n8n docs. That’s fine for evaluation. It’s not what you want running the company’s marketing automation by month six. The interesting questions are around mode, persistence, scaling, secrets, and upgrades. That’s what we’ll cover.
If you read the earlier post on the pro-code view of low-code in the enterprise and decided to engage, this is the operational starting point.
Why self-host at all
The cloud-hosted n8n offering is fine. Use it if you don’t have a platform team. But there are three reasons engineering teams I’ve worked with have chosen self-hosting in 2024.
First, network egress. When your workflows need to call internal APIs that don’t have public endpoints, you either build a public proxy (yuck) or put n8n inside the VPC. Self-hosting is the simpler path.
Second, data residency. If you’re a European or Singapore-regulated business, you probably already have a position on where SaaS data lives. Self-hosted n8n keeps the workflow state and credentials in your perimeter.
Third, cost at scale. Once you cross a few hundred workflows or a few million executions a month, self-hosting pencils out. The break-even is roughly where one part-time SRE’s time is cheaper than the cloud bill, which happens earlier than you’d think.
If none of those three apply, stay on cloud. The total cost of ownership of self-hosting is real.
Architecture, the September 2024 version
Here’s the topology that works at any scale beyond “team of ten.” This is what n8n 1.58 expects when you run it in queue mode.
```text
          ┌───────────────────────────────────┐
          │    Reverse Proxy (Caddy/Nginx)    │
          └─────────────────┬─────────────────┘
                            │ TLS
        ┌───────────────────┴───────────────────┐
        │                                       │
┌───────▼───────┐                      ┌────────▼────────┐
│ n8n main (UI) │                      │ n8n webhook svc │
│ REST + editor │                      │   /webhook/*    │
└───────┬───────┘                      └────────┬────────┘
        │                                       │
        └───────────────────┬───────────────────┘
                            │
               ┌────────────▼────────────┐
               │     Redis 7 (queue)     │
               └────────────┬────────────┘
                            │
               ┌────────────▼────────────┐
               │  n8n worker x N (Bull)  │
               └────────────┬────────────┘
                            │
               ┌────────────▼────────────┐
               │   Postgres 16 (state)   │
               └─────────────────────────┘
```
The split matters. The main instance serves the editor and REST API. The webhook service handles inbound triggers. Workers consume the Bull queue from Redis and execute. State lives in Postgres. Each tier can be restarted independently, and workers and webhook processors scale out by adding replicas.
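In practice the reverse proxy's part in this split is a single path rule. A minimal sketch of that decision, assuming n8n's default production webhook path (/webhook/) and two hypothetical upstream names, n8n-main and n8n-webhook: production webhook calls go to the webhook service, and everything else (editor, REST API, and the /webhook-test/ URLs the editor uses for test runs) goes to main.

```python
"""Sketch of the reverse proxy's routing rule for the queue-mode split.

Assumes n8n's default production webhook path "/webhook/". The upstream
names are hypothetical placeholders for whatever your proxy calls them.
"""

WEBHOOK_PREFIX = "/webhook/"      # production webhooks (n8n default path)
MAIN_UPSTREAM = "n8n-main"        # hypothetical upstream name
WEBHOOK_UPSTREAM = "n8n-webhook"  # hypothetical upstream name


def pick_upstream(path: str) -> str:
    """Return the upstream that should serve a request for `path`."""
    # "/webhook-test/..." does not start with "/webhook/", so editor test
    # webhooks fall through to main, which is where they are registered.
    if path.startswith(WEBHOOK_PREFIX):
        return WEBHOOK_UPSTREAM
    return MAIN_UPSTREAM
```

In Nginx or Caddy this becomes one prefix-matched location for /webhook/ plus a default route to main.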
A minimal compose file that actually reflects this, for development. In production each piece is a separate deployment on Kubernetes or Nomad.
```yaml
# Shared by every n8n process. Workers and webhook processors need the
# same database, Redis, and encryption key as main.
x-n8n-env: &n8n-env
  DB_TYPE: postgresdb
  DB_POSTGRESDB_HOST: postgres
  DB_POSTGRESDB_DATABASE: n8n
  DB_POSTGRESDB_USER: n8n
  DB_POSTGRESDB_PASSWORD: ${POSTGRES_PASSWORD}
  EXECUTIONS_MODE: queue
  QUEUE_BULL_REDIS_HOST: redis
  N8N_ENCRYPTION_KEY: ${N8N_ENCRYPTION_KEY}
  N8N_HOST: n8n.internal.example.com
  N8N_PROTOCOL: https
  WEBHOOK_URL: https://n8n.internal.example.com/

services:
  postgres:
    image: postgres:16-alpine
    environment:
      POSTGRES_DB: n8n
      POSTGRES_USER: n8n
      POSTGRES_PASSWORD: ${POSTGRES_PASSWORD}
    volumes:
      - pgdata:/var/lib/postgresql/data

  redis:
    image: redis:7-alpine

  # Editor + REST API.
  n8n-main:
    image: n8nio/n8n:1.58.2
    environment:
      <<: *n8n-env
      # Production webhooks are served by n8n-webhook, not by main.
      N8N_DISABLE_PRODUCTION_MAIN_PROCESS: "true"
    depends_on: [postgres, redis]

  # Receives /webhook/* traffic from the reverse proxy and enqueues jobs.
  n8n-webhook:
    image: n8nio/n8n:1.58.2
    command: webhook
    environment: *n8n-env
    depends_on: [postgres, redis]

  # Pulls jobs off the Bull queue and runs them.
  n8n-worker:
    image: n8nio/n8n:1.58.2
    command: worker
    environment: *n8n-env
    depends_on: [postgres, redis]
    deploy:
      replicas: 4

volumes:
  pgdata:
```
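Two settings to call out. EXECUTIONS_MODE: queue is the whole point. Without it you're in regular mode and the main process runs every execution itself, which falls over as soon as anyone schedules anything heavy. And N8N_DISABLE_PRODUCTION_MAIN_PROCESS: "true" stops the main process from answering production webhook calls, so that traffic lands on the dedicated webhook service instead. Only set it when a webhook service is actually running; otherwise production webhooks have nowhere to go. The executions themselves are picked up by the workers.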
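Because these settings have to agree across every process, it's worth linting the rendered config in CI. A minimal sketch, assuming its input is the JSON printed by docker compose config --format json (or an equivalent dict built from your Kubernetes manifests), and assuming n8n services can be recognized by their n8nio/n8n image. It checks that every n8n process runs in queue mode, that they all share one encryption key, and that main only stops serving production webhooks when a webhook service exists.

```python
"""Lint a rendered compose config for the queue-mode invariants above.

Assumed input: the JSON printed by `docker compose config --format json`.
Spotting n8n services by the "n8nio/n8n" image prefix is an illustrative choice.
"""
import json
import sys


def lint_n8n(config: dict) -> list[str]:
    """Return human-readable problems; an empty list means the checks passed."""
    services = config.get("services", {})
    n8n = {name: svc for name, svc in services.items()
           if str(svc.get("image", "")).startswith("n8nio/n8n")}

    def env(svc: dict) -> dict:
        return svc.get("environment") or {}

    def command(svc: dict) -> list:
        cmd = svc.get("command") or []
        return cmd.split() if isinstance(cmd, str) else list(cmd)

    problems = []
    for name, svc in sorted(n8n.items()):
        if env(svc).get("EXECUTIONS_MODE") != "queue":
            problems.append(f"{name}: EXECUTIONS_MODE is not 'queue'")

    # Workers and webhook processors decrypt credentials too, so the key must match.
    keys = {env(svc).get("N8N_ENCRYPTION_KEY") for svc in n8n.values()}
    if len(keys) != 1 or None in keys:
        problems.append("N8N_ENCRYPTION_KEY missing or different across n8n services")

    main_skips_webhooks = any(
        str(env(svc).get("N8N_DISABLE_PRODUCTION_MAIN_PROCESS")).lower() == "true"
        for svc in n8n.values())
    has_webhook_svc = any(command(svc)[:1] == ["webhook"] for svc in n8n.values())
    if main_skips_webhooks and not has_webhook_svc:
        problems.append("production webhooks disabled on main but no webhook service")
    return problems


if __name__ == "__main__" and len(sys.argv) > 1:
    with open(sys.argv[1], encoding="utf-8") as f:
        found = lint_n8n(json.load(f))
    print("\n".join(found) or "ok")
    sys.exit(1 if found else 0)
```

A typical invocation would be docker compose config --format json > rendered.json followed by python lint_n8n.py rendered.json (the script name is hypothetical). The rendered output contains the interpolated secrets, so delete it afterwards and never upload it as a CI artifact.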
The encryption key is a sacred thing
N8N_ENCRYPTION_KEY encrypts all credentials at rest in Postgres. Lose it and every credential in the system is gone. Rotate it without migrating and same outcome. Treat it like a database master key — store it in Vault, AWS Secrets Manager, or sealed-secrets, and document the rotation procedure before you onboard your first credential.
I’ve seen one team treat this as a docker-compose env var checked into git. Don’t.
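A cheap guardrail against that failure mode is a pre-commit check that refuses a literal key in tracked files. A minimal sketch, assuming the key only ever reaches committed config as a ${N8N_ENCRYPTION_KEY}-style interpolation; the regular expressions and the accepted placeholder forms are illustrative, not exhaustive.

```python
"""Pre-commit style check: flag literal N8N_ENCRYPTION_KEY values in files.

Assumption: the only acceptable committed form is an interpolation such as
${N8N_ENCRYPTION_KEY} (optionally ${N8N_ENCRYPTION_KEY:?required}); any other
value after `N8N_ENCRYPTION_KEY=` or `N8N_ENCRYPTION_KEY:` counts as a leak.
"""
import re
import sys

# KEY=value or KEY: value, optionally quoted; captures the value itself.
ASSIGNMENT = re.compile(r"N8N_ENCRYPTION_KEY\s*[:=]\s*['\"]?([^'\"\s#]+)")
# ${VAR}, ${VAR:-default}, ${VAR:?error} and similar interpolations.
PLACEHOLDER = re.compile(r"^\$\{[A-Za-z_][A-Za-z0-9_]*(:?[-?+][^}]*)?\}$")


def leaked_keys(text: str) -> list[int]:
    """Return 1-based line numbers that assign a literal encryption key."""
    hits = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        if line.lstrip().startswith("#"):
            continue  # comments may mention the variable name
        match = ASSIGNMENT.search(line)
        if match and not PLACEHOLDER.match(match.group(1)):
            hits.append(lineno)
    return hits


if __name__ == "__main__" and len(sys.argv) > 1:
    failed = False
    for path in sys.argv[1:]:
        with open(path, encoding="utf-8", errors="replace") as f:
            for lineno in leaked_keys(f.read()):
                print(f"{path}:{lineno}: literal N8N_ENCRYPTION_KEY")
                failed = True
    sys.exit(1 if failed else 0)
```

Wire it into whatever pre-commit tooling you already run, passing the staged files as arguments; a non-zero exit blocks the commit.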
Sizing, what you actually need
For a team running a few hundred workflows with mostly SaaS API calls, this works:
- 1 main instance, 1 vCPU, 1 GB RAM
- 1 webhook instance, 1 vCPU, 1 GB RAM (or skip and route through main for small teams)
- 4 workers, 1 vCPU and 1 GB RAM each
- Postgres 16, db.t4g.medium (AWS) or equivalent, 50 GB gp3
- Redis 7, single-node, 1 GB
That’ll handle roughly a million executions a month with headroom. Beyond that, add workers linearly. Postgres becomes the bottleneck at around 5 million executions monthly — the execution log table grows fast.
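To sanity-check worker counts against your own traffic, Little's law (jobs in flight = arrival rate × average duration) gets you most of the way. A minimal sketch; the peak-to-average factor and the average execution time are inputs you have to measure, and the default of 10 concurrent jobs per worker is an assumption to check against the --concurrency value your workers actually run with.

```python
"""Back-of-envelope worker sizing with Little's law (in-flight = rate x duration).

Every input is an assumption to replace with measurements: the peak-to-average
factor, the average execution time, and jobs per worker (`n8n worker --concurrency`).
"""
import math

SECONDS_PER_MONTH = 30 * 24 * 3600  # treat a month as 30 days


def workers_needed(executions_per_month: int, avg_duration_s: float,
                   peak_factor: float = 10.0, concurrency: int = 10) -> int:
    """Smallest worker count whose slots cover the peak number of in-flight jobs."""
    avg_rate = executions_per_month / SECONDS_PER_MONTH  # jobs per second
    peak_in_flight = avg_rate * peak_factor * avg_duration_s
    return max(1, math.ceil(peak_in_flight / concurrency))
```

At roughly a million executions a month, a 10x peak with 5-second average runs fits in two workers at concurrency 10, so the four in the list above leave headroom. If average durations climb to 30 seconds, the same arithmetic asks for twelve.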
Set a retention policy. This is non-optional.
```shell
# In the n8n environment
EXECUTIONS_DATA_PRUNE=true
# Maximum age in hours: 336 = 14 days
EXECUTIONS_DATA_MAX_AGE=336
EXECUTIONS_DATA_PRUNE_MAX_COUNT=50000
```
Without pruning, Postgres will eat its own disk by quarter two. I’ve seen it happen.
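The two pruning limits interact: whichever one is hit first decides how much history you keep. A minimal sketch of that interaction, assuming executions arrive evenly across a 30-day month and assuming a max count of 0 means "no count limit".

```python
"""Which pruning limit binds first: EXECUTIONS_DATA_MAX_AGE or EXECUTIONS_DATA_PRUNE_MAX_COUNT?

Assumes executions arrive evenly over a 30-day month, so treat the result as a
rough estimate. A max_count of 0 is treated as "no count limit" (assumed to
match n8n's meaning for EXECUTIONS_DATA_PRUNE_MAX_COUNT=0).
"""
HOURS_PER_MONTH = 30 * 24


def effective_retention(executions_per_month: int, max_age_hours: float,
                        max_count: int) -> tuple[int, float]:
    """Return (executions kept, hours of history those executions cover)."""
    per_hour = executions_per_month / HOURS_PER_MONTH
    kept = per_hour * max_age_hours  # what the age limit alone would keep
    if max_count > 0:
        kept = min(kept, max_count)
    return round(kept), kept / per_hour
```

At a million executions a month, the settings above keep 50,000 executions, which is about 36 hours of history rather than 14 days. That may be exactly what your Postgres needs, but it should be a deliberate choice.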
Observability
n8n 1.58 ships Prometheus metrics at /metrics when you set N8N_METRICS=true. Scrape it. The metrics that actually matter:
- n8n_active_workflow_count: alert if this drops unexpectedly
- n8n_workflow_execution_status_total{status="error"}: error rate
- n8n_workflow_execution_duration_seconds: p95 per workflow
- Queue depth on Bull (scrape Redis separately)
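For the queue-depth item, a minimal sketch that reads Bull's Redis keys directly. It assumes n8n's Bull queue is named jobs under the default bull prefix (QUEUE_BULL_PREFIX), and Bull v3's key layout with waiting and active jobs in Redis lists and delayed jobs in a sorted set. Verify the key names with redis-cli --scan --pattern 'bull:*' on your own instance before alerting on them. The REDIS_HOST variable is hypothetical, and the client can be anything with redis-py's llen and zcard methods.

```python
"""Read n8n's Bull queue depth straight from Redis (for an exporter or cron alert).

Assumptions to verify on your own instance: queue name "jobs", key prefix
"bull" (QUEUE_BULL_PREFIX), and Bull v3's layout where waiting/active jobs are
Redis lists and delayed jobs a sorted set. REDIS_HOST is a hypothetical env var.
"""
import os
from typing import Protocol


class RedisLike(Protocol):
    """The two redis-py methods this sketch needs."""
    def llen(self, name: str) -> int: ...
    def zcard(self, name: str) -> int: ...


def bull_queue_depth(client: RedisLike, prefix: str = "bull",
                     queue: str = "jobs") -> dict[str, int]:
    base = f"{prefix}:{queue}"
    return {
        "waiting": client.llen(f"{base}:wait"),      # enqueued, no worker yet
        "active": client.llen(f"{base}:active"),     # being executed right now
        "delayed": client.zcard(f"{base}:delayed"),  # scheduled for later
    }


if __name__ == "__main__" and os.environ.get("REDIS_HOST"):
    import redis  # redis-py, only needed when pointed at a real Redis

    print(bull_queue_depth(redis.Redis(host=os.environ["REDIS_HOST"], port=6379)))
```

A steadily growing "waiting" count is the signal that you need more workers; "active" tells you whether the ones you have are saturated.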
Log to stdout in JSON (N8N_LOG_OUTPUT=console, N8N_LOG_LEVEL=info) and ship to wherever your other services ship. The default text logs are fine for humans but parsing them in production is a waste.
Authentication and access control
n8n has user management, role-based access (in the paid plans), and SSO via SAML or OIDC. For a real internal deployment you want SSO connected to your IdP. I’ll cover the identity federation piece in more depth in a couple of weeks, but the short answer is: hook n8n to Keycloak 25 or Auth0, and don’t run local users.
If you’re on the free community edition without SSO, gate access at the reverse proxy level with an auth-proxy like oauth2-proxy. It’s a workaround but it works.
Upgrades
n8n ships fast. There’s typically a minor release every two weeks. The upgrade itself is straightforward — pull the new image, run migrations, restart. The risk is workflow regressions; nodes occasionally get behaviour changes.
The discipline I follow:
- Read the release notes before every minor bump.
- Pin the version in your deployment manifests. Never :latest.
- Test in a staging instance with a representative subset of workflows. Yes, this is annoying. Yes, you need it.
- Roll workers first, then main. Workers are stateless. Main holds the editor and runs the migrations.
- Take a Postgres snapshot immediately before any migration.
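The pinning rule is easy to enforce mechanically. A minimal sketch that accepts an image reference only if it carries a sha256 digest or an exact MAJOR.MINOR.PATCH tag; that definition of "pinned" is an assumption of this sketch, so tighten it (digests only, say) if your policy is stricter.

```python
"""Reject floating image references such as n8nio/n8n or n8nio/n8n:latest.

Assumption of this sketch: "pinned" means a sha256 digest or an exact
MAJOR.MINOR.PATCH tag like 1.58.2.
"""
import re

EXACT_VERSION = re.compile(r"^\d+\.\d+\.\d+$")


def is_pinned(image: str) -> bool:
    if "@sha256:" in image:
        return True  # digests are immutable
    name, _, tag = image.rpartition(":")
    # No colon at all, or the colon belongs to a registry port (host:5000/repo).
    if not name or "/" in tag:
        return False
    return bool(EXACT_VERSION.match(tag))
```

Run it over every image value in your manifests in CI. A two-part tag such as 1.58 is rejected too, because tags of that shape usually float across patch releases.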
Once a quarter, do a full restore-from-backup drill. The first time you actually need to restore should not be the first time you’ve tried.
Common pitfalls
Things I’ve watched teams trip on.
- Running in regular mode in production. Works fine until a long-running workflow starves the editor. Switch to queue mode before the second workflow ships.
- No Postgres backups. People assume “it’s just workflow state, who cares.” Then a citizen dev deletes their workflow and there’s no way back. Backup nightly, retain thirty days, test restores.
- One worker pool for everything. A scraping workflow at 5 RPS will block your finance reconciliation. Use the queue’s worker-tag feature to dedicate workers to high-priority queues. The --concurrency flag on workers matters here.
- Letting the executions table grow forever. Pruning has been on by default since n8n 1.0, but with generic limits, and it’s one env var away from being switched off. Set EXECUTIONS_DATA_PRUNE and its limits explicitly on day one.
- Forgetting the timezone. Set GENERIC_TIMEZONE=Asia/Jakarta (or wherever you actually are). Otherwise schedule triggers fire in n8n’s default zone, America/New_York, and the marketing team will be confused.
- No external file storage. By default, binary data lives in the database or on the worker’s local disk. Configure N8N_DEFAULT_BINARY_DATA_MODE=filesystem with a shared volume, or better, S3 mode with N8N_AVAILABLE_BINARY_DATA_MODES=filesystem,s3.
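For the backup pitfall, the "retain thirty days" half is the part that usually gets scripted wrong. A minimal sketch of the pruning decision, assuming each backup is identified by its UTC creation time. It also assumes, as a deliberate design choice, that the newest backup always survives even when it is older than the window, so a silently failing nightly job can't end with an empty bucket. Listing and deleting the actual pg_dump files or snapshots is left out.

```python
"""Decide which nightly Postgres backups fall outside a 30-day retention window.

Assumption: each backup is identified by its creation time (UTC); listing and
deleting the actual pg_dump files or RDS snapshots is out of scope here.
"""
from datetime import datetime, timedelta, timezone


def expired_backups(created: list[datetime], now: datetime,
                    keep_days: int = 30) -> list[datetime]:
    """Return backups older than the window, oldest first, never the newest one."""
    if not created:
        return []
    cutoff = now - timedelta(days=keep_days)
    newest = max(created)
    # Never delete the newest backup, even if the nightly job has been failing.
    return sorted(t for t in created if t < cutoff and t != newest)


if __name__ == "__main__":
    now = datetime.now(timezone.utc)
    nightly = [now - timedelta(days=d) for d in range(40)]
    print(len(expired_backups(nightly, now)), "backups past retention")
```

Pair this with the quarterly restore drill: deleting old backups on schedule is only safe once you know the recent ones actually restore.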
What’s next
Self-hosting n8n is straightforward if you treat it as a proper internal platform — Postgres-backed, queue-moded, observable, upgradeable, and SSO’d. The cost of doing it well is roughly one engineer-week to stand up and a couple of hours a month to maintain. The cost of doing it badly is a credential leak and a CFO meeting.
Next post in this series goes into how to safely expose your internal APIs to the citizen devs running on this platform. That’s where the interesting access-control work lives, because the platform being on your infra doesn’t automatically mean the workflows running on it are safe to call your services. See you Monday.